-
@unclecode I see that setting a hook is not working as expected. I am setting a delay using the code below:
```
import time

def delay(driver):
    print("Delaying for 5 seconds...")
    time.sleep(5)
```
…
-
Hi,
how difficult would it be to allow iron_worker to upload the actual files when they are soft-linked?
For example, I have a repository with two folders: 'web' and 'crawler'.
The crawler app is just a r…
fred updated
10 years ago
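One possible workaround in the meantime is to resolve the soft links before packaging the upload. A minimal Python sketch, assuming the 'crawler' folder from the example above (the staging step is illustrative, not part of iron_worker):

```python
import os
import shutil
import tempfile

def copy_resolving_symlinks(src, dst):
    """Copy a tree, replacing each soft link with the real file it points to."""
    # shutil.copytree follows symlinks by default (symlinks=False),
    # so the copied tree contains actual file contents, not links.
    shutil.copytree(src, dst, symlinks=False)

# Example: materialize the 'crawler' folder into a temp dir before uploading.
# staged = tempfile.mkdtemp()
# copy_resolving_symlinks("crawler", os.path.join(staged, "crawler"))
```

The uploader would then be pointed at the staged copy, which contains real files only.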
-
I'd suggest implementing functionality to make the web crawler respect the index/disallow settings defined in the robots.txt file or robots meta tags of the website being crawled.
See http…
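As a sketch of what honoring robots.txt could look like, Python's standard library already ships a parser; the rules and URLs below are illustrative:

```python
from urllib.robotparser import RobotFileParser

def allowed_to_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules before crawling it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

rules = """\
User-agent: *
Disallow: /private/
"""
print(allowed_to_fetch(rules, "MyCrawler", "https://example.com/private/page"))  # False
print(allowed_to_fetch(rules, "MyCrawler", "https://example.com/public/page"))   # True
```

A crawler would fetch each site's `/robots.txt` once, parse it, and consult `can_fetch` before every request to that host.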
-
Hi,
to me it looks like '--blockRules' blocks entire pages when a subelement such as an iframe's content URL matches one of the passed regexes.
Is that correct?
Or what is the exact mechanism?
And if my …
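The actual '--blockRules' semantics would need to be confirmed by the maintainers; as an illustration only, here is a small Python sketch of the two possible mechanisms the question distinguishes (the URLs and rule are made up):

```python
import re

# A hypothetical block rule matching an ad-serving host.
BLOCK_RULES = [re.compile(r"ads\.example\.com")]

def url_is_blocked(url: str) -> bool:
    return any(rule.search(url) for rule in BLOCK_RULES)

page_url = "https://example.com/article"
subresources = [
    "https://example.com/style.css",
    "https://ads.example.com/frame.html",  # iframe content matching the rule
]

# Mechanism 1 -- per-request blocking: only the matching iframe is dropped,
# the page itself is still crawled.
kept = [u for u in subresources if not url_is_blocked(u)]

# Mechanism 2 -- whole-page blocking: the page is skipped because one of
# its subresources matches.
page_blocked = any(url_is_blocked(u) for u in subresources)
```

The reported behavior would correspond to mechanism 2, while mechanism 1 is what one might intuitively expect from a request-level filter.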
-
While working with a scalable web crawler (Apache Nutch) to scan my server list and outgoing links, I noticed that you don't forbid crawlers from scanning the avatars.
I would suggest that you do so in robot…
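A minimal robots.txt entry along these lines might look as follows; the avatar path is an assumption about the site's layout, not the actual route:

```
User-agent: *
Disallow: /avatars/
```

Well-behaved crawlers such as Nutch honor this directive and skip the avatar directory entirely.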
-
Prototype https://github.com/farkaskid/WebCrawler
-
Web crawler to create a newsletter from these blogs
-
Recent versions of ArchivesSpace have made it increasingly difficult to effectively control how web crawlers index an ArchivesSpace site (e.g. by providing search functionality at numerous endpoints a…
-
A good example use of the HTTP client would be a web crawler. This could also be a good demonstration of the flexibility of the URI class.
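As a sketch of that idea (using Python's `urllib.parse` as a stand-in for whatever URI class the library provides), a minimal crawler step fetches a page and resolves the links it finds against the page's own URL:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collect href attributes from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(base_url: str, html: str):
    """Parse HTML and resolve each link against the page's URL."""
    collector = LinkCollector()
    collector.feed(html)
    # This is where URI handling shines: relative links become absolute.
    return [urljoin(base_url, href) for href in collector.links]

# Usage (requires network access):
# html = urlopen("https://example.com/").read().decode("utf-8", "replace")
# print(extract_links("https://example.com/", html))
```

Resolving relative hrefs, normalizing hosts, and deduplicating visited URLs are all URI-class concerns, which is what makes a crawler a natural showcase for it.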