-
In the past, WAIL checked for a specific version of Java (6/7?) and, if it was not installed, installed it. This was due to a requirement by OWB/Heritrix (I believe the latter was the culprit) that did n…
-
...requiring the user to enter them. This process should be transparent. More investigation should be done to see whether this occurs only in Safari or also on other OSes.
Encountered in Safari (the def…
-
The current status of Wayback's canonicalization is documented [here](http://iipc.github.io/openwayback/resource_index.html#Current_Status_within_Wayback).
This should be amended to be similar to tha…
-
-
I'm trying to crawl an HTTPS site through a Squid proxy and keep seeing errors like these:
```
java.io.IOException: RIS already open for ToeThread #12: https://XXX/robots.txt
at org.archive.io.Rec…
-
I did a simple harvest on two domains:
- trybunal.gov.pl
- www.gov.pl
Heritrix has hung at "URLs | 370 downloaded + 423 queued = 794 total" and is not progressing.
Viewerproxy says:…
-
Heritrix should support refetching failed URLs, with an automatic retry later, if the HTTP status code is one the user has configured (for example, CloudFlare uses special codes like 529 when content is fetched from a site…
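
A minimal sketch of the requested behavior, written in Python for brevity and not tied to Heritrix's actual API. The class `RetryPolicy`, its fields, and the exponential delay schedule are all hypothetical names and choices made for illustration; only the idea of a user-configured set of status codes (with 529 as the example) comes from the request above.

```python
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    """Hypothetical policy: retry later when the status is user-configured."""

    # Status codes the user wants treated as "retry later" (529 is the
    # example given in the request; the set is an assumption).
    retry_statuses: frozenset = frozenset({529})
    # Assumed limits; a real crawler would expose these as settings.
    max_retries: int = 3
    base_delay_seconds: float = 60.0

    def next_delay(self, status, attempts):
        """Return seconds to wait before refetching, or None to give up.

        `attempts` is the number of retries already made for this URL.
        """
        if status not in self.retry_statuses:
            return None
        if attempts >= self.max_retries:
            return None
        # Double the wait on each retry: 60s, 120s, 240s, ...
        return self.base_delay_seconds * (2 ** attempts)


policy = RetryPolicy()
for attempts in range(4):
    print(attempts, policy.next_delay(529, attempts))
```

A fetch loop would call `next_delay` after each response and requeue the URL with that delay, or record a permanent failure when it returns `None`.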
-
-
Following [this report](https://groups.yahoo.com/neo/groups/archive-crawler/conversations/topics/8952;_ylc=X3oDMTM0b2wwNG0wBF9TAzk3MzU5NzE0BGdycElkAzg3NTk4NjcEZ3Jwc3BJZAMxNzA1MDA0OTI0BG1zZ0lkAzg5NTQEc…
-
By Laura Waldoch, CUL:
We’ve noticed a data error under the Theme “Cambridge Network”, in the following record:
Accelrys
Archived date: 2012-11-21 http://www.accelrys.com/
Clickin…