-
It seems version 0.2013.2.19 won't work correctly on Windows. With a little bit of tinkering, Heritrix would create a WARC file, but Wayback finds nothing. Is a new release planned?
-
Error encountered:
"WAIL" can't be opened because it is from an unidentified developer. Your security preferences allow installation of only apps from the Mac App Store and identified developers.
Thi…
-
Hello,
I am running NAS in a distributed environment with distribution 5.2.2. I configured my JAVA_HOME with jdk1.8.0_71 and set up my JMS broker,
but when I run a new job I always get this error:
dk.…
-
The current status of Wayback's canonicalization is documented [here](http://iipc.github.io/openwayback/resource_index.html#Current_Status_within_Wayback).
This should be amended to be similar to tha…
-
It would be possible to use regex to try to find anchors, CSS, and JS, but this could end up being very messy. I'd suggest using an HTML-parsing library but, since Python is super new to me, I don't k…
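As a rough illustration of the HTML-parsing route, here is a minimal sketch using only `html.parser` from the Python standard library, so no third-party dependency is assumed. The names `ResourceLinkParser` and `extract_links` are hypothetical and not part of any existing project code; the sketch only collects `href`/`src` values and does not resolve relative URLs or look inside inline CSS.

```python
from html.parser import HTMLParser


class ResourceLinkParser(HTMLParser):
    """Collect anchor, stylesheet, and script URLs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self.stylesheets = []
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        attr = dict(attrs)
        if tag == "a" and attr.get("href"):
            self.anchors.append(attr["href"])
        elif tag == "link" and attr.get("href"):
            # rel may hold several space-separated tokens, e.g. "alternate stylesheet".
            rel = (attr.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.stylesheets.append(attr["href"])
        elif tag == "script" and attr.get("src"):
            self.scripts.append(attr["src"])


def extract_links(html_text):
    """Return a dict of the anchor, stylesheet, and script URLs found in html_text."""
    parser = ResourceLinkParser()
    parser.feed(html_text)
    parser.close()
    return {
        "anchors": parser.anchors,
        "stylesheets": parser.stylesheets,
        "scripts": parser.scripts,
    }
```

Calling `extract_links(page_source)` on a fetched page returns three lists that can be processed separately. A parser of this kind tolerates unusual attribute order and quoting, which is where a regex tends to get messy.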
-
Heritrix should support refetching failed URLs and automatically retrying them later when the HTTP status code is one the user has configured (for example, CloudFlare uses special codes like 529 when content is fetched from a site…
-
These are some notes I made on issues raised during the workshop at the IIPC conference:
- The title field is too short: 50 chars is not enough.
- "URLs prefix" scope is wrong/misleading, because th…
-
While porting for #1, this happened:
> One issue I noticed was that the archive-access code brings in entire heritrix-commons just for one class, which appears to be quite general purpose:
>
> im…
-
Heritrix has some [nice URL scraping routines](https://github.com/internetarchive/heritrix3/tree/master/modules/src/main/java/org/archive/modules/extractor) that may be useful.
An option like `--extr…
-
This has been requested a few times, most recently by Beaudry Allen, Digital Archivist at Villanova, but there is currently no way to do this in the WAIL interface.
Q: What needs to be included in a…