-
If the WARC file reaches its maximum size (currently set to 1 GB), `WFile_storeRecord` returns an error. When this happens, wget-warc should open a new WARC file and continue writing there.
alard updated
13 years ago
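
The rollover behaviour requested above could look roughly like the sketch below. It is not the warc-tools or wget-warc code: `RollingWriter`, `store_record`, and the file-naming pattern are hypothetical names chosen for illustration, and records are treated as opaque byte strings rather than real WARC records. The idea is to check the size before each write and switch to a new file instead of returning an error.

```python
import os
import tempfile


class RollingWriter:
    """Hypothetical helper: appends records to a file and rolls over
    to a new numbered file once max_bytes would be exceeded."""

    def __init__(self, directory, prefix, max_bytes):
        self.directory = directory
        self.prefix = prefix
        self.max_bytes = max_bytes
        self.index = 0
        self.paths = []  # every file opened so far, in order
        self._open_next()

    def _open_next(self):
        # Assumed naming scheme: <prefix>-00000.warc, <prefix>-00001.warc, ...
        name = f"{self.prefix}-{self.index:05d}.warc"
        path = os.path.join(self.directory, name)
        self.index += 1
        self.paths.append(path)
        self._fh = open(path, "wb")
        self._size = 0

    def store_record(self, record: bytes):
        # Roll over when the record would not fit. An empty file always
        # accepts the record, so an oversized record gets a file of its own.
        if self._size > 0 and self._size + len(record) > self.max_bytes:
            self._fh.close()
            self._open_next()
        self._fh.write(record)
        self._size += len(record)

    def close(self):
        self._fh.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        writer = RollingWriter(tmp, "crawl", max_bytes=10)
        for _ in range(3):
            writer.store_record(b"abcd")
        writer.close()
        print(writer.paths)
```

Checking the size before the write keeps each record whole in a single file, which matters because a WARC record should not be split across files.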
-
I think this means defining a new [Cascading Scheme](http://www.cascading.org/javadoc/cascading/scheme/Scheme.html) that knows about the [WARC file format](http://cloud.github.com/downloads/bixo/bixo/…
-
It looks like a new implementation has been developed:
http://monkeyspider.sourceforge.net/
It essentially uses Heritrix (the Internet Archive engine) to virus-scan archived pages that were crawled us…