-
E.g., for Heritrix, fetch https://github.com/internetarchive/heritrix3/archive/master.zip using a standard GET request or, if you want to be fancy, use the dulwich module ( https://github.com/jelme…
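A minimal sketch of the plain-GET option, assuming only the Python standard library is available. The function name `fetch_zip`, its `opener` parameter, and the destination directory are hypothetical names introduced for illustration; only the archive URL comes from the issue text.

```python
import io
import urllib.request
import zipfile

# URL quoted in the issue; whether the branch is still named "master" is an assumption.
HERITRIX_ZIP_URL = "https://github.com/internetarchive/heritrix3/archive/master.zip"


def fetch_zip(url, dest_dir, opener=urllib.request.urlopen):
    """GET `url`, unpack the zip it returns into `dest_dir`, return the member names.

    `opener` is injectable (hypothetical parameter) so the function can be
    exercised without network access.
    """
    with opener(url) as resp:
        data = resp.read()
    # The whole archive is held in memory, which is fine for a source snapshot.
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


# Example call (performs a real download):
# names = fetch_zip(HERITRIX_ZIP_URL, "heritrix-src")
```

Going through dulwich instead would give a real clone with history, at the cost of an extra dependency.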
-
Just noticed an oddity in our crawls. We have a WARC response record with no response in it (see below). This seems to be due to the crawler getting an `HTTP 204` response.
However, I only think that becau…
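One way to check this kind of suspicion is to look at the raw HTTP bytes stored in the record. The sketch below assumes the record block has already been extracted as bytes by whatever WARC reader is in use; `summarize_http_block` is a hypothetical helper, not part of any crawler or WARC library. A `204 No Content` response carries no message body by definition, so a status of 204 with a body length of 0 would be expected, whereas a completely empty block would point to something else.

```python
def summarize_http_block(block: bytes):
    """Return (status_code, body_length) for a raw HTTP response block.

    Returns (None, 0) when the block is empty, and (None, body_length)
    when the first line is not a recognisable HTTP status line.
    """
    if not block.strip():
        return None, 0
    # Headers and body are separated by the first blank line.
    head, _, body = block.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("iso-8859-1")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/") or not parts[1].isdigit():
        return None, len(body)
    return int(parts[1]), len(body)


# Illustrative input, not taken from the crawl in question.
print(summarize_http_block(b"HTTP/1.1 204 No Content\r\nServer: example\r\n\r\n"))
```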
-
Using `externally_connectable` ( https://developer.chrome.com/extensions/manifest/externally_connectable ), an archive's webUI (or any arbitrary web page, even one on localhost) could be added to the …
-
By Laura Waldoch, CUL:
We’ve noticed a data error under the Theme “Cambridge Network”, in the following record:
Accelrys
Archived date: 2012-11-21 http://www.accelrys.com/
Clickin…
-
```
1. Configure the plugin as per the Wiki, amending the config file with the
correct settings for RabbitMQ.
2. Submit a job using the new plugin.
The job starts but errors shortly thereafter.
Her…
```
-
As suggested by @ibnesayeed in #141
-
Hi,
I've observed in the code that the value "${launchId}" is expected to be replaced with some value, though I'm not sure which. Anyway, I'm trying to understand the configuration file and I found that the…
-
Following [this report](https://groups.yahoo.com/neo/groups/archive-crawler/conversations/topics/8952;_ylc=X3oDMTM0b2wwNG0wBF9TAzk3MzU5NzE0BGdycElkAzg3NTk4NjcEZ3Jwc3BJZAMxNzA1MDA0OTI0BG1zZ0lkAzg5NTQEc…
-
Hello,
I would like to know whether it is possible to make NAS generate WARC files in the standard format.
Our own archive platform works only with WARC/ARC files in standard format…
-
Both on a full app level and a component (e.g., Heritrix, Wayback) level.