-
When crawling using Heritrix, if both `sendIfModifiedSince` and `writeRevisitForNotModified` are set to `true` (although the latter has been deprecated, presumably equivalent to always being `true`), …
-
Maybe this is a function of my Mac security settings, but every time a tweet is archived, I get the attached security pop-up.
It may be that all we need to do to address this is add some documentat…
-
If these cards are intended to be used in other sites, it might be a good idea to utilize HTML CustomElements for bi-directional style isolation. We are using it in [Reconstructive Banner](https://git…
-
* How do I pull an entire website with this?
* How do I see what it is doing internally?
-
Should we allow the launch requests to be stored in a separate topic/log/stream from the log of discovered URLs?
To make things faster, when running a single crawler, we would directly enqueu…
-
webarchive-commons uses GPL v2 code in at least two places.
[OpenJDK7GZIPInputStream](https://github.com/iipc/webarchive-commons/blob/24846d0d8870e8c6f4d35901a83cda593544dc97/src/main/java/org/archiv…
-
Building on the experience with the Redis-based frontier, it should be possible to build a frontier based on [url-frontier](https://github.com/crawler-commons/url-frontier). The rough outline of the a…
-
## Expected behavior
We've archived this page in the past: http://www.jisc.ac.uk/whatwedo/programmes/programme_preservation/2008sigprops.aspx
The 2008 copy works fine, but it's been replaced wit…
-
Hello,
I would like to know if it is possible to get both WARC files compressed (not only the metadata one).
Thanks
-
It is currently available and causes no UI change.