-
The following URLs lead to a "page not found" error:
http://netpreserve.org/warc/1.0/revisit/identical-payload-digest
http://netpreserve.org/warc/1.0/revisit/server-not-modified
I am not sure whether the…
-
Using the [CDXGenerator](https://github.com/internetarchive/ia-hadoop-tools/blob/master/src/main/java/org/archive/hadoop/jobs/CDXGenerator.java) I generated CDX files out of WARCs. Unfortunately, the …
-
The Wayback API had a `matchType` option, for example:
https://web.archive.org/cdx/search/cdx?url=https://twitter.com/jack/statuses&matchType=prefix
Which returns:
```
com,twitter)/jack/statuses/"/antarni…
```
-
Images are missing both from the page itself and from the reconstructive logo. The WARC was created with a local webrecorder (built, run, and recorded using Docker and the webrecorder web interface): [temp-2018082…
-
When the access rights for a Crawl Object are changed in Argo we would like those changes to be respected by pywb so that the content is World, Stanford only or Dark (unavailable). In #10 we address t…
-
Large files take a *very* long time to index. I'm toying with the idea of using WARC files instead of ZIM files for a kiwix-like offline Wikipedia; I can generate a full dump from a wiki xml backup in…
-
The WACZ uploaded should use [ipfs-composite-files](https://github.com/webrecorder/ipfs-composite-files) to add the WACZ file:
- The WARC file in the WACZ should be split along WARC record boundaries
…
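
To illustrate what "split along WARC record boundaries" means, here is a minimal sketch that finds where each record starts and ends. It is not taken from ipfs-composite-files. It assumes an uncompressed, well-formed WARC held in memory, and the function names `iter_record_spans` and `split_records` are hypothetical:

```python
from typing import Iterator, List, Tuple


def iter_record_spans(data: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) for each record in an uncompressed WARC."""
    pos = 0
    while pos < len(data):
        # Every record begins with a version line such as "WARC/1.0".
        if not data.startswith(b"WARC/", pos):
            raise ValueError(f"no WARC record at offset {pos}")
        # The header block ends at the first blank line after the version line.
        header_end = data.find(b"\r\n\r\n", pos)
        if header_end == -1:
            raise ValueError("truncated WARC header")
        content_length = None
        for line in data[pos:header_end].split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                content_length = int(value.strip())
        if content_length is None:
            raise ValueError("record has no Content-Length header")
        # Header block + blank line + content block + two trailing CRLFs.
        end = header_end + 4 + content_length + 4
        yield pos, end - pos
        pos = end


def split_records(data: bytes) -> List[bytes]:
    """Return each record as its own byte string (hypothetical helper)."""
    return [data[off:off + n] for off, n in iter_record_spans(data)]
```

Because the content block is skipped using `Content-Length`, blank lines inside a payload are not mistaken for record boundaries. Concatenating the pieces returned by `split_records` reproduces the original bytes.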
-
At around 500 WARCs in a single job, uploaded to a single bucket, we often see 403 errors when uploading subsequent WARCs. Having a selection of alternate bucket names would probably solve this. It would be goo…
-
It is expected that the JSON output from Twitter's API will at some point be harvested. To present this to users in a usable manner, we need a special renderer for it. In time we migh…
-
It seems that there is a desire to record provenance of WARC files, e.g. in the case of concatenation. See http://ws-dl.blogspot.co.uk/2014/09/2014-09-02-warcmerge-merging-multiple.html
That proposal…