-
A youzim.it run of https://archives.nyphil.org/ failed, reporting lots of unrecognized characters.
Task is [here](https://farm.youzim.it/pipeline/3cd41b6b-2d81-4acb-8948-a6820c5fa07f).
Command used:
``…
-
I'm getting 'permanently moved to here' results when loading up a WARC I made. The word 'here' is a link, and when I click it, it just reloads the same 'permanently moved' page. This would make sense …
-
When I export the WARC file from one tab and open it for editing, I see that headers from my other open tabs have also been inserted. Perhaps the filtering inside `code.js` for `chrome.webRequest.onBefore…
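`chrome.webRequest` listeners accept a filter whose `tabId` field restricts events to a single tab, and each event's details object carries the `tabId` of the originating tab (`-1` when the request isn't tied to a tab). As a minimal, language-neutral sketch in Python, assuming the recorder keeps captured request details as dicts with a `tabId` key, a post-filter applied before writing the WARC could look like this; `headers_for_tab` is a hypothetical name, not something in `code.js`.

```python
from typing import Any, Iterable


def headers_for_tab(events: Iterable[dict[str, Any]], tab_id: int) -> list[dict[str, Any]]:
    """Keep only captured request events that came from the given tab.

    Assumption: each event mirrors a chrome.webRequest details object and
    has a numeric "tabId". Events from other tabs, and requests not tied to
    any tab (tabId == -1), are dropped.
    """
    return [event for event in events if event.get("tabId") == tab_id]
```

Filtering at the listener level (passing `tabId` in the filter) avoids capturing the foreign headers in the first place; a post-filter like this is only a fallback if the listener must stay global.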
-
When reading a WARC file that contains a 'Set-Cookie' header with multiple cookies present on subsequent lines, the parsing logic splits each line at the first colon, which appears to be fine fo…
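Because the description above is cut off, the exact failure isn't known; the following is only a sketch of a header parser that tolerates both likely shapes of "multiple cookies on subsequent lines", assuming they are either repeated `Set-Cookie` lines or folded continuation lines (lines starting with a space or tab). It splits on the first colon only and keeps headers as an ordered list of pairs, so repeated names aren't collapsed the way a plain dict would collapse them. `parse_headers` and `get_all` are hypothetical helpers.

```python
def parse_headers(lines: list[str]) -> list[tuple[str, str]]:
    """Parse HTTP header lines into ordered (name, value) pairs, keeping repeats."""
    headers: list[tuple[str, str]] = []
    for line in lines:
        if not line.strip():
            break  # a blank line ends the header block
        if line[0] in " \t" and headers:
            # Continuation line (obsolete line folding): append to the previous value.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
            continue
        # Split on the first colon only, so colons inside the value
        # (e.g. times in a cookie's Expires attribute) stay intact.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip malformed lines that have no colon
        headers.append((name.strip(), value.strip()))
    return headers


def get_all(headers: list[tuple[str, str]], name: str) -> list[str]:
    """Return every value for a header name, compared case-insensitively."""
    return [value for key, value in headers if key.lower() == name.lower()]
```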
-
Exporting a WARC that takes up hundreds of gigabytes is infeasible: tool support is dubious, and the risk of an aborted transfer due to timeouts is real.
As the export size of the individual parts o…
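One way to keep each export part bounded is to group whole records into parts up to a size limit. The sketch below assumes record sizes are known up front and that records must never be split across parts; `plan_parts` and `max_part_bytes` are hypothetical names, and the grouping is a simple greedy rollover, not necessarily how any particular tool does it.

```python
def plan_parts(record_sizes: list[int], max_part_bytes: int) -> list[list[int]]:
    """Group record indices into export parts of at most max_part_bytes each.

    Records are kept whole; a single record larger than the limit
    ends up alone in its own (oversized) part.
    """
    parts: list[list[int]] = []
    current: list[int] = []
    current_size = 0
    for index, size in enumerate(record_sizes):
        # Roll over to a new part if adding this record would exceed the limit.
        if current and current_size + size > max_part_bytes:
            parts.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        parts.append(current)
    return parts
```

For example, `plan_parts([400, 300, 500, 1200, 100], 1000)` gives `[[0, 1], [2], [3], [4]]`: the 1200-byte record exceeds the limit on its own but is still exported whole.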
-
**Memento Profile Summarization**
https://vimeo.com/showcase/8519248/video/560049719
**GitHub**
https://github.com/oduwsdl/MemGator
https://github.com/oduwsdl/MementoMap
[IIPC WAC 2022: SESS…
-
@giancarlobi this is a placeholder for the formatter need.
Right now it's acting on WARC files; now that we have automatic WARC-to-WACZ transformations, we need to adapt the code to react to the fact th…
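A WACZ is a ZIP package, and per the WACZ specification its WARC files live under the `archive/` directory. A minimal sketch of how code that currently expects a WARC path could locate the WARCs inside a WACZ, using only the standard library `zipfile` module; `warcs_in_wacz` is a hypothetical helper, and the accepted extensions are an assumption.

```python
import zipfile


def warcs_in_wacz(path) -> list[str]:
    """Return the names of WARC files stored in a WACZ package.

    `path` may be a filesystem path or a binary file-like object.
    Assumption: WARCs sit under "archive/" and end in .warc or .warc.gz.
    """
    with zipfile.ZipFile(path) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if name.startswith("archive/") and name.endswith((".warc", ".warc.gz"))
        )
```

Each returned entry can then be opened with `zf.open(name)` and handed to the existing WARC-processing code, so the formatter logic itself need not change.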
-
*What I wanted:* A WARC copy of all pages on http://www.chilton-computing.org.uk/.
*What I expected:* A generated WARC of the root of the site and all of its descendant pages.
*What happened:* The …
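The expectation above amounts to a prefix scope rule: a URL is in scope if it is on the same host and its path falls under the seed's path. A minimal sketch of such a check, assuming http and https are treated as equivalent and that a seed ending in a file name scopes to its directory; `in_scope` is a hypothetical helper, not the option of any specific crawler.

```python
from urllib.parse import urlsplit


def in_scope(url: str, seed: str) -> bool:
    """True if url is the seed page or one of its descendants.

    Descendant here means: same host, and a path under the seed's directory.
    """
    u, s = urlsplit(url), urlsplit(seed)
    if u.scheme not in ("http", "https") or u.hostname != s.hostname:
        return False
    # Use the seed's directory as the prefix ("/a/b.htm" -> "/a/").
    prefix = s.path if s.path.endswith("/") else s.path.rsplit("/", 1)[0] + "/"
    return (u.path or "/").startswith(prefix)
```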
-
I needed a small WARC file for testing, so I took a regular wget download, picked a few interlinked files, and used warcit to create the WARC file. When I looked at it in ReplayWeb.page ther…
-
We have three options that can stop the crawler in the middle of a run:
- `--sizeLimit`: the maximum WARC size
- `--timeLimit`: the maximum duration of the crawl
- `--diskUtilization`: the maximum …
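The three stop conditions can be thought of as one check run periodically during the crawl, reporting which limit (if any) was hit. A minimal sketch, assuming `--sizeLimit` is in bytes, `--timeLimit` in seconds, and `--diskUtilization` a percentage of disk in use (the last is a guess, since its description is cut off); `Limits` and `stop_reason` are hypothetical names, not the crawler's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    size_limit_bytes: Optional[int] = None        # --sizeLimit (assumed bytes)
    time_limit_seconds: Optional[float] = None    # --timeLimit (assumed seconds)
    disk_utilization_pct: Optional[float] = None  # --diskUtilization (assumed percent)


def stop_reason(limits: Limits, warc_bytes: int, elapsed_seconds: float,
                disk_used_pct: float) -> Optional[str]:
    """Return the name of the first limit reached, or None to keep crawling.

    A limit left as None is treated as disabled.
    """
    if limits.size_limit_bytes is not None and warc_bytes >= limits.size_limit_bytes:
        return "sizeLimit"
    if limits.time_limit_seconds is not None and elapsed_seconds >= limits.time_limit_seconds:
        return "timeLimit"
    if limits.disk_utilization_pct is not None and disk_used_pct >= limits.disk_utilization_pct:
        return "diskUtilization"
    return None
```

Returning the name of the triggering limit, rather than a bare boolean, lets the crawler log why it stopped, which matters when several limits are configured at once.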