-
I have WARC files collected with node-warc 3.1.0 that cannot be opened in Webrecorder Player ("No pages found"). The only distinguishing characteristic is that the files are archived from Facebook posts wi…
-
There was a suggestion that the extension include a function to save the currently viewed website locally on the user's computer as a Web ARChive (WARC) file. This could be a feature for a …
-
warcio uses a default `Content-Type` value for WARC records of `application/warc-record`. This MIME type is not documented or specified anywhere; the WARC spec only mentions `application/warc` as the …
-
**URL**: https://replayweb.page
**Browser / Version**: Firefox 127.0
**Operating System**: Windows 10
**Tested Another Browser**: Yes Chrome
**Problem type**: Something else
**Description**: htt…
-
Hi!
When I found out about this project, its name made me think it was a tool to read [WARC files](https://en.wikipedia.org/wiki/Web_ARChive), which stands for... Web ARChives!
Is there support …
-
I'm not sure if this is a feature request or just a request for clarification, but I'm looking for a canonical way to generate a WACZ file from multiple WARC files.
I am dealing with some web collection…
-
Is there a config option for splitting downloaded files out into their own WARC files instead of putting them all into the same one?
This would allow for easier data extraction based on individual items.
-
- [ ] WARC to CAR library
- [ ] WACZ to CAR library (with embedded WARC chunking)
- [ ] Add code to export WACZ from crawler to CAR
- [ ] Upload CAR to IPFS with auto-js-ipfs
- [ ] Look at splitt…
-
Motivation:
* To allow the recording of messages using a different representation to their wire message format as
- the write protocol may be suboptimal for the purposes of storage and replay; or
…
-
I want to split a WARC file into small chunks and then process them with `multiprocessing` in Python.
For a text file we can use `seek`, but how do I seek in the **warc module or in .gz WARC files**?
Any advice?
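
One possible approach, as a rough sketch using only the standard library: if the `.warc.gz` file was written with each record compressed as its own gzip member (the layout the WARC spec recommends), then every member start is a valid seek target. A single sequential pass can collect those offsets, and each worker can then seek to an offset and decompress just that one record. The function names `member_offsets` and `read_member` are made up for this illustration and are not part of any WARC library. If the file was compressed as one single gzip stream, this only finds one member and the file cannot be split this way.

```python
import zlib

GZIP_WBITS = 31  # tells zlib to expect gzip framing (header + trailer)


def member_offsets(path, chunk_size=1 << 16):
    """Return the byte offset at which each gzip member starts (hypothetical helper)."""
    offsets = []
    with open(path, "rb") as f:
        pos = 0                      # absolute file offset of the start of `buf`
        buf = f.read(chunk_size)
        while buf:
            offsets.append(pos)
            d = zlib.decompressobj(GZIP_WBITS)
            while True:
                d.decompress(buf)    # output is discarded, we only want boundaries
                if d.eof:
                    break
                pos += len(buf)
                buf = f.read(chunk_size)
                if not buf:
                    raise EOFError("truncated gzip member")
            # Bytes after the end of this member belong to the next one.
            pos += len(buf) - len(d.unused_data)
            buf = d.unused_data or f.read(chunk_size)
    return offsets


def read_member(path, offset, chunk_size=1 << 16):
    """Seek to `offset` and return the decompressed bytes of that single member."""
    parts = []
    with open(path, "rb") as f:
        f.seek(offset)
        d = zlib.decompressobj(GZIP_WBITS)
        while not d.eof:
            chunk = f.read(chunk_size)
            if not chunk:
                raise EOFError("truncated gzip member")
            parts.append(d.decompress(chunk))
    return b"".join(parts)
```

With the offsets in hand, something like `multiprocessing.Pool().starmap(read_member, [(path, off) for off in offsets])` would hand one record to each task; in practice it is probably better to give each worker a batch of offsets and have it return only the extracted result rather than the whole record.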