-
Opening here for you to triage; run [67615](https://farm.zimit.kiwix.org/pipeline/67615c43-0078-483f-a016-d14ed92cfc8f/debug) failed when warc2zim tried to load one of the WARC
```
Processing WAR…
```
-
I'm seeking feedback on a decision regarding setting a _sensible default_ for writing WARC records to the distributed web. It has implications for de-duplication between archives, and might also have …
-
It is a great pleasure working with warc; however, I'm experiencing an error when mapping a larger number of files. It seems that the connections to the files are not closed. Please find below the reproduc…
-
via AppleEvents. https://github.com/pyinstaller/pyinstaller/pull/50 seems to imply that it's supported, but it does not give any pointers on how to specify this at compile time.
Also, http://superuser.com/ques…
-
The request records in the CC-NEWS WARC files lack the HTTP protocol version:
```
GET /path
```
instead of
```
GET /path HTTP/1.1
```
This makes some WARC parsers fail to process the WARC fil…
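
To illustrate the difference, here is a minimal sketch of a request-line parser that tolerates the missing version. The function name `parse_request_line` and the `HTTP/1.0` fallback are assumptions made for this example; they are not taken from any particular WARC parser.

```python
def parse_request_line(line, default_version="HTTP/1.0"):
    """Split an HTTP request line, tolerating a missing protocol version.

    `default_version` is an assumed fallback for this sketch, not something
    the WARC files themselves specify.
    """
    parts = line.strip().split()
    if len(parts) == 3:
        # Well-formed line, e.g. "GET /path HTTP/1.1"
        method, target, version = parts
    elif len(parts) == 2:
        # Version missing, e.g. "GET /path": use the assumed default
        method, target = parts
        version = default_version
    else:
        raise ValueError(f"malformed request line: {line!r}")
    return method, target, version


print(parse_request_line("GET /path"))
print(parse_request_line("GET /path HTTP/1.1"))
```

A strict parser that insists on exactly three fields would reject the first form, which matches the kind of failure described here.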
-
#### Describe the bug
archivebox update shows
```
> wget
Extractor failed: …
```
-
One important alternate application of this library would be to export data from the WARC files, to output HTML and other metadata.
For example, the Internet Archive has the only snapshots of 4chana…
-
Excuse me, if I want to transform a WARC file into plain text, are there any tools in the source or any APIs that would make the transformation easier?
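
For context, the kind of conversion being asked about might look roughly like the sketch below. It uses only the Python standard library and is not this project's API; the names `iter_warc_records`, `html_to_text`, and `warc_to_text` are hypothetical. It assumes an uncompressed WARC stream whose `response` records hold plain, unchunked HTML bodies.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def iter_warc_records(stream):
    """Yield (headers, block) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separating records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # end of the WARC header section
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        block = stream.read(int(headers["content-length"]))
        yield headers, block


def warc_to_text(stream):
    """Yield (target URI, extracted text) for each response record."""
    for headers, block in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue
        # Drop the HTTP status line and headers; keep only the body.
        _, _, body = block.partition(b"\r\n\r\n")
        text = html_to_text(body.decode("utf-8", "replace"))
        yield headers.get("warc-target-uri"), text
```

Gzip-compressed files, chunked transfer encoding, and non-HTML payloads are deliberately left out of this sketch.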
-
Hi,
Is there a way to create a separate WARC file for each request?
E.g., I have 2 browsers, both using warcprox as a proxy.
Browser 1 sends a request to `google.com`. Browser 2 sends a request to…
-
I was looking at WACZ files generated from ReplayWeb.page and noticed that the WARC file under the `archive/` folder is always named `data.warc.gz` (or `data.warc`). It looks like the file name is har…