-
This question is about [IPv6 address representation](https://en.wikipedia.org/wiki/IPv6_address#Representation) in WARC captures.
- refers to RFC4291, and
- says that the form (`x:x:x:x:x:x:x:x`) …
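RFC 4291 permits several textual spellings of one IPv6 address: full `x:x:x:x:x:x:x:x` groups, omitted leading zeros, and a single `::` for a run of zero groups. Two captures can therefore record the same address differently. The minimal sketch below uses Python's standard `ipaddress` module to normalize an address before comparing or storing it. It assumes the address comes from a field such as `WARC-IP-Address`; the example address itself is illustrative.

```python
import ipaddress

# Illustrative address, e.g. as it might appear in a WARC-IP-Address header.
addr = ipaddress.IPv6Address("2001:db8::1")

# Exploded form: all eight groups, each zero-padded to four hex digits.
print(addr.exploded)    # 2001:0db8:0000:0000:0000:0000:0000:0001

# Compressed form: leading zeros dropped, longest zero run replaced by "::".
print(addr.compressed)  # 2001:db8::1

# Different spellings of the same address compare equal once parsed.
print(ipaddress.IPv6Address("2001:DB8:0:0:0:0:0:1") == addr)  # True
```

`compressed` produces the shortened, lowercase form in the style recommended by RFC 5952, so it is a reasonable choice as a single stored representation.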
-
In this function we currently re-open the WACZ each time we request a WARC record. When using a cloud provider we keep fetching the file. Also when not using a cloud provider we should not need to re-…
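The snippet does not show the function itself, so the class below is a hypothetical helper and not the project's code. It opens the WACZ (a ZIP package) once and reuses the same `zipfile.ZipFile` handle for every record lookup, so the ZIP index is parsed only once. The names `WACZReader` and `read_record`, and the `(member, offset, length)` lookup, are assumptions for illustration. Because `zipfile.ZipFile` also accepts a seekable file-like object, the same pattern could wrap a reader for cloud storage, so that the archive's index is not re-fetched for each record.

```python
import zipfile


class WACZReader:
    """Hypothetical helper: keep one open ZipFile per WACZ instead of reopening per record."""

    def __init__(self, path: str):
        self.path = path
        self._zip = None  # opened lazily on first use

    def _archive(self) -> zipfile.ZipFile:
        if self._zip is None:
            # The ZIP central directory is parsed here once, not once per record.
            self._zip = zipfile.ZipFile(self.path)
        return self._zip

    def read_record(self, member: str, offset: int, length: int) -> bytes:
        """Return `length` bytes at `offset` within `member` (e.g. a WARC under archive/)."""
        with self._archive().open(member) as fh:
            # seek() works on ZipExtFile (Python 3.7+); deflated members are decompressed up to offset.
            fh.seek(offset)
            return fh.read(length)

    def close(self) -> None:
        if self._zip is not None:
            self._zip.close()
            self._zip = None

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```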
-
Motivation:
* To allow the recording of messages using a different representation to their wire message format as
- the wire protocol may be suboptimal for the purposes of storage and replay; or
…
-
I noticed the parser queue for the 2020 historical reingest slowing down, along with parsers exiting (shown as dots on the "app max run time" Grafana graph). `docker ps -a` showed exited parser containers, and…
-
Received the following error more than once:
```
panic: open jobs/warcs/TCPK-20240826191443122-00001-crawl918.us.archive.org.warc.gz.open: too many open files
goroutine 119 [running]:
github.com…
```
-
Hi,
When I ran the following command to download the dataset from the Hugging Face Hub, I encountered an error:
My command:
```
from datasets import load_dataset
ds = load_dataset("mlfoundation…
```
-
Opening here for you to triage; run [67615](https://farm.zimit.kiwix.org/pipeline/67615c43-0078-483f-a016-d14ed92cfc8f/debug) failed when warc2zim tried to load one of the WARC
```
Processing WAR…
```
-
Hi `@ll`.
I have been using **`monolith`** more and more for webpage capture but couldn't find a way to make downloads in **WARC format** (as documented at ).
I believe such an option would gre…
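To illustrate what a WARC download could contain, here is a minimal sketch, not part of `monolith`, that wraps an already-captured page in a single WARC/1.1 `resource` record. The function name and the choice of a `resource` record type are assumptions. A more faithful implementation would write one `response` record per HTTP exchange, which requires access to the raw responses.

```python
import uuid
from datetime import datetime, timezone


def warc_resource_record(target_uri: str, html: bytes, content_type: str = "text/html") -> bytes:
    """Hypothetical helper: serialize one WARC/1.1 'resource' record for a captured page."""
    date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.1",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {date}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(html)}",  # length of the block in bytes
    ]
    # Header lines end in CRLF, a blank line separates headers from the block,
    # and every record is terminated by two CRLFs.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8") + html + b"\r\n\r\n"
```

Writing the returned records to a file in sequence produces an uncompressed `.warc`. For `.warc.gz`, the usual convention is to gzip each record separately (for example with `gzip.compress`) and concatenate the members.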
-
I was surprised that the example provided in the documentation:
``` python
>>> import warcat.model
>>> warc = warcat.model.WARC()
>>> warc.load('example/at.warc.gz')
>>> len(warc.records)
```
Reads everythi…
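The snippet is cut off. If the concern is memory use when a whole file is loaded up front, one alternative is to stream records one at a time. The sketch below is a minimal, stdlib-only parser and not warcat's API. It assumes CRLF-terminated headers, ignores header line folding, and keeps only the last value of a repeated header name. Established libraries such as warcio offer a streaming `ArchiveIterator` for production use.

```python
import gzip
from typing import Dict, Iterator, Tuple


def iter_warc_records(path: str) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Minimal sketch (not warcat's API): yield (headers, block) one record at a time."""
    # gzip.open transparently reads the concatenated per-record gzip members of a .warc.gz.
    with gzip.open(path, "rb") as fh:
        while True:
            version = fh.readline()
            if not version:
                return  # end of file
            if not version.strip():
                continue  # tolerate stray blank lines between records
            if not version.startswith(b"WARC/"):
                raise ValueError(f"unexpected record start: {version!r}")
            headers: Dict[str, str] = {}
            for line in iter(fh.readline, b"\r\n"):  # stop at the blank line after headers
                if not line:
                    raise ValueError("truncated WARC header")
                name, _, value = line.decode("utf-8").partition(":")
                headers[name.strip()] = value.strip()
            block = fh.read(int(headers["Content-Length"]))
            fh.read(4)  # skip the CRLF CRLF that terminates every record
            yield headers, block
```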