-
A youzim.it run of https://archives.nyphil.org/ failed, reporting many unrecognized characters.
Task is [here](https://farm.youzim.it/pipeline/3cd41b6b-2d81-4acb-8948-a6820c5fa07f).
Command used:
``…
-
### Browsertrix Version
v1.11.3-12f994b
### What did you expect to happen? What happened instead?
When you download wacz files using the API you get wacz filenames like "20230225142507561-manual-20…
-
The package should provide facilities to write warc.gz and CDX file pairs, and to append to already existing WARC/CDX pairs (see wpull --warc-append). Should also support uncompressed WARC files with …
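A minimal sketch of why appending to an existing .warc.gz is feasible: each record is written as its own gzip member, so new members can be appended to the file, and the byte offset and compressed length of each member are what a CDX line would point at. The helper names `append_record` and `read_record` are hypothetical, and the record bytes are placeholders rather than valid WARC records.

```python
import gzip
import os


def append_record(warc_path, record_bytes):
    """Append one record as its own gzip member.

    Returns (offset, length) of the compressed member, the pair an
    index entry would need to locate the record later.
    """
    # The new member starts where the existing file ends.
    offset = os.path.getsize(warc_path) if os.path.exists(warc_path) else 0
    member = gzip.compress(record_bytes)
    with open(warc_path, "ab") as fh:
        fh.write(member)
    return offset, len(member)


def read_record(warc_path, offset, length):
    """Read back a single record using its indexed offset and length."""
    with open(warc_path, "rb") as fh:
        fh.seek(offset)
        return gzip.decompress(fh.read(length))
```

Because concatenated gzip members form a valid gzip stream, the whole file still decompresses sequentially, while the index allows random access to one record at a time.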
-
We have two broken metadata.zst files; zstdcat returns "Read error (39) : premature end ".
Logs for processing these are on NIRD:
two/log_html/archivebot_partial_logs/114.stderr
two/log_html/archiv…
-
Hi,
When I ran the following command to download the dataset from the Hugging Face Hub, I encountered an error:
My command:
```
from datasets import load_dataset
ds = load_dataset("mlfoundation…
-
Hi!
First of all, thank you for writing this, it's very useful!
It looks like it has an issue parsing the wget-created .warc.gz files I give it, though:
Traceback (most recent call last):
File ".…
-
Environment:
Apple M1 Pro, macOS 14.3.1, Chrome
I initially uploaded 41 WARC files into WARCgpt. Among these files was an email containing titles and links to several papers related to AI. When I …
-
Hello,
I would like to know whether it is possible to make NAS generate WARC files in the standard format.
Actually we have our own archive platform and it works only with warc/arc files in standard format…
nasry updated
6 years ago
-
When reading a warc file that contains 'Set-Cookie' header and there are multiple cookies present on subsequent lines, the parsing logic breaks the line on the first colon, which appears to be fine fo…
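A minimal sketch of header parsing that tolerates this case, under the assumption that the extra cookie lines are continuation lines beginning with a space or tab (the truncated report does not show the exact input). `parse_headers` is a hypothetical helper, not the library's API. Only the first colon of a non-continuation line separates name from value; continuation lines are appended to the previous value even when they contain colons.

```python
def parse_headers(lines):
    """Parse header lines into a list of (name, value) pairs.

    Assumption: a line starting with whitespace continues the
    previous header instead of starting a new one.
    """
    headers = []
    for line in lines:
        if line[:1] in (" ", "\t") and headers:
            # Continuation line: extend the previous header's value.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
            continue
        # Split on the first colon only; later colons stay in the value.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # Not a header line; skipped in this sketch.
        headers.append((name.strip(), value.strip()))
    return headers
```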
-
Instead of this:
```python3
class ArchiveResult:
    path = field.CharField(...)

ArchiveResult(path='./archive/warc/somefile.warc.gz')
```
We should be doing this:
```python3
class ArchiveRe…