-
Is it in scope to make it possible to:
- archive an entire blog (all posts) by passing only the root URL?
- have this archive process only be additive? (even if posts are later deleted, I can b…
-
Name: Billion-Dollar Weather and Climate Disasters
Organization: NOAA
Description URL: https://www.ncdc.noaa.gov/billions/
Download URL:
File Types:
Size:
Status:
-
warcio raises
`warcio.exceptions.ArchiveLoadFailed: Invalid WARC record, first line: WARC-Type: response`
at the second WARC record in a WARC file written with [ArchiveSpark](https://github.com/helg…
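
One way to narrow down a report like this is to list the first line of every record and flag any that do not begin with a `WARC/` version line, which is the condition the exception message describes. The following is a minimal sketch using only the standard library, not warcio's API. It assumes an uncompressed WARC stream and that each record's `Content-Length` value is accurate; the function names `first_lines` and `find_bad_records` are hypothetical.

```python
import io


def first_lines(stream):
    """Yield (offset, first_line) for each record in an uncompressed WARC stream.

    Assumption: every record carries a correct Content-Length header,
    which is used to skip over the record body.
    """
    while True:
        offset = stream.tell()
        line = stream.readline()
        # Skip the blank lines that separate consecutive records.
        while line in (b"\r\n", b"\n"):
            offset = stream.tell()
            line = stream.readline()
        if not line:
            return  # end of stream
        first = line.rstrip(b"\r\n").decode("utf-8", "replace")
        length = 0
        # Read the remaining header lines up to the blank line ending the header block.
        while True:
            header = stream.readline()
            if header in (b"\r\n", b"\n", b""):
                break
            name, _, value = header.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        # Jump over the record body.
        stream.seek(length, io.SEEK_CUR)
        yield offset, first


def find_bad_records(stream):
    """Return (offset, first_line) for records not starting with a WARC version line."""
    return [(off, line) for off, line in first_lines(stream)
            if not line.startswith("WARC/")]
```

Calling `find_bad_records(open(path, "rb"))` on the file gives the byte offset of each suspicious record, which can then be inspected in a hex viewer.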
-
### Provide a screenshot and describe the bug
I'm having problems on my server caused by a lot of `chromium-browser` processes. They basically make my server run out of memory and swap a lot, see `ht…
-
Related to #60.
When replaying a URI-M whose header and payload are accessed through another node via IPFS, the header and payload will eventually get garbage collected from the local system per ht…
-
I wanted to attempt to see why #432 was happening.
I am still new to it, but I wonder if a small introduction to how this is structured and where to look would help.
Also I tried at least debugging …
-
I've been working on a search engine for my personal archive, and I think it could be a good fit for IIAB.
Unfortunately I'm not an IIAB user. I'd like some people to install the project on their I…
-
Hi,
I'm trying to test the basic scenario of capturing a URL and saving it as a WARC file.
The code below kind of works, the file is created (see [gist](https://gist.github.com/ktorn/c59391d1867e…
-
Copied from: https://github.com/huggingface/datasets/issues/3704
As mentioned in the comments, potentially related to: #15
The only way that I got a simple `wc -w` on the raw texts from git-lfs …
-
### Browsertrix Version
v1.11.2-ed9038f
### What did you expect to happen? What happened instead?
Some files are not being downloaded correctly with the API.
I've even tried downloading them by inserting …