-
The Twitter API has hydration, which turns message IDs into full tweets. Likewise, SolrWayback should be able to take a list of PWIDs (by upload) and export a full WARC with the resources.
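As a rough sketch of the input side of such a hydration step, the snippet below parses an uploaded list of PWIDs (Persistent Web IDentifiers) into their parts. It assumes the `urn:pwid:<archive>:<timestamp>:<coverage>:<item>` layout with a full-second UTC timestamp; the names `Pwid`, `parse_pwid` and `read_pwids` are made up for illustration and are not part of SolrWayback.

```python
import re
from typing import NamedTuple


class Pwid(NamedTuple):
    archive_id: str
    archival_time: str
    coverage: str
    item: str


# Assumption: timestamps are given to the second in UTC, e.g. 2016-01-22T11:20:29Z.
PWID_RE = re.compile(
    r"^urn:pwid:(?P<archive>[^:]+):"
    r"(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z):"
    r"(?P<coverage>[a-z]+):(?P<item>.+)$",
    re.IGNORECASE,
)


def parse_pwid(text):
    """Split one PWID URN into its four parts, or raise ValueError."""
    match = PWID_RE.match(text.strip())
    if match is None:
        raise ValueError("not a PWID in the assumed form: %r" % text)
    return Pwid(match["archive"], match["time"], match["coverage"], match["item"])


def read_pwids(lines):
    """Parse an uploaded list (one PWID per line), skipping blank lines."""
    return [parse_pwid(line) for line in lines if line.strip()]


pwids = read_pwids([
    "urn:pwid:archive.org:2016-01-22T11:20:29Z:page:http://www.dr.dk",
    "",
])
print(pwids[0].item)
```

Each parsed timestamp and item URL could then be used to look up the matching capture; that lookup and the WARC export are left out here.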
-
### ArchiveWeb.page Version
v0.11.3
### What did you expect to happen? What happened instead?
When downloading WARC 1.1 files and ingesting them in SolrWayback via UKWA warcindexer, I expected to have rep…
-
Sorry for asking a question not related to grab-site, but I don't really know where I should ask it.
I archived a big forum that recently went down. Unfortunately, without the search function it's a…
-
[example file](https://kiska.b-cdn.net/omESX/example.warc.gz)
```python
import warc

# Print the target URI and declared length of every record.
f = warc.open("example.warc.gz")
for record in f:
    print("%s %s" % (record['WARC-Target-URI'], record['Content-Length']))
```
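For comparison, here is a minimal standard-library sketch that lists the same two headers without a WARC library. It assumes well-formed records, each stored as its own gzip member, and it runs on a tiny in-memory archive built by the hypothetical helper `demo_bytes` rather than on the linked example file.

```python
import gzip
import io


def iter_warc_headers(stream):
    """Yield a dict of WARC headers for each record in a binary stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip()] = value.strip()
        # Skip the record body so the next readline() lands on the separator.
        stream.read(int(headers.get("Content-Length", 0)))
        yield headers


def demo_bytes():
    """Build a two-record gzip-compressed WARC in memory (illustrative data)."""
    out = io.BytesIO()
    for uri, body in [("http://example.com/", b"hello"), ("http://example.com/a", b"hi")]:
        record = (
            b"WARC/1.0\r\n"
            b"WARC-Type: resource\r\n"
            b"WARC-Target-URI: " + uri.encode() + b"\r\n"
            b"Content-Length: " + str(len(body)).encode() + b"\r\n"
            b"\r\n" + body + b"\r\n\r\n"
        )
        out.write(gzip.compress(record))  # one gzip member per record
    return out.getvalue()


with gzip.open(io.BytesIO(demo_bytes())) as f:
    rows = [(h["WARC-Target-URI"], h["Content-Length"]) for h in iter_warc_headers(f)]
print(rows)
```

A reader like this trusts `Content-Length` completely, so a record with a wrong length will cause an error on the next version-line check, which can help narrow down where a file is malformed.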
### E…
-
If for some resources the crawler encounters a ZIM file on a web property, we should immediately block it so that it is not included inside the WARC and then inside the ZIM.
This is probably a page…
-
Likely critical but might not be available via Chrome's webRequest API.
**Heritrix 3.2.0**
``` sh
WARC/1.0
WARC-Type: request
WARC-Target-URI: http://matkelly.com/
WARC-Date: 2015-12-11T13:25:07Z
WA…
```
-
Thanks for this elegant example of how to do RAG with WARC data! I also very much appreciated how the [blog post](https://lil.law.harvard.edu/blog/2024/02/12/warc-gpt-an-open-source-tool-for-exploring…
-
Maybe there is a Pywb option for this. Since we are storing the WARCs compressed either way, there is not much reason to have another layer of compression.
-
I have many saved pages, all of them in MHT format, and I want to convert them to WARC. But when I click "generate warc" it does nothing. Please make this possible!
-
Related to #64:
When writing a record into an archive, one can read out its content via the streaming API and then write the record (now with partial or empty content) into the archive, resulting in the loss…
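The pitfall can be reproduced with any one-shot stream. The sketch below is not the library's API: `Record`, `write_record` and `write_record_guarded` are hypothetical stand-ins showing how a partly consumed stream leads to a truncated write, and one possible guard against it.

```python
import io


class Record:
    """Hypothetical record: a target URI plus a one-shot content stream."""

    def __init__(self, uri, content):
        self.uri = uri
        self.stream = io.BytesIO(content)


def write_record(out, record):
    """Write whatever is left in the record's stream; return the byte count."""
    payload = record.stream.read()
    out.write(payload)
    return len(payload)


def write_record_guarded(out, record):
    """Refuse to write a record whose stream was already (partly) consumed."""
    if record.stream.tell() != 0:
        raise ValueError("content of %s was already consumed" % record.uri)
    return write_record(out, record)


rec = Record("http://example.com/", b"payload bytes")
rec.stream.read(7)  # the caller peeks at the content first
written = write_record(io.BytesIO(), rec)  # only the remainder is written
print(written)
```

Raising an error is only one option; buffering the content or rewinding a seekable stream before writing would avoid the data loss as well.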