warc Search Results - Githubissues

1000+ results for warc

netarchivesuite/solrwayback #172

Export full WARC from PWID

The Twitter API has hydration that turns message IDs into full tweets. Likewise SolrWayback should be able to take a PWID (by upload) and export a full WARC with the resources.

tokee updated 1 year ago
1
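
An export like this would first need to parse each uploaded PWID into the archive, timestamp and URI to look up. Below is a minimal stdlib-only sketch. It assumes the common `urn:pwid:<archive-id>:<archival-time>:<coverage>:<archived-item>` layout. `parse_pwid` and `parse_pwid_upload` are hypothetical helpers, not SolrWayback code. A regular expression is used instead of `split(":")` because both the timestamp and the archived URI contain colons.

```python
import re
from dataclasses import dataclass

# Assumed layout: urn:pwid:<archive-id>:<archival-time>:<coverage>:<archived-item>
# The archival time and the URI both contain colons, so a plain split(":") would break them.
PWID_PATTERN = re.compile(
    r"^urn:pwid:"
    r"(?P<archive>[^:]+):"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z):"
    r"(?P<coverage>[a-z]+):"
    r"(?P<uri>.+)$"
)


@dataclass
class Pwid:
    archive: str
    timestamp: str
    coverage: str
    uri: str


def parse_pwid(text: str) -> Pwid:
    """Split one PWID into the fields an exporter would look up in its index."""
    match = PWID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a PWID: {text!r}")
    return Pwid(**match.groupdict())


def parse_pwid_upload(lines):
    """Parse an uploaded list of PWIDs, one per line, skipping blank lines."""
    return [parse_pwid(line) for line in lines if line.strip()]
```

For example, `parse_pwid("urn:pwid:archive.org:2016-01-22T10:08:49Z:page:https://www.dr.dk").uri` is `"https://www.dr.dk"`. The exporter would then resolve each `(uri, timestamp)` pair to WARC records.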
webrecorder/archiveweb.page #211

[Bug]: Downloaded WARC files don't have a .gz extension but…

### ArchiveWeb.page Version v0.11.3 ### What did you expect to happen? What happened instead? When downloading WARC 1.1 and ingesting them in SolrWayback via UKWA warcindexer I expected to have rep…

Klindten updated 6 months ago
4
ArchiveTeam/grab-site #166

Make WARC files searchable

Sorry for asking a question not related to grab-site, but I don't really know where I should ask it. I archived a big forum that recently went down. Unfortunately without the search function it's a…

Svekla updated 10 months ago
1
internetarchive/warc #34

Unsupported WARC version: 1.1

[example file](https://kiska.b-cdn.net/omESX/example.warc.gz)

```python
import warc

f = warc.open("example.warc.gz")
for record in f:
    print(record['WARC-Target-URI'], record['Content-Length'])
```

### E…

kiska3 updated 3 years ago
2
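
An "Unsupported WARC version" error comes from a reader that checks the version line at the start of each record and accepts only `WARC/1.0`. WARC/1.1 is the ISO 28500:2017 revision. The sketch below is stdlib-only and is not the `warc` library's code. It reads the version line of the first record and accepts both versions. It assumes a seekable binary input. `read_warc_version` and `open_warc_stream` are hypothetical names.

```python
import gzip
import io

# Versions this sketch accepts: WARC/1.0 (ISO 28500:2009) and WARC/1.1 (ISO 28500:2017).
SUPPORTED_VERSIONS = {"WARC/1.0", "WARC/1.1"}
GZIP_MAGIC = b"\x1f\x8b"


def open_warc_stream(raw):
    """Return a readable binary stream, decompressing .warc.gz input if needed.

    Assumes `raw` is seekable so the magic bytes can be peeked and rewound.
    """
    head = raw.read(2)
    raw.seek(0)
    if head == GZIP_MAGIC:
        # GzipFile reads concatenated per-record gzip members one after another.
        return gzip.GzipFile(fileobj=raw)
    return raw


def read_warc_version(raw):
    """Read the first record's version line and reject versions not listed above."""
    stream = open_warc_stream(raw)
    line = stream.readline().decode("utf-8").strip()
    if line not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported WARC version line: {line!r}")
    return line
```

A reader written this way would simply treat the version line as data. Only the handful of fields that changed between 1.0 and 1.1 would need version-specific handling.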
openzim/zimit #396

Automatically ignore ZIM resources found on a website to cra…

If for some resources the crawler encounters a ZIM file on a web property, we should immediately block it so that it is not included inside the WARC and then inside the ZIM. This is probably a page…

benoit74 updated 2 months ago
1
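
One way to block such resources is to combine a cheap pre-fetch check on the URL with a post-fetch check on the response body. Below is a minimal sketch of that idea. It assumes the openZIM header magic number 72173914 (`0x044D495A`), stored little-endian at offset 0 (the bytes `ZIM\x04`). `should_block` is a hypothetical hook, not an API of zimit or its crawler.

```python
from urllib.parse import urlsplit

# ZIM header magic number 72173914 (0x044D495A), stored little-endian at offset 0.
ZIM_MAGIC = (72173914).to_bytes(4, "little")


def looks_like_zim_url(url):
    """Pre-fetch check: does the URL path end in .zim?"""
    return urlsplit(url).path.lower().endswith(".zim")


def looks_like_zim_payload(first_bytes):
    """Post-fetch check: do the first bytes of the body match the ZIM magic number?"""
    return first_bytes[:4] == ZIM_MAGIC


def should_block(url, first_bytes=b""):
    """Hypothetical crawler hook: skip the resource if either check matches."""
    return looks_like_zim_url(url) or looks_like_zim_payload(first_bytes)
```

The magic-number check also catches ZIM files served from URLs without a `.zim` suffix, such as download endpoints with query strings.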
machawk1/warcreate #79

WARC Request Record payloads are missing the 'host' header

Likely critical but might not be available via Chrome's webRequest API.

**Heritrix 3.2.0**

```sh
WARC/1.0
WARC-Type: request
WARC-Target-URI: http://matkelly.com/
WARC-Date: 2015-12-11T13:25:07Z
WA…
```

machawk1 updated 7 years ago
2
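
HTTP/1.1 requires clients to send a `Host` header, so a request record without one does not match what went over the wire. If the browser API does not expose the header, one workaround is to derive it from `WARC-Target-URI`. Below is a minimal sketch under that assumption. `build_request_payload` is a hypothetical helper, not WARCreate's implementation.

```python
from urllib.parse import urlsplit


def build_request_payload(target_uri, method="GET", extra_headers=None):
    """Build an HTTP/1.1 request block for a WARC request record, including Host.

    Assumes the Host value can be reconstructed from the target URI.
    """
    parts = urlsplit(target_uri)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Drop any userinfo ("user:pass@") so only host[:port] ends up in the Host header.
    host = parts.netloc.rsplit("@", 1)[-1]
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (extra_headers or {}).items():
        if name.lower() != "host":  # never emit a second, conflicting Host header
            lines.append(f"{name}: {value}")
    # The header block ends with an empty line (CRLF CRLF).
    return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")
```

For `http://matkelly.com/`, this produces a payload that starts with `GET / HTTP/1.1` followed by `Host: matkelly.com`, which matches the shape of the Heritrix request record.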
harvard-lil/warc-gpt #2

Use extracted text in WARC resource records

Thanks for this elegant example of how to do RAG with WARC data! I also very much appreciated how the [blog post](https://lil.law.harvard.edu/blog/2024/02/12/warc-gpt-an-open-source-tool-for-exploring…

edsu updated 9 months ago
1
webis-de/scriptor #25

Decompress gzip encoding within WARC

Maybe there is a Pywb option for this. Since we are storing the WARCs compressed either way, there is not much reason to have another layer of compression.

johanneskiesel updated 2 years ago
1
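
Removing the inner layer means decompressing an HTTP body sent with `Content-Encoding: gzip` and fixing the headers to match. Below is a minimal stdlib sketch, not a Pywb option. It assumes a single gzip encoding and a body that is not chunked. `decode_gzip_payload` is a hypothetical name. Rewriting the payload changes its length and digest, so this would have to happen before the WARC record's `Content-Length` and `WARC-Payload-Digest` are computed. The record is then no longer a byte-exact capture.

```python
import gzip


def decode_gzip_payload(http_response):
    """Return the HTTP response with a gzip Content-Encoding removed from its body.

    Assumes one gzip encoding and a non-chunked body; anything else is returned unchanged.
    """
    head, sep, body = http_response.partition(b"\r\n\r\n")
    if not sep:
        return http_response  # no header/body separator: leave untouched
    status_line, *header_lines = head.split(b"\r\n")
    encoding = b""
    for line in header_lines:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-encoding":
            encoding = value.strip().lower()
    if encoding != b"gzip":
        return http_response
    body = gzip.decompress(body)
    kept = [
        line for line in header_lines
        if line.partition(b":")[0].strip().lower() not in (b"content-encoding", b"content-length")
    ]
    # Content-Length must now describe the decompressed body.
    kept.append(b"Content-Length: " + str(len(body)).encode("ascii"))
    return b"\r\n".join([status_line, *kept]) + b"\r\n\r\n" + body
```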
machawk1/warcreate #116

Generate WARC from offline MHTML

I have many saved pages, all in MHT format, and I want to convert them to WARC. But when I click Generate WARC it does nothing. Please make this possible!

johnss updated 5 years ago
2
webrecorder/warcio #90

Do not allow writing records which content_stream() has been…

Related to #64: When writing a record into an archive, one can read out its content via the streaming API and then write the record (now with partial or empty content) into the archive, resulting in the loss…

dlazesz updated 4 years ago
4
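
The guard the issue asks for can be sketched as a record that tracks its stream position and a writer that refuses to serialize once the stream has been read. This is a minimal illustration, not warcio's API. `GuardedRecord`, `write_record` and `ConsumedStreamError` are hypothetical, and `io.BytesIO` stands in for the real payload stream.

```python
import io


class ConsumedStreamError(RuntimeError):
    """Raised when a record's payload stream was read before writing."""


class GuardedRecord:
    """Hypothetical record wrapper (not warcio's API) that can tell if its stream was read."""

    def __init__(self, headers, payload):
        self.headers = headers  # list of (name, value) pairs, without Content-Length
        self._payload = io.BytesIO(payload)

    def content_stream(self):
        return self._payload

    def is_consumed(self):
        # Any read moves the position; rewinding to 0 makes the record safe to write again.
        return self._payload.tell() != 0


def write_record(record, out):
    """Refuse to write a record whose payload was already (partly) read."""
    if record.is_consumed():
        raise ConsumedStreamError("payload was read before writing; refusing to truncate")
    body = record.content_stream().read()
    out.write(b"WARC/1.1\r\n")
    for name, value in record.headers:
        out.write(f"{name}: {value}\r\n".encode("utf-8"))
    out.write(f"Content-Length: {len(body)}\r\n\r\n".encode("ascii"))
    out.write(body + b"\r\n\r\n")
```

Raising an error is one design choice. An alternative is to buffer the payload before handing out the stream, so the writer can always emit the full content.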
