-
It seems that calling `warc-indexer` with thousands of WARC files causes the `tmp` folder to fill up (possibly due to DROID temporary files). It should be possible to clean up as the run progresses.
-
## Describe the bug
A scrape is consistently producing two WARC files that cannot be loaded by replayweb.page. I was having some issues using warcat on the files produced by this scrape, too, but I h…
-
The GZIP spec includes support for one or more members (`A gzip file consists of a series of "members" (compressed data sets).`),
but this spec currently states `A gzip stream may only contain one "me…`
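The multi-member behaviour from RFC 1952 is easy to demonstrate: concatenating two independently compressed streams yields one valid gzip file, and a conforming reader decodes every member in sequence.

```python
import gzip

# Two independently compressed members, concatenated byte-for-byte,
# form a single valid gzip stream per RFC 1952.
stream = gzip.compress(b"first record ") + gzip.compress(b"second record")

# Python's gzip module decodes all members and concatenates the output.
assert gzip.decompress(stream) == b"first record second record"
```

This is exactly the layout record-at-a-time compressed WARCs rely on: each record is its own gzip member, so records can be located and decompressed independently.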
-
WARC parsing sometimes results in records being truncated.
This might be because the parser continues to read one line at a time, looking for newlines, even when parsing the content body, and might…
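A robust reader switches from line-oriented parsing to an exact byte-count read once the headers end. As a minimal sketch (the framing is simplified; real WARC records also carry a trailing CRLF CRLF after the body):

```python
import io

def read_record(stream):
    """Read one WARC-style record: line-oriented headers, then an exact
    Content-Length body read. Simplified framing for illustration."""
    version = stream.readline().rstrip(b"\r\n")  # e.g. b"WARC/1.0"
    headers = {}
    for line in iter(stream.readline, b"\r\n"):  # headers end at a blank line
        name, _, value = line.rstrip(b"\r\n").partition(b":")
        headers[name.strip().lower()] = value.strip()
    # The crucial step: read the body by byte count, never line by line,
    # so binary payloads containing newlines are not truncated.
    length = int(headers[b"content-length"])
    body = stream.read(length)
    return version, headers, body
```

Reading the body with `stream.read(length)` means embedded newlines (or anything that merely looks like a record boundary) can never cut the record short.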
-
WARC inherited line folding from HTTP, which presumably included it for compatibility with MIME messages, which have line-length limits. The newer HTTP RFCs [deprecated it](https://datatracker.ietf.org/…
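For readers that still need to accept folded headers, the classic RFC 822 rule is simple: a line beginning with a space or tab continues the previous header's value. A minimal unfolding pass might look like:

```python
def unfold_headers(raw):
    """Join folded header lines: a line starting with SP or HT continues
    the previous header value (classic RFC 822 line folding). The fold
    is collapsed into a single space."""
    unfolded = []
    for line in raw.split("\r\n"):
        if line[:1] in (" ", "\t") and unfolded:
            # Continuation line: append to the previous header.
            unfolded[-1] += " " + line.strip()
        else:
            unfolded.append(line)
    return "\r\n".join(unfolded)
```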
-
This is a real n00b question. Sorry if I'm missing something obvious.
I've pointed the Explorer at a set of WARC and ARC files and can get results back from my local Wayback machine query interface. …
-
We use pywb to serve a ~ 488 MB WARC file (https://webrecorder.io/layoutanalysis/2015_2016) to [scrapy](https://scrapy.org/)/[splash](http://splash.readthedocs.io/en/stable/), which injects a layout a…
-
Asked by Andy Jackson
> Secondly, when using the WARC writer, how does it cope with large downloads? We sometimes see > 2GB files - would it handle those?
Need to test the WARC writer in isolation t…
-
In some of the mega WARCs produced by Archive Team, extracting all the WARCs just to save a few is infeasible, as it can take at least two days to extract them all using `warcat`.
One might have already…
gwern updated 8 years ago
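An alternative to full extraction is a single indexing pass that records each record's byte offset and length, after which any one record can be pulled with a `seek()` and a bounded `read()`. The sketch below assumes a simplified, uncompressed WARC-style framing (headers, blank line, `Content-Length` body, trailing CRLF CRLF); for record-at-a-time gzipped megaWARCs the same idea applies to gzip member offsets instead.

```python
import io

def index_offsets(stream):
    """One pass over an uncompressed WARC-style stream, recording
    (offset, length) for every record so individual records can later
    be extracted with seek()+read() instead of unpacking everything."""
    offsets = []
    while True:
        start = stream.tell()
        if not stream.readline():          # version line; empty at EOF
            return offsets
        headers = {}
        while True:
            line = stream.readline().rstrip(b"\r\n")
            if not line:                   # blank line ends the headers
                break
            name, _, value = line.partition(b":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers[b"content-length"])
        stream.seek(length + 4, io.SEEK_CUR)  # body + trailing CRLF CRLF
        offsets.append((start, stream.tell() - start))
```

With the index in hand, saving "just a few" records is a handful of seeks rather than a two-day extraction.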
-
Was wondering if it was possible to use this as a website-specific search, in place of the "powered by Google" search you often see. If so, what would the process of setting this up look like? I did tr…