warc-files Search Results

1000+ results for warc-files

Best match

openzim/zimit #299

Invalid WARC Record

Opening here for you to triage; run [67615](https://farm.zimit.kiwix.org/pipeline/67615c43-0078-483f-a016-d14ed92cfc8f/debug) failed when warc2zim tried to load one of the WARC ``` Processing WAR…

rgaudin updated 2 months ago
1
datatogether/sentry #13

Should WARC records on the distributed web default to a flat…

I'm seeking feedback on a decision regarding setting a _sensible default_ for writing WARC records to the distributed web. It has implications for de-duplication between archives, and might also have …

b5 updated 7 years ago
4
hrbrmstr/warc #1

Too many open files when mapping more than 509 pages

It is a great pleasure working with warc; however, I'm experiencing an error when mapping a larger number of files. It seems the connections to the files are not closed. Please find below the reproduc…

trotsiuk updated 6 years ago
3
machawk1/wail #106

Associate WARC files on the system with WAIL

via AppleEvents. https://github.com/pyinstaller/pyinstaller/pull/50 seems to imply that it's supported but does not give pointers on how to specify this at compile time. Also, http://superuser.com/ques…

machawk1 updated 5 years ago
4
commoncrawl/news-crawl #34

Add HTTP protocol version to HTTP request message

The request records in the CC-NEWS WARC files lack the HTTP protocol version: `GET /path` instead of `GET /path HTTP/1.1`. This makes some WARC parsers fail to process the WARC fil…

sebastian-nagel updated 4 years ago
1
ArchiveBox/ArchiveBox #1518

Bug: wget fails on `https://user:pass@domain/` URLs using HT…

#### Describe the bug archivebox update shows ``` > wget Extractor failed: …

agowa updated 1 week ago
9
bibanon/BASC-WARC #1

Possible uses in Website Reconstruction

One important alternate application of this library would be to export data from the WARC files, to output HTML and other metadata. For example, the Internet Archive has the only snapshots of 4chana…

antonizoon updated 9 years ago
4
Rhizome-Conifer/conifer #475

How to transform WARC files to be a text?

Excuse me, if I want to transform a WARC file into text, do you have any tools in the source or any APIs that would make the transformation easier?

Godbother updated 6 years ago
3
internetarchive/warcprox #179

Separate WARC file for each request

Hi, is there a way to create a separate WARC file for each request? E.g., I have 2 browsers, both using warcprox as a proxy. Browser 1 sends a request to `google.com`. Browser 2 sends a request to…

Yakabuff updated 1 year ago
1
webrecorder/awp-sw #2

Generate unique WARC file names when creating WACZ files

I was looking at WACZ files generated from ReplayWeb.page and noticed that the WARC file under the `archive/` folder is always named `data.warc.gz` (or `data.warc`). It looks like the file name is har…

ibnesayeed updated 1 year ago
1
