warc Search Results - Githubissues

1000+ results for warc

uga-libraries/web-aip #28

Use existing WARCs when restart

If the API times out or the script breaks in the middle of creating an AIP, it currently has to be deleted before the script runs again in order for it to be correctly finished. For AIPs with a lot of…

amhanson9 updated 1 year ago
1
phiresky/warc-sqlite #1

Import fails if record does not include the WARC header WARC…

The import fails with `sqlite3.IntegrityError: NOT NULL constraint failed: payloads.hash` if a WARC record does not include a WARC-Payload-Digest. This is the case for record types which are not suppo…

sebastian-nagel updated 2 years ago
1
webrecorder/webrecorder-player #77

Support directly replay of URLs that are not pages (was: Pro…

I have WARC files collected with node-warc 3.1.0 that can not be opened in Webrecorder player (No pages found). The only discerning characteristic is that the files are archived from Facebook posts wi…

peterk updated 5 years ago
2
internetarchive/wayback-machine-webextension #546

Allow users to create Web ARChive (WARC) files

There was a suggestion that the extension include a function to be able to save the currently viewed website as a Web ARChive (WARC) file locally on the user's computer. This could be a feature for a …

cgorringe updated 1 year ago
10
oduwsdl/ipwb #828

WARCs with 3600 new records fail to fully index

I am attempting to index [a WARC from Archive-It](https://matkelly.com/IA/ARCHIVEIT-2349-ANNUAL-KBAWJW-20110217001046-00000-crawling113.us.archive.org-6682.warc) using ipwb from the current master bra…

machawk1 updated 4 months ago
2
datatogether/sentry #13

Should WARC records on the distributed web default to a flat…

I'm seeking feedback on a decision regarding setting a _sensible default_ for writing WARC records to the distributed web. It has implications for de-duplication between archives, and might also have …

b5 updated 7 years ago
4
webrecorder/warcio #40

error checking around record creation?

Given [this whitespace-related header bug](https://github.com/commoncrawl/nutch/issues/5) that crept into the August 2018 Common Crawl crawl, it would be nice if it was somewhat difficult to create b…

wumpus updated 5 years ago
4
netarchivesuite/solrwayback #432

CVS/JSON export (probably also other export) do not use grou…

The properties in solrwaybackweb.properties: `export.csv.maxresults=10000000`, `export.warc.maxresults=1000000`, and `export.warc.expanded.maxresults=10000` are used to stop overly large exports. But the count …

thomasegense updated 6 months ago
1
yujiosaka/headless-chrome-crawler #118

[Feature Request] Add support for WARC file format

[WARC](http://iipc.github.io/warc-specifications/) is a well-known format for storing crawled captures. It can store an arbitrary number of HTTP requests and responses along with other network interactions…

ibnesayeed updated 6 years ago
4
N0taN3rd/node-warc #34

WARC Parsing sometimes results in truncated records.

The WARC parsing sometimes results in records being truncated. This might be due to the parser continuing to look for newlines/read one line at a time, even when parsing the content body, and might…

ikreymer updated 4 years ago
7
