-
Hi,
I'm looking for a way to integrate with a library of low-poly models offline (for privacy reasons, and to give reliable workshops) to facilitate VR content creation. For now I have manually …
-
# Summary
Can someone please point me to the best approach for dumping screenshots generated by a browsertrix crawl to a directory of image files? Thank you in advance :)
# Background
I am at…
-
I only want files, not WARCs.
Can grab-site output regular files (like HTML and images) the way wget can? (Links must be converted to relative links.)
Side question: has anyone here actually …
ftc2 updated 3 months ago
-
- Avoid parsing the entirety of the WARC file
- Don't parse the HTTP records inside
Are there any improvements we can make so that large and gargantuan WARC files can be read and processed speedily?
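One way to get most of that speedup is to read only the WARC record headers and then seek past each payload using its Content-Length, never parsing the HTTP records inside. A minimal stdlib sketch of the idea (the helper name and simplified header parsing are mine; a real reader would also handle per-record gzip members and malformed records):

```python
import io

def iter_warc_headers(stream):
    """Yield (warc_type, headers) for each record, jumping over every
    payload via Content-Length instead of reading it (hypothetical helper)."""
    while True:
        line = stream.readline()
        if not line:
            return                      # end of file
        if not line.startswith(b"WARC/"):
            continue                    # skip blank separator lines
        headers = {}
        while True:
            h = stream.readline()
            if h in (b"\r\n", b"\n", b""):
                break                   # blank line ends the header block
            k, _, v = h.partition(b":")
            headers[k.strip().lower()] = v.strip()
        length = int(headers.get(b"content-length", b"0"))
        stream.seek(length, io.SEEK_CUR)  # seek past the payload, no parsing
        yield headers.get(b"warc-type", b"?").decode(), headers
```

On a seekable uncompressed WARC this touches only header bytes, so scan time is dominated by the number of records rather than the file size.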
-
When reading WARC files compressed with gzip, many of the entries they contain are skipped or misread. To reproduce, use Common Crawl data in .gz format, count the number of entries found by the WARC lib…
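A frequent cause of exactly this symptom is treating the .warc.gz as a single gzip stream: a compliant .warc.gz is a concatenation of one gzip member per record, and a decompressor that stops at the end of the first member silently drops everything after it. A stdlib sketch of the failure and the fix (the record bytes are simplified stand-ins, not full WARC records):

```python
import gzip
import zlib

# Two records, each compressed as its own gzip member and concatenated,
# which is the layout .warc.gz files use.
rec1 = b"WARC/1.0\r\nWARC-Type: warcinfo\r\n\r\n\r\n\r\n"
rec2 = b"WARC/1.0\r\nWARC-Type: response\r\n\r\n\r\n\r\n"
data = gzip.compress(rec1) + gzip.compress(rec2)

# Buggy: one zlib decompressobj stops at the end of the first gzip
# member; every later record is lost without any error.
d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
naive = d.decompress(data)
assert rec2 not in naive

# Correct: restart a decompressobj on the unused tail until the input
# is exhausted, so every member (and record) is read.
out, buf = b"", data
while buf:
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    out += d.decompress(buf)
    buf = d.unused_data
assert out == rec1 + rec2
```

Python's own `gzip.GzipFile` iterates over concatenated members correctly; libraries that wrap `zlib` directly need a loop like the one above.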
-
### ArchiveWeb.page Version
v0.11.3
### What did you expect to happen? What happened instead?
When downloading WARC 1.1 files and ingesting them in SolrWayback via the UKWA warcindexer, I expected to have rep…
-
I'm trying to upload files captured with Webrecorder to Conifer, as Webrecorder doesn't seem to have as much difficulty accessing Facebook. I downloaded the .wacz, unzipped the file, and tried upload…
-
Some crawlers can create multiple WARC files, which is important if we have to upload WARC files to storage with a limit on single-file size. I have a lot of archived websites split into 5-50 5GB WA…
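If the crawler itself can't rotate output files, records can also be regrouped after the fact, splitting only at record boundaries so each part remains a valid WARC. A stdlib sketch for uncompressed WARCs (both helper names are mine; for .warc.gz you would split at gzip-member boundaries instead):

```python
import io

def read_records(stream):
    """Yield each raw record (header lines + payload + closing CRLFs) as bytes."""
    while True:
        line = stream.readline()
        if not line:
            return
        if not line.startswith(b"WARC/"):
            continue                    # skip blank separator lines
        parts, length = [line], 0
        while True:
            h = stream.readline()
            parts.append(h)
            if h in (b"\r\n", b"\n", b""):
                break
            if h.lower().startswith(b"content-length:"):
                length = int(h.split(b":", 1)[1])
        parts.append(stream.read(length + 4))  # payload + trailing \r\n\r\n
        yield b"".join(parts)

def group_by_size(records, limit):
    """Group records so each group stays under `limit` bytes; a single
    record larger than the limit still gets its own group, uncut."""
    group, size = [], 0
    for rec in records:
        if group and size + len(rec) > limit:
            yield group
            group, size = [], 0
        group.append(rec)
        size += len(rec)
    if group:
        yield group
```

Each yielded group can then be written to its own part file (and recompressed per record) before upload to the size-limited storage.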
ivbeg updated 2 years ago
-
I am using the local executor. My machine has 48 CPUs and 348 GB of RAM. Any idea how to speed this up? Currently a single task (task=1, running on one warc.gz file of ~1 GB) takes half an hour. Thi…
-
Being able to index and re-index collections that are located on remote storage (S3) would be very helpful.