-
Hi,
I'm looking for a way to integrate with a library of low-poly models offline (for privacy reasons, and to give reliable workshops) to facilitate VR content creation. For now I have manually …
-
# Summary
Can someone please point me to the best approach for dumping screenshots generated by a browsertrix crawl to a directory of image files? Thank you in advance :)
# Background
I am at…
-
I only want files, not WARCs.
Can grab-site output regular files (like HTML and images) the way wget can? (Links must be converted to relative links.)
Side question: has anyone here actually …
ftc2 updated 3 months ago
-
- Avoid parsing the entirety of the WARC file
- Don't parse the HTTP records inside
Are there any improvements we can make so that large and gargantuan WARC files can be read and processed speedily?
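One way to get most of that speedup is to read only the WARC record headers and then seek past each payload using its Content-Length, never parsing the HTTP records inside. A minimal stdlib sketch of the idea (the helper name and simplified header parsing are mine; a real reader would also handle per-record gzip members and malformed records):

```python
import io

def iter_warc_headers(stream):
    """Yield (warc_type, headers) for each record, jumping over every
    payload via Content-Length instead of reading it (hypothetical helper)."""
    while True:
        line = stream.readline()
        if not line:
            return                      # end of file
        if not line.startswith(b"WARC/"):
            continue                    # skip blank separator lines
        headers = {}
        while True:
            h = stream.readline()
            if h in (b"\r\n", b"\n", b""):
                break                   # blank line ends the header block
            k, _, v = h.partition(b":")
            headers[k.strip().lower()] = v.strip()
        length = int(headers.get(b"content-length", b"0"))
        stream.seek(length, io.SEEK_CUR)  # seek past the payload, no parsing
        yield headers.get(b"warc-type", b"?").decode(), headers
```

On a seekable uncompressed WARC this touches only header bytes, so scan time is dominated by the number of records rather than the file size.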
-
When reading WARC files compressed with gzip, many of the entries they contain are skipped or misread. To reproduce, use Common Crawl data in .gz format, count the number of entries found by the WARC lib…
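A frequent cause of exactly this symptom is treating the .warc.gz as a single gzip stream: a compliant .warc.gz is a concatenation of one gzip member per record, and a decompressor that stops at the end of the first member silently drops everything after it. A stdlib sketch of the failure and the fix (the record bytes are simplified stand-ins, not full WARC records):

```python
import gzip
import zlib

# Two records, each compressed as its own gzip member and concatenated,
# which is the layout .warc.gz files use.
rec1 = b"WARC/1.0\r\nWARC-Type: warcinfo\r\n\r\n\r\n\r\n"
rec2 = b"WARC/1.0\r\nWARC-Type: response\r\n\r\n\r\n\r\n"
data = gzip.compress(rec1) + gzip.compress(rec2)

# Buggy: one zlib decompressobj stops at the end of the first gzip
# member; every later record is lost without any error.
d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
naive = d.decompress(data)
assert rec2 not in naive

# Correct: restart a decompressobj on the unused tail until the input
# is exhausted, so every member (and record) is read.
out, buf = b"", data
while buf:
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    out += d.decompress(buf)
    buf = d.unused_data
assert out == rec1 + rec2
```

Python's own `gzip.GzipFile` iterates over concatenated members correctly; libraries that wrap `zlib` directly need a loop like the one above.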
-
### ArchiveWeb.page Version
v0.11.3
### What did you expect to happen? What happened instead?
When downloading WARC 1.1 files and ingesting them in SolrWayback via the UKWA warcindexer, I expected to have rep…
-
I'm trying to upload files captured with Webrecorder to Conifer, as Webrecorder doesn't seem to have as much difficulty accessing Facebook. I downloaded the .wacz, unzipped the file, and tried upload…
-
Some crawlers can create multiple WARC files, which is important if we have to upload WARC files to storage with a limit on single-file size. I have a lot of archived websites split into 5-50 5GB WA…
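If the crawler itself can't rotate output files, records can also be regrouped after the fact, splitting only at record boundaries so each part remains a valid WARC. A stdlib sketch for uncompressed WARCs (both helper names are mine; for .warc.gz you would split at gzip-member boundaries instead):

```python
import io

def read_records(stream):
    """Yield each raw record (header lines + payload + closing CRLFs) as bytes."""
    while True:
        line = stream.readline()
        if not line:
            return
        if not line.startswith(b"WARC/"):
            continue                    # skip blank separator lines
        parts, length = [line], 0
        while True:
            h = stream.readline()
            parts.append(h)
            if h in (b"\r\n", b"\n", b""):
                break
            if h.lower().startswith(b"content-length:"):
                length = int(h.split(b":", 1)[1])
        parts.append(stream.read(length + 4))  # payload + trailing \r\n\r\n
        yield b"".join(parts)

def group_by_size(records, limit):
    """Group records so each group stays under `limit` bytes; a single
    record larger than the limit still gets its own group, uncut."""
    group, size = [], 0
    for rec in records:
        if group and size + len(rec) > limit:
            yield group
            group, size = [], 0
        group.append(rec)
        size += len(rec)
    if group:
        yield group
```

Each yielded group can then be written to its own part file (and recompressed per record) before upload to the size-limited storage.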
ivbeg updated 2 years ago
-
I am using the local executor. My machine has 48 CPUs and 348 GB of RAM. Any idea how to speed this up? Currently a single task (task=1, running on one warc.gz file of ~1 GB) takes half an hour. Thi…
-
Being able to index and re-index collections that are located on remote storage (S3) would be very helpful.