-
If one is recording a site with video content (especially video content that repeats upon, say, a reload or clicking on the link again), the files become huge. Having the ability to intelligently dedup…
-
Hi there,
first of all, thank you very much for this great tool and for the idea of the WACZ format to simplify the handling of WARC files.
I'm not sure if this is the correct project to report this to…
-
I wanted to reach out to the frictionless data community to share that we (https://github.com/webrecorder, https://webrecorder.net/) are working on a new packaging format to store web archive data, and a…
-
# What? How?
Well, https://www.busybox.net adds a lot of basic utilities, in a small footprint, to a Docker image based on Alpine.
But guess what?
They are not exactly the same as the standard GNU utilities.
E…
-
Support text extraction from the DOM, using existing approaches implemented here:
https://github.com/webrecorder/archiveweb.page/blob/main/src/recorder.js#L1061
https://github.com/webrecorder/browse…
-
Hi @ikreymer @emmadickson
I know you guys are busy with WACZ, but I wanted to catch up on some issues we have been having with the embedded version of replay web on Archipelago with version 1.2.
I sus…
-
I came across [perma.cc](https://perma.cc/) today and was wondering if it could be useful for the Programming Historian to ensure its weblinks are more 'permanent.'
After chatting with @walshbr an…
-
I am trying to load files that are between 273 MB and 2 GB. I am using a Mac. However, the files stall halfway through when loading. I do not have this issue with small files. These are WARC files…
-
Hi,
Some days ago I created a WARC file with Heritrix. Webrecorder Player discovers around 10,000 pages; replayweb 0. There certainly are pages and URLs in that WARC file. Is this a bug? Or maybe…
-
First, thanks a lot for publishing this extension, it makes archiving much more straightforward.
I tried to download a Web Archive totalling 2.56 GB as a `wacz` file. The download starts but then g…