-
Hi,
A few days ago I created a WARC file with Heritrix. Webrecorder Player discovers around 10,000 pages; replayweb finds 0. The WARC file certainly contains pages and URLs. Is this a bug? Or maybe…
-
First, thanks a lot for publishing this extension, it makes archiving much more straightforward.
I tried to download a Web Archive totalling 2.56 GB as a `wacz` file. The download starts but then g…
-
There are a handful of other ways to get a compressible outer container to do what the zip file is currently doing in the draft spec. This might be preferable to a largely-uncompressed (STORE mode) zi…
-
Set up Django storages, so capture jobs can save to disk in dev, but upload to s3 in prod. Start from https://github.com/harvard-lil/perma/blob/develop/perma_web/perma/storage_backends.py
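A minimal sketch of how the dev/prod split could look in settings, assuming Django 4.2+ (the `STORAGES` setting) and the django-storages S3 backend. The helper `build_storages`, the `DJANGO_ENV` variable, the bucket name, and the media path are hypothetical placeholders, not taken from the linked perma code.

```python
import os


def build_storages(env: str,
                   media_root: str = "/tmp/capture-media",
                   bucket: str = "example-capture-bucket") -> dict:
    """Return a Django STORAGES-style dict for the given environment.

    All default values here are illustrative assumptions.
    """
    if env == "prod":
        # Capture jobs upload to S3 via django-storages.
        default = {
            "BACKEND": "storages.backends.s3boto3.S3Boto3Storage",
            "OPTIONS": {"bucket_name": bucket},
        }
    else:
        # Capture jobs write to local disk during development.
        default = {
            "BACKEND": "django.core.files.storage.FileSystemStorage",
            "OPTIONS": {"location": media_root},
        }
    return {
        "default": default,
        "staticfiles": {
            "BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage",
        },
    }


# In settings.py; DJANGO_ENV is a hypothetical environment variable name.
STORAGES = build_storages(os.environ.get("DJANGO_ENV", "dev"))
```

Capture code that saves through `django.core.files.storage.default_storage` would then not need to know which backend is active.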
-
Our next 1h community hangout to chat about recent developments in Frictionless and data generally and next steps, involving our community members in the process!
Organizers: @lwinfree (OKF) & @sgl…
-
I know we have talked about that in the past... but the outcome of the discussion is unclear to me.
-
WACZ has many different ways of encoding the same information. This means everyone implementing the format needs to pick one when writing and has the burden of supporting all possibilities when readin…
-
Another possibility is to base this spec on a Frictionless data package: https://specs.frictionlessdata.io/data-package/
Pros: Would help with interoperability, and avoid creating a whole format fr…
-
My own use case for this format involves making large, multi-GB WARC files available for users to browse in their web browser without them having to download the entire archive first.
I first used …
-
Or should the spec be kept small, and allow for extension as needed?
The initial focus is replay, but perhaps other use cases have other requirements that are shared and could use a standardized approa…