Dri0m closed this issue 2 years ago
You can use --wpull-args=--no-warc-compression to do this, by the way.
That's good to know, thanks! I guess this issue is solved then.
Thanks ethus3h, I'll probably just document that in the README.
Also note that request/response records in .warc.gz files are individually compressed, and if you ever plan to send them to the Internet Archive, I believe they expect them to be compressed that way. Running gzip on an uncompressed .warc will not compress the records individually, so random access will not work.
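To illustrate the difference, here is a minimal Python sketch using only the standard library. It is not how grab-site or wpull write WARCs; the record contents and the helper names (compress_per_record, read_record_at) are made up for the example. It only shows why one gzip member per record allows seeking to a record, while a single gzip stream over the whole file does not.

```python
import gzip
import zlib

# Hypothetical stand-ins for WARC records; real records carry full WARC headers.
records = [b"record-one\r\n\r\n", b"record-two\r\n\r\n", b"record-three\r\n\r\n"]


def compress_per_record(recs):
    """Compress each record as its own gzip member.

    Returns the concatenated bytes and the byte offset where each member starts.
    """
    blob = b""
    offsets = []
    for rec in recs:
        offsets.append(len(blob))
        blob += gzip.compress(rec)
    return blob, offsets


def read_record_at(blob, offset):
    """Decompress exactly one gzip member that starts at the given offset."""
    # wbits=31 tells zlib to expect a gzip header and trailer.
    # Decompression stops at the end of the first member; the rest of the
    # input is left in the decompressor's unused_data attribute.
    decompressor = zlib.decompressobj(wbits=31)
    return decompressor.decompress(blob[offset:])


# Per-record compression: any record can be read from its offset alone.
blob, offsets = compress_per_record(records)
second = read_record_at(blob, offsets[1])

# The concatenated members still decompress as one ordinary gzip file.
everything = gzip.decompress(blob)

# Whole-file compression: a single member, so the only valid start is offset 0
# and reaching a later record means decompressing everything before it.
whole = gzip.compress(b"".join(records))
```

With the per-record layout, an index of offsets (such as the one a CDX file provides) is enough to pull out a single record without reading the rest of the file.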
Now documented in 6269289a2ca874bae52f116016ca54dc8887d0cc
The compression is unwanted, e.g. when I'm scraping to a drive with filesystem compression, or when I want to apply a stronger compression algorithm after I'm done scraping.