N0taN3rd / Squidwarc

Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
https://n0tan3rd.github.io/Squidwarc/
Apache License 2.0

Add an option for gzipped WARCs? #36

Closed peterk closed 5 years ago

peterk commented 5 years ago

Feature request: add an option to write gzipped WARCs.

Not sure if this should be a feature or if users should gzip WARC files outside of Squidwarc/node-warc?

What is the current behavior?

The WARCs created are currently not gzipped.

N0taN3rd commented 5 years ago

Until I make Squidwarc aware of this (shortly), you can set the environment variable NODEWARC_WRITE_GZIPPED (it just needs to exist) and the WARCs will be gzipped, but the files will not get a .warc.gz extension.
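
Since the workaround leaves gzipped output under a plain .warc name, a small post-processing step can add the missing extension. The following is only a sketch, not part of Squidwarc or node-warc: the helper names `is_gzipped` and `add_gz_extension` are hypothetical, and it assumes the crawl output is a flat directory of `*.warc` files. It checks the two-byte gzip magic number before renaming, so files written without NODEWARC_WRITE_GZIPPED set are left alone.

```python
from pathlib import Path

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path: Path) -> bool:
    """Return True if the file begins with the gzip magic number."""
    with path.open("rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def add_gz_extension(directory: Path) -> list[Path]:
    """Rename gzipped '*.warc' files in `directory` to '*.warc.gz'.

    Hypothetical helper: uncompressed WARCs are not touched.
    Returns the new paths of the files that were renamed.
    """
    renamed = []
    for warc in sorted(directory.glob("*.warc")):
        if is_gzipped(warc):
            target = warc.with_name(warc.name + ".gz")
            warc.rename(target)
            renamed.append(target)
    return renamed
```

Calling `add_gz_extension(Path("path/to/warcs"))` after a crawl would rename only the files that are actually compressed; the directory path here is a placeholder.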