gaenserich closed this issue 9 years ago
What about xz?
Gzip is preferred at this point because it is more widely supported and gives us faster lookup times (on a file this large, zgrep will be a lot faster than xzgrep). We could add xz in the future (post-1.0), but at the moment gzip has more advantages for this sort of use.
On 24.04.2015 at 01:13, "Ivan Tham" notifications@github.com wrote:
What about xz?
The hostsblock annotation file (by default /var/lib/hostsblock.db) tracks which blocklists contain which entries, so that hostsblock-urlcheck can tell users which blocklist may contain overly aggressive entries. As a plain text file it gets very large (currently 157M on my box), and it is not in any sensible order, since each pass of the target file compilation loop just appends to the same file. To make this file smaller and more human-readable, hostsblock should sort and compress it (with gzip/pigz, so as to re-use the mechanism already used to compress previous target hosts files).
Ideas on implementation:
hostsblock.sh: if gzip/pigz is detected, just "sort -u hostsblock.db | gzip/pigz -9c > hostsblock.db.gz && rm hostsblock.db" etc.
hostsblock-urlcheck: add a conditional that selects hostsblock.db.gz or falls back to hostsblock.db, and add a gzip/pigz step to the end of the write procedures for whichever file was detected.
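The two ideas above could look roughly like the following. This is a minimal sketch and not the actual hostsblock code: the function names (pick_compressor, compact_annotations, lookup_annotations) are hypothetical, the one-entry-per-line layout of the annotation file is an assumption, and LC_ALL=C is only there to keep the sort order reproducible. It sorts in place before compressing, so a failed sort is not hidden behind the pipeline's exit status.

```shell
#!/bin/sh
# Hypothetical helpers; the real hostsblock scripts may be organized differently.

# Print "pigz" if available, else "gzip"; print nothing if neither is installed.
pick_compressor() {
    if command -v pigz >/dev/null 2>&1; then
        echo pigz
    elif command -v gzip >/dev/null 2>&1; then
        echo gzip
    fi
}

# Sort and de-duplicate the annotation file, then compress it when possible.
# $1: path to the plain-text annotation file (e.g. /var/lib/hostsblock.db)
compact_annotations() {
    db=$1
    zip=$(pick_compressor)
    # Sort in place first; POSIX sort allows -o to name an input file.
    LC_ALL=C sort -u "$db" -o "$db" || return 1
    if [ -n "$zip" ]; then
        # -9: best compression, -c: write to stdout. No -d, which would decompress.
        "$zip" -9c "$db" > "$db.gz" && rm -- "$db"
    fi
}

# Search for a fixed string, preferring the .gz file and falling back to plain text.
# $1: base path of the annotation file (without .gz), $2: string to search for
lookup_annotations() {
    db=$1
    pattern=$2
    if [ -f "$db.gz" ]; then
        zip=$(pick_compressor)
        [ -n "$zip" ] || return 1
        "$zip" -dc "$db.gz" | grep -F -- "$pattern"
    elif [ -f "$db" ]; then
        grep -F -- "$pattern" "$db"
    else
        return 1
    fi
}
```

With these in place, hostsblock.sh would call compact_annotations once after the compilation loop finishes, and hostsblock-urlcheck would go through lookup_annotations instead of reading hostsblock.db directly. Any write path in hostsblock-urlcheck would still need to decompress, modify, and recompress the file, which is not shown here.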