gaenserich / hostsblock

an ad- and malware-blocking script for Linux
https://github.com/gaenserich/hostsblock

Sort, add compression to hostsblock.db #17

Closed gaenserich closed 9 years ago

gaenserich commented 9 years ago

The hostsblock annotation file (by default /var/lib/hostsblock.db) tracks which blocklists contain which entries, so that hostsblock-urlcheck can tell users which blocklist may contain overly aggressive entries. As a plain text file it gets very large (currently 157M on my box), and it is not in a sensible order, because each pass of the target file compilation loop just appends to the same file. To make this file smaller and more human-readable, hostsblock should sort it and compress it with gzip/pigz, re-using the mechanism that already compresses previous target hosts files.

Ideas on implementation:

- hostsblock.sh: if gzip/pigz is detected, just "sort -u hostsblock.db | gzip/pigz -9c > hostsblock.db.gz && rm hostsblock.db" etc.
- hostsblock-urlcheck: add a conditional that selects hostsblock.db.gz or falls back to hostsblock.db, and add a gzip/pigz step to the end of the write procedures for whichever file was detected.
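The sort-and-compress idea for hostsblock.sh could be sketched roughly as follows. This is only an illustration, not code from hostsblock: the function name `compress_db` is hypothetical, and the file is treated as plain lines of text with no assumption about its internal layout.

```shell
#!/bin/sh
# Hypothetical helper (not part of hostsblock): sort, de-duplicate, and
# compress an annotation file given as the first argument.
compress_db() {
    db="$1"
    # Prefer pigz (parallel gzip) when installed, otherwise fall back to gzip.
    if command -v pigz >/dev/null 2>&1; then
        zip=pigz
    elif command -v gzip >/dev/null 2>&1; then
        zip=gzip
    else
        # No compressor available: sort in place and keep the plain-text file.
        sort -u -o "$db" "$db"
        return 0
    fi
    # -9 selects maximum compression, -c writes the result to stdout.
    # The plain-text file is removed only if the compressor exited successfully.
    sort -u "$db" | "$zip" -9c > "$db.gz" && rm -f "$db"
}
```

Calling `compress_db /var/lib/hostsblock.db` would leave a sorted `/var/lib/hostsblock.db.gz` behind. Note that in plain sh the pipeline's exit status is that of the compressor, so a failure in `sort` alone would not prevent the `rm`; a more careful version might write to a temporary file first.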

pickfire commented 9 years ago

What about xz?

gaenserich commented 9 years ago

Gzip is preferred at this point because it is more widely supported and gives us faster lookup times (on such a large file, zgrep will be a lot faster than xzgrep). We could add xz in the future (post 1.0), but at the moment gzip has more advantages for this sort of use.
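For the lookup side in hostsblock-urlcheck, the fallback between hostsblock.db.gz and hostsblock.db might look something like the sketch below. The function name `lookup_db` and its arguments are hypothetical. It pipes `gzip -dc` into `grep`, which is essentially what zgrep does, so that it also works where zgrep is not installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of hostsblock): print the annotation lines
# that mention a domain, preferring the compressed file when it exists.
lookup_db() {
    domain="$1"
    db="${2:-/var/lib/hostsblock.db}"
    if [ -f "$db.gz" ]; then
        # Stream the decompressed data into grep; -F matches a fixed string.
        gzip -dc "$db.gz" | grep -F -- "$domain"
    elif [ -f "$db" ]; then
        # Fall back to the uncompressed annotation file.
        grep -F -- "$domain" "$db"
    else
        echo "no annotation file found at $db or $db.gz" >&2
        return 1
    fi
}
```

For example, `lookup_db ads.example.com` would search the default location, where `ads.example.com` is just a placeholder domain.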