ckolivas / lrzip

Long Range Zip
http://lrzip.kolivas.org
GNU General Public License v2.0

Enhancement: x86 filter for lrzip. (Discussion) #143

Closed pete4abw closed 4 years ago

pete4abw commented 4 years ago

The benefit of applying LZMA filters before compression is debatable. lrzip already pre-processes data with rzip, which could itself be considered a filter since it hashes the data anyway. Also, rzip pre-processes most data effectively, whereas the x86 filter (and related ones, i.e. SPARC, PPC, ARM) really only processes Branch, Call, and Jump (BCJ) instructions.
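For readers unfamiliar with BCJ filters, here is a minimal sketch of the idea in Python. It is a simplified illustration, not the LZMA SDK's x86 filter: it assumes only the near CALL opcode (0xE8) is handled and skips the extra heuristics a real filter applies. The function name `bcj_x86_simplified` and the sample bytes are made up for this example. The point is that relative call displacements to the same target differ at every call site, while the absolute targets are identical and therefore easier for the compressor to match.

```python
import struct

CALL_OPCODE = 0xE8  # x86 near CALL followed by a 32-bit relative displacement


def bcj_x86_simplified(data: bytes, encode: bool) -> bytes:
    """Toy BCJ transform (hypothetical helper, not lrzip or LZMA SDK code).

    encode=True  rewrites each CALL rel32 operand as an absolute target.
    encode=False reverses the transform.
    """
    buf = bytearray(data)
    i = 0
    while i + 5 <= len(buf):
        if buf[i] == CALL_OPCODE:
            (value,) = struct.unpack_from("<I", buf, i + 1)
            next_ip = i + 5  # offset of the instruction following the CALL
            if encode:
                value = (value + next_ip) & 0xFFFFFFFF
            else:
                value = (value - next_ip) & 0xFFFFFFFF
            struct.pack_into("<I", buf, i + 1, value)
            i += 5  # skip the operand so encode and decode scan identically
        else:
            i += 1
    return bytes(buf)


# Two calls to the same target (offset 0x1000) from different positions.
sample = (
    bytes([CALL_OPCODE]) + struct.pack("<I", 0x1000 - 5)     # call at offset 0
    + b"\x90" * 11                                           # NOP padding
    + bytes([CALL_OPCODE]) + struct.pack("<I", 0x1000 - 21)  # call at offset 16
)
filtered = bcj_x86_simplified(sample, encode=True)
restored = bcj_x86_simplified(filtered, encode=False)
```

Before filtering, the two operands differ (0xFFB and 0xFEB); after filtering, both hold 0x1000, and decoding restores the original bytes.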

That said, I have run some tests on executable and library directories, and the x86 filter appears to yield a ~4% compression benefit. Two issues affect this.

  1. lrzip already uses rzip to pre-process data, so an unknown number of BCJ instructions may already be optimized and not affected by an x86 filter run.
  2. Because of the way mmap is used, it's not possible to pre-process the data before rzip, only afterwards, just prior to sending it to lzma.
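As a rough illustration of the "filter just prior to lzma" ordering in point 2, the sketch below chains an x86 BCJ filter in front of LZMA2 using Python's standard `lzma` module (which wraps liblzma, not the LZMA SDK code lrzip uses). The `rzip_output` bytes are a made-up stand-in for whatever rzip would hand to the back end; no rzip stage is actually run here, and the preset value is an arbitrary assumption.

```python
import lzma

# Hypothetical stand-in for a chunk that has already been through rzip.
rzip_output = (b"\xe8\x10\x00\x00\x00" + b"\x90" * 27) * 64

# Filter chain: x86 BCJ first, then the LZMA2 compressor proper.
filters = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]

compressed = lzma.compress(rzip_output, format=lzma.FORMAT_XZ, filters=filters)

# The .xz container records the filter chain, so decompression needs no
# extra arguments and undoes the BCJ transform automatically.
roundtrip = lzma.decompress(compressed)
```

Comparing `len(compressed)` with and without the `FILTER_X86` entry on real executables would be one way to reproduce the kind of measurement described above.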

On the plus side, the extra filtering may benefit all compression methods. Another plus is that the x86 filter is very fast, since it only scans for certain patterns.

Thoughts?

pete4abw commented 4 years ago

see #144