unixdj opened 9 years ago
A quick benchmark shows that the following modification of `step_search()` seems to improve compression ratio:
```c
match = find_match(here);
if (match.found && find_match(here + 1).len <= match.len)
    emit_backref();
else
    emit_literal();
```

(`here + 1` seems to work better than `here + break_even_point / 9` for some reason.)
Time to code it without calling `find_match()` twice as often. It's also completely pointless when the backref encoding is no longer than the literal encoding.
I haven't had much time to play with it lately, but here's an implementation of the above: https://github.com/atomicobject/heatshrink/compare/master...unixdj:look-forward?expand=1. It's still really slow. Really really slow. But the compression ratio is higher.
Reducing the break-even point, as done in f533992, should theoretically improve the compression ratio. However, in some cases emitting a literal or two followed by a backref makes the encoding shorter overall than the two shorter backrefs that the (naturally greedy) encoder finds. E.g., when compressing `kennedy.xls` from the Canterbury Corpus with `-w 8 -l 7` and different break-even points, one can observe many such occurrences (`-` is break-even point 3, `+` is 2).

It would be nice to find a way to optimize this within the constraints of an embedded environment. If possible, without backtracking.