That output is produced by feature #48 introduced in v5.0. Those substrings are candidate patterns, meaning they were considered for replacement by 1-character tokens, but not selected since that would come at a loss, or just break even.
This section is intended to help the user improve compression. For instance, if you have `s*` somewhere in your code, you could swap the operands to get an extra instance of `*s`, which would then get compressed and gain 1 byte.
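As an illustration (a made-up fragment, not code from this issue), a commutative product written `s*e` can be flipped to create that extra occurrence:

```js
// Hypothetical snippet: "*s" occurs 4 times, plus one "s*" that could be flipped.
v = a*s + b*s + c*s + d*s + s*e;

// After swapping the operands of the last (commutative) product,
// "*s" occurs 5 times and becomes worth replacing with a token.
v = a*s + b*s + c*s + d*s + e*s;
```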
The line giving you this information is:
`val=7, gain=0->1 (+1), N=4, str = *s`
In detail, it means that there are 4 instances of `*s`, and that replacing them with a token is useless (gain = 0). However, if you could have an extra instance of the same substring, the compression would then bring a net gain of 1 byte.
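One way to see where the 0 and the +1 come from (a back-of-the-envelope model, not RegPack's actual accounting): assume replacing a substring of length L occurring N times with a 1-character token costs the N token characters, plus the substring stored once, plus roughly 2 characters of overhead for the token and delimiter in the dictionary.

```js
// Rough accounting only -- an assumption, not RegPack's implementation.
// Before packing the substring costs N * L characters; after packing it costs
// N (tokens in the text) + L (the substring stored once) + 2 (assumed overhead).
function estimatedGain(L, N) {
  return N * L - (N + L + 2);
}

console.log(estimatedGain(2, 4)); // 0 -> "*s" with 4 instances just breaks even
console.log(estimatedGain(2, 5)); // 1 -> a 5th instance would gain 1 byte
```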
As for merging `16` and `16,`, your mileage may vary: you have 4 instances of the former and only 2 of the latter, so keeping the shorter one may or may not be the optimal solution. The gain upon packing depends on both string length and occurrence count; you can influence the algorithm by tweaking the score formula (default 1/0/0).
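If I read the three numbers as weights on gain, string length and copy count (an assumption on my part, not necessarily RegPack's exact formula), the default 1/0/0 ranks candidates by gain alone, along these lines:

```js
// Sketch of candidate scoring, assuming the 1/0/0 triple means
// weights for (gain, length, copies).
function score(candidate, wGain = 1, wLength = 0, wCopies = 0) {
  return wGain * candidate.gain
       + wLength * candidate.str.length
       + wCopies * candidate.copies;
}

// With 1/0/0, "16" (4 copies) and "16," (2 copies) compete on gain alone;
// raising wCopies or wLength can change which candidate is picked first.
```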
thanks @Siorki, now that makes sense!
about the formula - today I use a script which produces several RegPack outputs with different settings (roughly like the sketch below). what about a brute force attempt to find the best compression? think that's feasible?
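A minimal sketch of such a settings sweep, assuming a hypothetical `pack(source, weights)` wrapper around RegPack that returns the packed string (the real API may differ):

```js
// Try every combination of the three score weights in a small range
// and keep the settings that produce the shortest packed output.
function bestSettings(source, pack) {
  let best = null;
  for (let g = 0; g <= 2; g++) {
    for (let l = 0; l <= 2; l++) {
      for (let c = 0; c <= 2; c++) {
        const packed = pack(source, { gain: g, length: l, copies: c });
        if (best === null || packed.length < best.packed.length) {
          best = { weights: [g, l, c], packed };
        }
      }
    }
  }
  return best; // smallest output among the 27 weight combinations tried
}
```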
Feasible yet difficult - that's exactly the point of issue #7
thanks!
token detection can be improved, here's a sample output of my project:

this means:

- `var` and `var` exist -> should be combined as `var` - maybe combine it even with `ar`
- `16` and `16,` exist -> should be combined as `16`
- `*s,` and `*s` exist -> should be combined as `*s`