Closed: Clownacy closed this issue 6 years ago.
I find it doubtful that adding a Saxman-style fill will save that much space; the compressor should be able to compress such sequences by inlining the first byte of the sequence, then doing an overlapping match for the rest. The extra space cost would be 9 bits (one descriptor bit plus the literal byte) compared to the Saxman fill.
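To illustrate the overlapping-match trick, here is a minimal sketch (assuming a plain output buffer; this is not the actual decompressor): with a distance of 1, a byte-by-byte copy keeps replicating the byte it just wrote, so a run of N zeroes costs one literal plus one match.

#include <cstdint>
#include <cstddef>

// Overlapping LZSS copy: with dist == 1 the loop re-reads the byte it
// just wrote, so "literal 0x00 + match(dist=1, len=N-1)" expands to N
// zeroes without needing a dedicated fill mode.
void overlapping_copy(uint8_t *out, size_t &pos, size_t dist, size_t len) {
    for (size_t ii = 0; ii < len; ++ii, ++pos) {
        out[pos] = out[pos - dist];
    }
}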
The two format changes that make things easier for the decompressor are interesting, and I will be looking into them when I come back.
As for the decompressor size improvement, I fear it might not be possible; I want to say that there is no way to guarantee that the length bits are written after the distance byte, but I will have to think about it more carefully.
Remind me in a week or so when I am back from my vacation.
In my modified compressor, I just shuffled these around:
// Inline dictionary match: descriptor bits 0,0 select this mode.
out.descbit(0);
out.descbit(0);
// Write the distance byte first...
out.putbyte(-dist);
// ...then the two length bits (length is stored biased by 2).
len -= 2;
out.descbit((len >> 1) & 1);
out.descbit(len & 1);
break;
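For reference, a decode-side mirror of that ordering (a hedged C++ stand-in; Reader, readbyte, and readbit are made-up helpers, not the real decompressor's API): because the distance byte is consumed before the length bits, the copy source is fully determined before the length is known.

#include <cstdint>
#include <cstddef>

// Hypothetical stream reader standing in for the real bit/byte fetches.
struct Reader {
    uint8_t const *bytes;    // data bytes (distance bytes, literals)
    uint8_t const *descbits; // descriptor bits, pre-expanded one per byte
    size_t bitpos = 0;

    uint8_t readbyte() { return *bytes++; }
    bool    readbit()  { return descbits[bitpos++] != 0; }
};

// Decode-side mirror of the shuffled writes above: distance byte first,
// then the two length bits (length is stored biased by 2).
void decode_inline_match(Reader &r, uint8_t *out, size_t &pos) {
    size_t const dist = 0x100 - r.readbyte(); // matches putbyte(-dist)
    size_t len = 2;
    if (r.readbit()) {
        len += 2; // high length bit
    }
    if (r.readbit()) {
        len += 1; // low length bit
    }
    for (; len > 0; --len, ++pos) {
        out[pos] = out[pos - dist];
    }
}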
I don't see how a byte/bit ordering issue would happen.
After thinking about it quite a bit, I agree: what you describe will work. Working on it now.
For reference, I implemented the Saxman-style zero fill (and your suggested general byte fill) for testing. As I suspected, the gains were less than stellar:
$ kosplus LZ.unc LZ.kospz
$ stat -c '%-12n %s' LZ.*
LZ.kos       9414
LZ.kosp      9414
LZ.kospz     9410
LZ.orig.kos  10224
LZ.unc       41984
For a grand total of 4 bytes saved on LZ's chunks compared to simply using the original encoder. I did get a much better result at first when using the Saxman code directly, because I forgot to set the proper edge weight.
Strange. I changed the edge weight handling too, like this:
constexpr static void extra_matches(stream_t const *data,
                                    size_t basenode,
                                    size_t ubound, size_t lbound,
                                    LZSSGraph<KosPlusAdaptor>::MatchVector &matches) noexcept {
    ignore_unused_variable_warning(lbound);
    // Can't encode zero match after this point.
    if (basenode >= (SearchBufSize - 1)) {
        return;
    }
    // Try matching zeroes.
    size_t jj = 0;
    while (data[basenode + jj] == 0) {
        if (++jj >= ubound) {
            break;
        }
    }
    // Need at least 2 zeroes in sequence.
    if (jj >= 2) {
        // Got them, so add them to the list.
        matches[jj - 1] = AdjListNode(basenode + jj,
                                      basenode + 1,
                                      jj, dictionary_weight(basenode + 1, jj));
    }
}
The file seemed to load fine afterwards.
Just to confirm: you are testing with S1's map256/LZ.bin file when you say you got a 0x300-byte improvement, right? I tried with basically your code above, but adapted to the newer version of the code, and I got only a 4-byte improvement over stock Kos+.
...Dammit. I was using a .kosp file I compressed with KENSSharp, which doesn't use LZSS graphs. Sorry.
No problem. I guess I will close this for now, and leave the byte fill for another, related format.
I think I have a few ideas for changing Kosinski+'s format. Most increase performance without affecting size, but one improves size at the cost of performance.
For the size improvement, I suggest implementing Saxman's zero-fill mode. Normally this only saves a few bytes per file, but LZ's chunks see an improvement of 0x300 bytes, so it does have its moments. I've thought about extending this so that it can encode runs of bytes other than zero. It would work by calculating how far the distance overshoots the start of the dictionary: an overshoot of 1 would be the plain zero fill, an overshoot of 2 bytes would encode a series of 1's, an overshoot of 3 would write 2's, etc. However, I'm not sure how to implement this in the compressor, so I've been unable to test it.
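A sketch of what the decoder side of that generalized fill could look like (all names here are hypothetical, and the layout is an assumption, not the actual format):

#include <cstdint>
#include <cstddef>

// Hypothetical decode step for the generalized fill: when the match
// distance overshoots the start of the output, the overshoot selects
// the fill byte (overshoot 1 writes 0x00, overshoot 2 writes 0x01...).
bool try_fill_match(uint8_t *out, size_t &pos, size_t dist, size_t len) {
    if (dist <= pos) {
        return false; // ordinary dictionary match; handled elsewhere
    }
    uint8_t const fill = static_cast<uint8_t>(dist - pos - 1);
    while (len-- > 0) {
        out[pos++] = fill;
    }
    return true;
}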
The performance improvements are to do with making things easier for the decompressor. Here's the one I'm using.
Firstly, the high/low bytes in the long dictionary reference matches are swapped, allowing the decompressor to do this:
...instead of this:
Next, the length bits in the long dictionary references are negated. This lets the decompressor skip the eor. This requires the jmp .mediumcopy(pc,d4.w) be changed to jmp .mediumcopy-2(pc,d4.w), since 0 is still reserved for triggering extended mode.

The last change is more of a size improvement for the decompressor than anything: if you write the distance byte before writing the length bits, you can add the displacement to a5 before jumping to the various .CopyXX blocks. It also lets you neatly collapse those blocks into one bit of code:

(Zero-fill code trimmed for brevity)
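As a rough C++ analogue of that collapse (a sketch only; this is not the trimmed code above, and the real change is in the 68k decompressor): once the source pointer is set up before the dispatch, each length can enter an unrolled copy at a different point and fall through to the end.

#include <cstdint>
#include <cstddef>

// C++ stand-in for collapsed .CopyXX blocks, assuming the inline-match
// lengths 2..5: each length enters the unrolled copy at a different
// point and falls through to a shared tail.
void copy_match(uint8_t *&dst, uint8_t const *src, size_t len) {
    switch (len) {
    case 5: *dst++ = *src++; [[fallthrough]];
    case 4: *dst++ = *src++; [[fallthrough]];
    case 3: *dst++ = *src++; [[fallthrough]];
    case 2: *dst++ = *src++;
            *dst++ = *src++;
    }
}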