Hi there,

thank you for your implementation of the zx0 decompressor. Much appreciated!

I just noticed that both `.copy_lits` and `.copy_match` are using `dbra` for looping over the literal/match length in `d0`, which is assumed to be a long word as can be seen from the preceding `subq.l #1,d0` instruction and the `addx.l d0,d0` in `.get_elias`. However, `dbra` only uses the low word of d0, and thus the decoder would not work correctly once the count in d0 exceeds 65535 after the `subq.l`, i.e. if a block has a length >65536.

I know this is a very unlikely thing to happen, except when storing large blocks of random noise or very large duplicated or empty blocks of data. But if this edge case is unsupported anyway, the operations on `d0` could be cut down to word size, halving the number of cycles on 68000 for these operations.

An easy fix is to move the `subq.l #1,d0` inside the loop and use a `bne.s` instead of `dbra` at the cost of performance (same code size).

More complicated solutions would keep the dbra and test the upper word of d0 for zero. For example:
or
(I do like the latter solution, it looks elegant and only adds 2 * 4 bytes to your version and 16 cycles per block).
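
For completeness, a minimal sketch of the easy fix for the literal loop. I am assuming here that `a0` is the source pointer and `a1` the destination pointer, and that the length arriving in `d0` is always at least 1. The surrounding code in your source may differ, and the match loop would need the same change.

```asm
; Problem case with the dbra loop: a length of 65537 ($10001) gives
; d0 = $00010000 after subq.l #1,d0. dbra only looks at the low word
; ($0000), so one byte is copied and the loop exits with d0 = $0001FFFF.

; Easy fix (sketch): count down the full long word instead.
; The subq.l #1,d0 in front of the loop has to be removed.
.copy_lits:
        move.b  (a0)+,(a1)+     ; copy one literal byte (a0/a1 are assumed)
        subq.l  #1,d0           ; 32-bit decrement, sets Z when d0 reaches 0
        bne.s   .copy_lits      ; repeat until all d0 bytes are copied
```

This copies exactly `d0` bytes for any length from 1 up to the full 32-bit range, but the `subq.l`/`bne.s` pair is slower per byte than a single `dbra`, which is the performance cost mentioned above.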