emmanuel-marty / unzx0_68000

Free, zlib licensed ZX0 decompressor for the 68000
zlib License
18 stars 4 forks source link

Hypothetical bug for literals or match copies >64 KB #1

Open chrisly42 opened 1 year ago

chrisly42 commented 1 year ago

Hi there,

thank you for your implementation of the zx0 decompressor. Much appreciated!

I just noticed that both .copy_lits and .copy_match are using dbra for looping over the literal/match length in d0, which is assumed to be a long word as can be seen from the preceding subq.l #1,d0 instruction and the addx.l d0,d0 in .get_elias . However, dbra only uses word size for d0, and thus the decoder would not work correctly if a block has a length >65535.

I know this is a very unlikely thing to happen except for storing large blocks of random noise or e.g. very large duplicated or empty blocks of data. But if this edge case is unsupported anyway, the operations on d0 could be cut down to word size, halving the number of cycles on 68000 for these operations.

An easy fix is to move the subq.l #1,d0 inside the loop and use a bne.s instead of dbra at the cost of performance (same code size).

More complicated solutions would keep the dbra and test the upper word of d0 for zero. For example:

    swap d0
    tst.w d0
    beq.s .copy_lits_done
    swap d0
    bra.s .copy_lits
.copy_lits_done

or

.copy_lits_next_64k
        subq.l  #1,d0           ; dbf will loop until d0 is -1, not 0
.copy_lits
        move.b  (a0)+,(a1)+     ; copy literal byte
        dbra    d0,.copy_lits   ; loop for all literal bytes
        addq.l  #1,d0
        bne.s .copy_lits_next64k

(I do like the latter solution, it looks elegant and only adds 2 * 4 bytes to your version and 16 cycles per block).

chrisly42 commented 1 year ago

Argh, sorry, the lower solution does not work as intended, it should be

        addq.w #1,d0
        tst.l   d0
        bne.s .copy_lits_next64k

That's a little bit less elegant :-/