agn453 / UNZIP-CPM-Z80

UNZIP and ZIP for CP/M Z80
The Unlicense

OPT proposal #10

Closed: zx70 closed this issue 11 months ago

zx70 commented 12 months ago

    ;   push    de
    ;   ld  de,-4
    ;   add iy,de
    ;   pop de

        dec iy
        dec iy
        dec iy
        dec iy

    ;   push    de
    ;   ld  de,4
    ;   add iy,de
    ;   pop de

        inc iy
        inc iy
        inc iy
        inc iy

...why was the addition preferred? For the flags?

zx70 commented 12 months ago

I'd also suggest using HL' for bleft and bitbuf.

zx70 commented 12 months ago

I created a pull request with the parts of the code I was able to test. Using EXX seems trickier than expected and should probably stay optional (we can never know whether the BIOS leaves the alternate registers untouched).

agn453 commented 11 months ago

> ; push de
> ; ld de,-4
> ; add iy,de
> ; pop de
>
> dec iy
> dec iy
> dec iy
> dec iy
>
> ; push de
> ; ld de,4
> ; add iy,de
> ; pop de
>
> inc iy
> inc iy
> inc iy
> inc iy
>
> ...why was the addition preferred? For the flags?

If you wish to optimise for size, the current method of PUSH, LD DE, ADD IY, POP uses 7 bytes versus 8 bytes for the INC IY or DEC IY solution. Squeezing every byte from the code was the goal at the time. In other places such as within loops, the code is optimised for speed.
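For reference, here is a byte and T-state breakdown of the two sequences, using the standard Z80 instruction sizes and timings. It shows the +4 case; the -4 case is the same with DEC IY. This is an annotated sketch rather than code from the repository:

        push    de      ; 1 byte,  11 T-states
        ld  de,4        ; 3 bytes, 10 T-states
        add iy,de       ; 2 bytes, 15 T-states (changes C, H and N)
        pop de          ; 1 byte,  10 T-states
                        ; total: 7 bytes, 46 T-states

        inc iy          ; 2 bytes, 10 T-states (no flags changed)
        inc iy          ; 2 bytes, 10 T-states
        inc iy          ; 2 bytes, 10 T-states
        inc iy          ; 2 bytes, 10 T-states
                        ; total: 8 bytes, 40 T-states

So the INC IY / DEC IY form costs one extra byte, saves 6 T-states, needs no stack access, and leaves the flags alone, whereas ADD IY,DE modifies carry, half-carry and N.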

zx70 commented 11 months ago

Well, thinking of ways to save memory, we could, for example, put redundant code parts in a subroutine, like:

    open_wr:
        ld  de,opbuf
        ld  c,setdma
        call    bdos
        call    setout
        ld  de,opfcb
        ld  c,fwrite
        call    bdos
        ret
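A possible further saving on a routine like this, sketched with the same labels as above: because the final `call bdos` is immediately followed by `ret`, the pair can be replaced by a single jump, so that BDOS returns directly to the caller of `open_wr`. This assumes nothing needs to run after the last BDOS call:

    open_wr:
        ld  de,opbuf    ; 3 bytes
        ld  c,setdma    ; 2 bytes
        call    bdos    ; 3 bytes
        call    setout  ; 3 bytes
        ld  de,opfcb    ; 3 bytes
        ld  c,fwrite    ; 2 bytes
        jp  bdos        ; 3 bytes, tail call: replaces "call bdos" + "ret"
                        ; total: 19 bytes instead of 20

The jump also saves 17 T-states per invocation (CALL 17 + RET 10 versus JP 10). As for the size argument in general: the inline sequence is 19 bytes per use, while each `call open_wr` is 3 bytes, so the subroutine already pays off at two uses (38 bytes inline versus 20 + 6 with the original routine). How many times the sequence actually occurs in the source is not counted here.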

Another possible approach is to remove (or make optional) the compression methods that are no longer used.

agn453 commented 11 months ago

I've merged your #12 changes and bumped the CP/M UNZIP version to v1.5-7. Thanks.

zx70 commented 11 months ago

This one is a little bit extreme, but in my test case it reduces the timing count (z88dk-ticks) from 28210377 to 26158350, roughly 7% fewer ticks. It can be extended to the whole "getbits" logic and would probably save a little more if implemented correctly. The only problem is that we must trust the BIOS (and BDOS) not to touch the alternate register set, which should be the case if it is well written. Otherwise we should save HL' before the BDOS calls and restore it afterwards.

    ;
    nextsymbol:
        ld  (treep),hl

        exx
        ld  hl,(bitbuf) ; keep bitbuf in L, bleft in H
        exx

    nsloop:
    ;   push    hl
        exx
        ;ld hl,(bitbuf) ; keep bitbuf in L, bleft in H
        dec h
        jp  p,$+9       ; jump to "xor a", past jp op plus 6 bytes:
        call    getbyte     ; (3 bytes)
        ld  l,a     ; (1 byte)  new bitbuf
        ld  h,7     ; (2 bytes) 8 bits left, pre-dec'd
        xor a       ; jp op above jumps here
        rr  l
    ;   ld  (bitbuf),hl ; update bitbuf/bleft
        exx
        ;ld h,a     ; A still zero
        rla         ; return bit in A (HL is no longer loaded here)
        ;ld l,a

    ;   pop hl
        or  a
        jr  z,nsleft
        inc hl
        inc hl
    nsleft:
        ld  e,(hl)
        inc hl
        ld  d,(hl)

        ld  a,d
        cp  10h
        jr  nc,nsleaf
        or  e
        ;ret    z
        jr  z,nsexit

        ld  hl,(treep)
        add hl,de
        add hl,de
        add hl,de
        add hl,de
        jr  nsloop

    nsleaf: and 0fh
        ld  d,a
    nsexit:

        exx
        ld  (bitbuf),hl ; write bitbuf (L) and bleft (H) back to memory
        exx
        ret
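
If the BIOS/BDOS cannot be trusted, one way to protect the alternate HL is a small wrapper around the BDOS entry. This is only a sketch: `safe_bdos` is a hypothetical name, `bdos` is the existing entry label, and BC' and DE' are deliberately not saved here:

    safe_bdos:
        exx
        push    hl      ; save the currently inactive HL
        exx             ; back to the caller's registers (C = function, DE = argument)
        call    bdos
        exx
        pop hl          ; restore the saved HL into the inactive set
        exx             ; BDOS results in A and HL are untouched
        ret

This costs 10 bytes once, plus 37 T-states per call on top of the extra CALL/RET (four EXX at 4 each, PUSH 11, POP 10). Note that it protects whichever HL is inactive at the moment of the call, so if the call happens between an `exx` pair, it is the other register set that gets preserved.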

zx70 commented 11 months ago

One step further...

    ;   rd1bit
    ;   push    af
    ;
    ;   ld  a,2
    ;   call    rdbybits
    ;   or  a

        ld  a,3     ; better to gather 3 bits at once, it's faster and smaller
        call    rdbybits

        ld l,a
        and 1       ; keep the first bit
        push af

        ld  a,l     ; now onto the next 2 bits
        srl a