zx70 closed this 11 months ago
I'd also suggest using HL' for bleft and bitbuf.
I created a pull request with the code parts I was able to test. Using EXX seems trickier than expected and should probably stay optional (we never know whether the BIOS leaves the alternate registers untouched).
; push de
; ld de,-4
; add iy,de
; pop de
dec iy
dec iy
dec iy
dec iy
; push de
; ld de,4
; add iy,de
; pop de
inc iy
inc iy
inc iy
inc iy
...why was the addition preferred? For the flags?
If you wish to optimise for size, the current method of PUSH, LD DE, ADD IY, POP uses 7 bytes versus 8 bytes for the INC IY or DEC IY solution. Squeezing every byte from the code was the goal at the time. In other places such as within loops, the code is optimised for speed.
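To make the size/speed trade-off concrete, here is an annotated sketch of the two variants for the -4 case, using the standard documented Z80 opcode sizes and T-state counts. It also bears on the flags question: ADD IY,DE modifies the carry, half-carry and N flags, while DEC IY and INC IY leave all flags untouched, so the addition is not there to preserve flags.

```asm
; Variant A: add a 16-bit constant (7 bytes, 46 T-states)
        push de         ; D5         1 byte   11 T
        ld de,-4        ; 11 FC FF   3 bytes  10 T
        add iy,de       ; FD 19      2 bytes  15 T  (changes C, H, N)
        pop de          ; D1         1 byte   10 T

; Variant B: four decrements (8 bytes, 40 T-states)
        dec iy          ; FD 2B      2 bytes  10 T  (no flags changed)
        dec iy          ; FD 2B      2 bytes  10 T
        dec iy          ; FD 2B      2 bytes  10 T
        dec iy          ; FD 2B      2 bytes  10 T
```

So the addition wins by one byte, and the repeated DEC IY (or INC IY) wins by 6 T-states and keeps both DE and the stack out of the picture.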
Well, thinking of ways to save memory, we could, for example, move redundant code parts into a subroutine, like:
open_wr:
ld de,opbuf
ld c,setdma
call bdos
call setout
ld de,opfcb
ld c,fwrite
call bdos
ret
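In the same byte-saving spirit, a possible refinement (only a sketch, not tested against the UNZIP source) is to fold the final call/ret pair into a jump. This assumes nothing needs to run after the last BDOS call and that the stack is balanced at that point, so BDOS returns directly to the caller of open_wr.

```asm
open_wr:
        ld de,opbuf
        ld c,setdma
        call bdos
        call setout
        ld de,opfcb
        ld c,fwrite
        jp bdos         ; tail call: the BDOS return goes straight to our caller
```

This drops the one-byte RET and replaces CALL plus RET (17 + 10 T-states) with a single JP (10 T-states).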
Another possible approach is to remove (or make optional) the compression methods that are no longer used.
I've merged your #12 changes and bumped the CP/M UNZIP version to v1.5-7. Thanks.
This one is a little bit extreme, but in my test case it reduces the timing count (z88dk-ticks) from 28210377 to 26158350. It can be extended to the whole "getbits" logic and would probably save a little more if correctly implemented. The only problem is that we must trust the BIOS (and BDOS) not to touch the alternate register set, which should be the case if it is well written. Otherwise we should preserve HL' before making the BDOS calls.
;
nextsymbol:
ld (treep),hl
exx
ld hl,(bitbuf) ; keep bitbuf in L, bleft in H
exx
nsloop:
; push hl
exx
;ld hl,(bitbuf) ; keep bitbuf in L, bleft in H
dec h
jp p,$+9 ; jump to "xor a", past jp op plus 6 bytes:
call getbyte ; (3 bytes)
ld l,a ; (1 byte) new bitbuf
ld h,7 ; (2 bytes) 8 bits left, pre-dec'd
xor a ; jp op above jumps here
rr l
; ld (bitbuf),hl ; update bitbuf/bleft
exx
;ld h,a ; A still zero
rla ; return bit in HL and A
;ld l,a
; pop hl
or a
jr z,nsleft
inc hl
inc hl
nsleft:
ld e,(hl)
inc hl
ld d,(hl)
ld a,d
cp 10h
jr nc,nsleaf
or e
;ret z
jr z,nsexit
ld hl,(treep)
add hl,de
add hl,de
add hl,de
add hl,de
jr nsloop
nsleaf: and 0fh
ld d,a
nsexit:
exx
ld (bitbuf),hl ; write bitbuf (L) and bleft (H) back to memory
exx
ret
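If the BDOS cannot be trusted, one way to preserve the shadow HL is a small wrapper around the BDOS entry. The following is only a sketch: the label bdos_keep is hypothetical, and it assumes bdos is the usual entry point taking the function number in C and the argument in DE. It saves whichever HL is in the inactive set at the time of the call, so it also covers a call made while the sets are swapped.

```asm
; Hypothetical wrapper: use "call bdos_keep" instead of "call bdos"
; wherever the inactive HL holds live data (e.g. cached bitbuf/bleft).
bdos_keep:
        exx             ; bring the inactive set forward
        push hl         ; save its HL on the stack
        exx             ; back to the caller's set; C and DE still hold the BDOS arguments
        call bdos       ; result comes back in A (and HL)
        exx
        pop hl          ; restore the saved HL
        exx             ; caller's set again; A and HL results are untouched
        ret
```

EXX does not touch AF, so the BDOS return value in A survives. The inactive BC and DE are not saved here and would need the same treatment if they ever hold live data. The wrapper adds 10 bytes once, plus some T-states per call, which has to be weighed against the gain measured above.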
One step further...
; rd1bit
; push af
;
; ld a,2
; call rdbybits
; or a
ld a,3 ; better to gather 3 bits at once, it's faster and smaller
call rdbybits
ld l,a
and 1 ; keep the first bit
push af
ld a,l ; now onto the next 2 bits
srl a