Closed patricksurry closed 8 months ago
This has a couple of small changes (faster 0branch
and native jmp
for unconditional branch, like loop
already does)
and a bigger one that does native branch optimization for short forward branches. You can see
both things in this after/before example.
The jsr BRANCH 81f
becomes jmp xxx
saving a couple of bytes and lots of cycles,
and the jsr 0BRANCH 81a
becomes jsr Z / beq +8
where Z is a tiny runtime that sets the zero flag from TOS
saving a bunch of cycles on the conditional branch.
I routed other users of 0BRANCH like else and while through xt_then to benefit from same optimization. This makes conditionals faster, especially compact ones.
The other candidate for branch optimization is UNTIL
which needs a backward branch. If you like this idea I'm happy to add that.
0 nc-limit ! : foo if 3 else 4 then ; see foo
nt: 81E xt: 829
flags (CO AN IM NN UF HC): 0 0 0 1 0 1
size (decimal): 18
0829 20 01 92 F0 08 20 80 93 03 00 4C 3B 08 20 80 93 .... .. ..L;. ..
0839 04 00 ..
829 9201 jsr Z
82C 8 beq 836 v
82E 9380 jsr LITERAL 3
833 83B jmp
836 9380 jsr LITERAL 4
which previously looked like:
nt: 800 xt: 80B
flags (CO AN IM NN UF HC): 0 0 0 1 0 1
size (decimal): 20
080B 20 1B 92 1A 08 20 9F 93 03 00 20 E4 8D 1F 08 20 .... .. .. ....
081B 9F 93 04 00 ....
80B 921B jsr 0BRANCH 81A
810 939F jsr LITERAL 3
815 8DE4 jsr BRANCH 81F
81A 939F jsr LITERAL 4
ok
Did some cleanup and added xt_until support. This adds +80 bytes to the ROM and compilation will be slightly slower but should have nice runtime benefit in cycles and small size saving for indefinite loops and conditionals
I'm generally fine with ROM being a bit larger and compilation time being longer for a savings in cycles and/or bytes in the generated code.
I made that simplification you suggested, and also added a few new cycle tests to measure conditionals which show a nice boost:
Before:
: doifword 0 100 0 do i 2 and if 1 else -1 then + loop drop ; ok
' doifword cycle_test CYCLES: 31080 ok
: buword 100 begin 1- ?dup 0= until ; ok
' buword cycle_test CYCLES: 19752 ok
: bwword 100 begin 1- ?dup while repeat ; ok
' bwword cycle_test CYCLES: 15074 ok
after
: doifword 0 100 0 do i 2 and if 1 else -1 then + loop drop ; ok
' doifword cycle_test CYCLES: 26180 ok
: buword 100 begin 1- ?dup 0= until ; ok
' buword cycle_test CYCLES: 16463 ok
: bwword 100 begin 1- ?dup while repeat ; ok
' bwword cycle_test CYCLES: 11663 ok
sorry about the random walk here... I think it's in a good state now but i'll have a look at the cycle test and disasm comments you noted.
Let me know when you want this PR merged.
i'll clean up the couple things you mentioned and then it should be good.
i assume there's no easy way to avoid the hard-coded tmp1
address in the ed
test? that tripped me up for a while after its location shifted 2 bytes. (Also the trailing spaces are apparently important in ed.fs but vi
was eventually my friend :)
ok this should be good to go now
I noticed
0branch
did some extra gymnastics torts
instead ofjmp ()
likebranch
. also looking at using native branch for short ones