Closed patricksurry closed 2 months ago
Hold off on this one for a bit. I'm going to split it into two parts since I have some more literal stuff.
No problem - just let me know when you'd like me to take a look at it.
Example showing everything inlined (excluding sliteral) versus everything compiled as subroutine calls. The disassembler distinguishes 2literal from sliteral, and labels the 1...4 stack depth checks.
257 nc-limit ! ok
: bar s" hello" 12345678. ; ok
see bar
...
size (decimal): 33
...
80B 813 jmp
813 A189 jsr SLITERAL 80E 5
81A 61 ldy.#
81C 4E lda.#
81E dex
81F dex
820 0 sta.zx
822 1 sty.zx
824 BC lda.#
826 dex
827 dex
828 0 sta.zx
82A 1 stz.zx
ok
: fib 0 1 rot 0 ?do over + swap loop drop ; ok
see fib
...
size (decimal): 159
...
838 dex
839 dex
83A 0 stz.zx
83C 1 stz.zx
83E dex
83F dex
840 1 lda.#
842 0 sta.zx
844 1 stz.zx
846 D7A7 jsr 3 STACK DEPTH CHECK
849 5 ldy.zx
84B 3 lda.zx
84D 5 sta.zx
84F 1 lda.zx
851 3 sta.zx
853 1 sty.zx
855 4 ldy.zx
857 2 lda.zx
859 4 sta.zx
85B 0 lda.zx
85D 2 sta.zx
85F 0 sty.zx
861 dex
862 dex
863 0 stz.zx
865 1 stz.zx
867 0 lda.zx
869 2 cmp.zx
86B D bne 87A v
86D 1 lda.zx
86F 3 cmp.zx
871 7 bne 87A v
873 inx
874 inx
875 inx
876 inx
877 8D2 jmp
87A 85BB jsr DO
87D D7A2 jsr 2 STACK DEPTH CHECK
880 dex
881 dex
882 4 lda.zx
884 0 sta.zx
886 5 lda.zx
888 1 sta.zx
88A D7A2 jsr 2 STACK DEPTH CHECK
88D clc
88E 0 lda.zx
890 2 adc.zx
892 2 sta.zx
894 1 lda.zx
896 3 adc.zx
898 3 sta.zx
89A inx
89B inx
89C D7A2 jsr 2 STACK DEPTH CHECK
89F 0 lda.zx
8A1 2 ldy.zx
8A3 2 sta.zx
8A5 0 sty.zx
8A7 1 lda.zx
8A9 3 ldy.zx
8AB 3 sta.zx
8AD 1 sty.zx
8AF 20 inc.z
8B1 D bne 8C0 v
8B3 1F ldy.z
8B5 101 lda.y
8B8 inc.a
8B9 80 cmp.#
8BB 6 beq 8C3 v
8BD 101 sta.y
8C0 87D jmp
8C3 1F ldy.z
8C5 dey
8C6 dey
8C7 dey
8C8 dey
8C9 1F sty.z
8CB 5 bmi 8D2 v
8CD 100 lda.y
8D0 20 sta.z
8D2 D79D jsr 1 STACK DEPTH CHECK
8D5 inx
8D6 inx
ok
0 nc-limit ! ok
: bar s" hello" 12345678. ; redefined bar ok
see bar
...
size (decimal): 22
...
8E3 8EB jmp
8EB A189 jsr SLITERAL 8E6 5
8F2 A189 jsr 2LITERAL BC614E
ok
: fib 0 1 rot 0 ?do over + swap loop drop ; redefined fib ok
see fib
...
size (decimal): 40
...
905 9E04 jsr 0
908 9D9E jsr 1
90B 8F83 jsr rot
90E 9E04 jsr 0
911 85A3 jsr ?DO 92A
916 85BB jsr DO
919 8CFD jsr over
91C 8E4B jsr +
91F 9242 jsr swap
922 8ACB jsr LOOP 919
927 95D6 jsr unloop
92A 8699 jsr drop
ok
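To put numbers on the space tradeoff, the sizes reported by see in the listings above can be tabulated. This is only a small Python sketch for the arithmetic, not part of Tali Forth; the four sizes are copied from the listings and the helper name is made up for illustration.

```python
# Sizes in bytes as reported by `see` in the listings above.
sizes = {
    "bar": {"inline": 33, "subroutine": 22},
    "fib": {"inline": 159, "subroutine": 40},
}

def inline_ratio(word):
    """How many times larger the inlined definition is than the subroutine one."""
    s = sizes[word]
    return s["inline"] / s["subroutine"]

for word, s in sizes.items():
    saved = s["inline"] - s["subroutine"]
    print(f"{word}: {saved} bytes smaller with 0 nc-limit, "
          f"inline/subroutine ratio {inline_ratio(word):.2f}")
```

The inlined fib is roughly four times the size of the subroutine version, while bar only grows by half, since most of its body is literal data either way.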
This should be ready for a look now. It saves more space, with some speed improvements, and gives more flexibility between native and subroutine compilation.
It can native compile all conditional branches and jumps without any jsr plus two-byte payload, which is much faster.
Setting nc-limit to zero now generates (almost?) pure subroutine threading with no native code. This saves more space in that mode, and opens the possibility of a 'token threaded' mode where a sequence of three-byte JSR xxyy could be replaced by (mostly) single-byte tokens. I've mocked something for that, hopefully ready to share in a week or so.
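As a rough idea of what token threading might save, here is a back-of-envelope estimate in Python based on the subroutine-threaded fib listing above. This is not the mocked implementation; it assumes every word in the listing gets a one-byte token and that the two-byte inline payloads after ?DO and LOOP stay as they are. Both are assumptions made purely for illustration.

```python
# Entries of the subroutine-threaded `fib` listing above:
# (word, bytes of inline payload following the 3-byte jsr).
FIB = [
    ("0", 0), ("1", 0), ("rot", 0), ("0", 0),
    ("?DO", 2),    # jsr ?DO 92A: addresses 911 -> 916, i.e. 5 bytes
    ("DO", 0), ("over", 0), ("+", 0), ("swap", 0),
    ("LOOP", 2),   # jsr LOOP 919: addresses 922 -> 927, i.e. 5 bytes
    ("unloop", 0), ("drop", 0),
]

JSR_BYTES = 3     # 6502 jsr: opcode plus 16-bit address
TOKEN_BYTES = 1   # ASSUMPTION: each word here fits a single-byte token

def subroutine_size(entries):
    """Size in bytes with one jsr per word plus any inline payload."""
    return sum(JSR_BYTES + payload for _, payload in entries)

def token_size(entries):
    """Hypothetical size with one token per word; payloads unchanged."""
    return sum(TOKEN_BYTES + payload for _, payload in entries)

print("subroutine threaded:", subroutine_size(FIB))
print("token threaded (estimate):", token_size(FIB))
```

The subroutine figure matches the 40 bytes that see reports for fib. The token estimate is an upper bound on the saving, since "mostly" single-byte tokens means some words would need more.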
I simplified a few things, preferring simpler/smaller over faster for compile-time words, and split out compile.asm containing compile, along with supporting routines from taliforth.asm.
Notable changes:
jsr + <target>
branches): foo 2>r ;
will native compile with the default nc-limit).