Closed patricksurry closed 4 months ago
One nice twist: if you tag allow-native after each word in the nesting benchmark, it drops from 25165929 cycles to 0 since it happily inlines away all the empty words 🎉
Is the current never-native default for user words just for safety? Perhaps it should switch to allow-native?
The current never-native status on new words is indeed for safety, because we currently don't have a way to tell whether a word contains a JMP. If there is a JMP, then native compiling is dangerous because it will jump back into the original word (and return at the end of that word, completely skipping the stuff that was compiled AFTER that word in the newly compiled word). If the user did not use any flow control or JMP instructions, then they can apply allow-native or always-native as they see fit. This is described in the "Native Compiling" section of the manual (search for allow-native and you'll find it pretty quickly).
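To make the failure mode concrete, here is a minimal toy model in Python. It is not Tali's compiler or real 6502 code: the instruction names, the addresses, and the helpers `run` and `place` are all made up for illustration. It assumes only what is described above, namely that native compiling copies a word's body verbatim, so an absolute JMP inside the copy still points into the original word.

```python
# Toy model, NOT Tali's real compiler: memory maps address -> instruction.
def run(memory, start):
    """Execute from `start` until the outermost RTS; return what was printed."""
    output, return_stack, pc = [], [], start
    while True:
        op, *args = memory[pc]
        if op == "PRINT":
            output.append(args[0])
            pc += 1
        elif op == "JMP":                 # absolute jump
            pc = args[0]
        elif op == "JSR":                 # call: remember where to come back to
            return_stack.append(pc + 1)
            pc = args[0]
        elif op == "RTS":
            if not return_stack:          # outermost return ends the run
                return output
            pc = return_stack.pop()

def place(memory, addr, code):
    """Store a list of instructions at consecutive addresses from `addr`."""
    for offset, instr in enumerate(code):
        memory[addr + offset] = instr

AWORD = 0                                 # hypothetical address of the original word
aword_body = [
    ("PRINT", "a-start"),
    ("JMP", AWORD + 3),                   # absolute target inside the original word
    ("PRINT", "a-skipped"),
    ("PRINT", "a-end"),
    ("RTS",),
]

memory = {}
place(memory, AWORD, aword_body)

# Caller that reaches the word with JSR (the never-native behaviour).
BWORD_JSR = 100
place(memory, BWORD_JSR, [("JSR", AWORD), ("PRINT", "after"), ("RTS",)])

# Caller that inlines the body: copied without its RTS and without relocation.
BWORD_INLINE = 200
place(memory, BWORD_INLINE, aword_body[:-1] + [("PRINT", "after"), ("RTS",)])

jsr_output = run(memory, BWORD_JSR)
inline_output = run(memory, BWORD_INLINE)
print(jsr_output)      # ['a-start', 'a-end', 'after']
print(inline_output)   # ['a-start', 'a-end']
```

In the inlined caller, the copied JMP lands back in the original word, whose RTS then ends the run, so the code compiled after the inlined body never executes.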
If you can devise a way where we can absolutely determine that a word doesn't contain a JMP (or mess with the return address, which is another case where it has to be JSR'd to), then it would theoretically be possible to flag a new word allow-native (which is what you get if neither the AN (always native) nor the NN (never native) flag is set) automatically.
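As a rough sketch of how the flags relate, the decision can be modelled like this. The function name and parameters are hypothetical, and the size comparison against nc-limit is an assumption about how allow-native words are handled; only the AN/NN meanings and the default of 20 come from this thread. Check the "Native Compiling" section of the manual for the actual rules.

```python
def compiles_natively(always_native, never_native, size_bytes, nc_limit=20):
    """Toy model of the flag logic; not taken from Tali's source."""
    if always_native and never_native:
        # This combination is outside the scope of the sketch.
        raise ValueError("AN and NN both set is not modelled here")
    if always_native:
        return True                       # AN: always inlined
    if never_native:
        return False                      # NN: always reached via JSR
    # Neither flag set means allow-native. ASSUMPTION: such words are
    # inlined only when their size does not exceed nc-limit.
    return size_bytes <= nc_limit

small_allow = compiles_natively(False, False, size_bytes=12)
large_allow = compiles_natively(False, False, size_bytes=64)
raised_limit = compiles_natively(False, False, size_bytes=64, nc_limit=100)
never = compiles_natively(False, True, size_bytes=4)
print(small_allow, large_allow, raised_limit, never)  # True False True False
```

Under this model, raising nc-limit changes the outcome only for allow-native words, while NN words always stay as JSR calls.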
oh, so this would cover cases where words contained literal assembly code, like cycles in the tests?
i was thinking of typical words written purely in forth (without knowledge that it's a 6502 underneath) - presumably those would all be safe? i was assuming this would cover most normal use: the former seems like an advanced use case where the user might be expected to know they should tag as never-native?
Words written in Forth are not safe either, which is why Never-Native is the default. If they have flow control that uses JMP (such as IF/ELSE/THEN or a loop), then they can't be natively compiled. Here is an example:
: aword dup if . else drop then ;
allow-native
100 nc-limit !
: bword 5 aword ." Never makes it here" ;
bword ( should print "5 Never makes it here" but it does not )
Try see aword and see bword and see if you can spot the problem.
💡
I do like having this benchmark option, especially with the ability on GitHub to go back and fetch an earlier binary. You've made a ton of great progress on speeding up Tali during general use.
If it's not going to be run with the regular test suite, I'd recommend putting a comment at the very top showing how to run it. You could also add the above benchmarks you've done (with dates from the binaries used) so there is something to reference against if someone runs it in the future. Once I merge this, the documentation you have here will be hidden amongst the many merged pull requests.
Good call. I added some instructions and started a results log.
indicates master along the bugfixes here which were breaking a couple of the tests. Both bugs (mismatching UF between header and body, and skipping the underflow optimization if the word also had stack juggling) also have fixes in other branches but I'll just update as you merge things.
I just merged a bunch of PRs - I think you merged master into this PR somewhere in the middle of my merge-o-thon. Let me know once you like this one for pulling.
A key for the table might be nice - it took me a minute or two of looking at it to realize SUF was Strip UnderFlow (my mind was stuck on "suffix" for some reason). I was able to suss it out after looking at the commands at the top.
Yup, this looks good to go now - updated the comments plus the latest results since all the fixes here were picked up in other branches.
I didn't add this to the standard test suite since it takes a few seconds to run, but you can run manually like
Here's a quick comparison to a Nov'23 binary with and without strip-underflow set to true. (nc-limit is the default 20)
tl;dr the loop speedups and other optimizations gained about 25%, the underflow improvements add another 10%.