Closed charles-l closed 4 years ago
FYI right now registers are translated using a switch, but I'm planning to move it to a table lookup next.
Excellent! Yeah, let's do just registers in this PR, and start a fresh one for calls. survey.subx
took forever to merge. I want to get better at that sort of thing.
I made a pretty major change yesterday, promoting SubX to the top-level directory. I've merged that into this branch, so hopefully everything will work once you git pull
. You'll just need to get used to staying in the top-level directory :/
By the way, I've been assuming you'd prefer to work on this by yourself. But let me know if you'd like me to help out!
By the way, I've been assuming you'd prefer to work on this by yourself. But let me know if you'd like me to help out!
Absolutely -- if you want to push it forward, feel free. Right now the tests aren't great and I really do want to move it to a table lookup rather than a switch so we can expend it more easily and not bloat the binary.
Some issues that are probably on your radar, but I figure I'll write them out anyway:
%eax
=> 0
to:
%eax
=> 3/mod 0/rm32
convert
currently only converts the first word in a line.Let me start on 2. Just requires copying over some boilerplate from other phases.
Both issues I pointed out are addressed.
I've continued duplicating the string literals, and they're even longer now.
There were a few wrong turns, but I was quite happy with how that last commit turned out. We seem to have a core basis set of primitives in SubX now, and features are getting easier to knock out.
Thanks for the suggestion of switching to a table-based lookup! I'd originally thought a switch wasn't too bad. But eventually I realized that things would get even more complex for the future addressing modes.
The only problem with the table is the addressing modes with arguments. I was thinking of having the table store the arity and have a dummy word that gets replaced ($1, $2, etc) like a macro system in most assemblers.
Hmm, I'm not following. This particular table would apply just for the reg
elements in expressions of the form *(reg+disp)
or *(reg+reg<<scale+disp)
. We'd still need to pattern match the expressions themselves. There don't seem to be enough patterns to be worth making that data-driven. Am I missing something? Could you show an example of the format you were planning to create the table with?
Oh were you trying to put the %
vs *
into the table?
Yeah, that probably makes more sense. I was thinking of doing a string replace i.e. ($1+$2) => $1/rm32 $2/disp8
, but it'd be better to just do a lookup for the register, and a switch depending on the displacement size... not sure how we'll determine that when we don't have the label distance yet though...
Oh yes, I was planning on always using /disp32
(and therefore 2/mod
). That is indeed sub-optimal.
It occurs to me that this pass needs good error handling. We've punted on that so far because I've rationalized that people use the C++ translator during development and then run the self-hosted translator at the end, when we know there are no errors.
But this pass has no other version. The buck stops here.
We need a decent error message when someone typos a register name. And we should probably support uppercase. I think I need to switch the get
helpers to use exit descriptors and not just hard abort. That way we can write tests for the error messages.
I'm still thinking about error-handling. But no reason to block on it. Let's just keep going as before for now.
Here's what the error messages look like now:
$ echo %ecx |subx run apps/desugar
3/mod/direct 0x00000001/rm32
$ echo %ecy |subx run apps/desugar
Registers: get-slice: key not found: ecy
Not perfect, but fairly helpful. Good enough for now!
I just sketched out how we might go about implementing indirect mode beyond *reg
. It's kinda interesting because it's a tiny little compiler in microcosm with a source language that fits in a single line, a parser, an intermediate representation and a code-generator. Let me know what you think: https://github.com/akkartik/mu/pull/35/commits/67dac668e3b0aa09efc38e8646c4d4233b764488
Hi @charles-l, I've set up the master
branch to always desugar on native builds. So if you want to start building a Forth using the new syntax sugar, all you have to do to build it is:
$ ./ntranslate 0*.subx apps/forth.subx # generates a.elf
..., etc.
Here's what apps/factorial.subx could look like now:
== code
Entry: # run tests if necessary, compute `factorial(5)` if not
# initialize heap
# . Heap = new-segment(64KB)
# . . push args
68/push Heap/imm32
68/push 0x10000/imm32/64KB
# . . call
e8/call new-segment/disp32
# . . discard args
81 0/subop/add %esp 8/imm32
# . prolog
89/<- %ebp 4/r32/ESP
# - if argc > 1 and argv[1] == "test", then return run_tests()
# . argc > 1
81 7/subop/compare 1/mod/*+disp8 5/rm32/EBP . . . . 0/disp8 1/imm32 # compare *EBP
7e/jump-if-lesser-or-equal $run-main/disp8
# . argv[1] == "test"
# . . push args
68/push "test"/imm32
ff 6/subop/push *(ebp+8)
# . . call
e8/call kernel-string-equal?/disp32
# . . discard args
81 0/subop/add %esp 8/imm32
# . check result
3d/compare-EAX-and 1/imm32
75/jump-if-not-equal $run-main/disp8
# . run-tests()
e8/call run-tests/disp32
8b/copy 0/mod/indirect 5/rm32/.disp32 . . 0/r32/EAX Num-test-failures/disp32 # copy *Num-test-failures to EAX
eb/jump $main:end/disp8
$run-main:
# - otherwise return factorial(5)
# . . push args
68/push 5/imm32
# . . call
e8/call factorial/disp32
# . . discard args
81 0/subop/add %esp 4/imm32
$main:end:
# syscall(exit, EAX)
89/<- %ebx 0/r32/EAX
b8/copy-to-EAX 1/imm32/exit
cd/syscall 0x80/imm8
factorial: # n : int -> int/EAX
# . prolog
55/push-EBP
89/<- %ebp 4/r32/ESP
53/push-EBX
# EAX = 1 (base case)
b8/copy-to-EAX 1/imm32
# if (n <= 1) return
81 7/subop/compare *(ebp+8) 1/imm32
7e/jump-if-<= $factorial:end/disp8
# EBX = n-1
8b/-> *(ebp+8) 3/r32/EBX
81 5/subop/subtract %ebx 1/imm32
# EAX = factorial(n-1)
# . . push args
53/push-EBX
# . . call
e8/call factorial/disp32
# . . discard args
81 0/subop/add %esp 4/imm32
# return n * factorial(n-1)
f7 4/subop/multiply-into-EAX *(ebp+8)
# TODO: check for overflow
$factorial:end:
# . epilog
5b/pop-to-EBX
89/<- %esp 5/r32/EBP
5d/pop-to-EBP
c3/return
test-factorial:
# factorial(5)
# . . push args
68/push 5/imm32
# . . call
e8/call factorial/disp32
# . . discard args
81 0/subop/add %esp 4/imm32
# check-ints-equal(EAX, 120, msg)
# . . push args
68/push "F - test-factorial"/imm32
68/push 0x78/imm32/expected-120
50/push-EAX
# . . call
e8/call check-ints-equal/disp32
# . . discard args
81 0/subop/add %esp 0xc/imm32
# end
c3/return
There are still 2 corner cases where we can't use the syntax sugar:
*ebp
needs to expand to 1/mod 5/rm32 0/disp8
rather than 0/mod 5/rm32
.*Num-test-failures
. They should expand to 0/mod 5/rm32 Num-test-failures/disp32
.(I'm going to hold off on updating factorial.subx because then emulated runs would stop working. Not sure how to deal with that.)
One issue with using this new syntax: apps/desugar
is slow in emulated mode.
Looks good! Excited to see some sugar starting to get added to the core language :)
esp
and ebp
now get correctly (if sub-optimally) desugared.
I renamed desugar.subx to sigils.subx since we're going to be writing other desugaring passes next.
*disp32 mode is still not supported.
Ok, *disp32
is done. Sigils is now complete.
Next few items on my todo list:
Parsing function calls is slightly tricky.
One thought I had was to switch to Lisp-style function calls instead. That way calls are easy to detect: the first non-whitespace on a line is (
, and there are no pesky commas to deal with. What do you think? This is nowhere near real s-expressions, of course. Zero nesting, and a line can have parens at most 2 deep (one for a call, one for a *(...)
expression).
Feels weird to use parens for two conflicting purposes. In Lisp they are never used for grouping, and in other languages they don't indicate function calls.
And yet they're required for calls in C, and Lisp special forms do have lists with parens that aren't calls.
Hey Kartik -- just started writing a desugaring pass for registers. I'm planning to implement:
Not sure about the other reg/mem operations, so I'll just start with these.
I might also start adding function calls if I make good progress with these.