akkartik / mu

Soul of a tiny new machine. More thorough tests → More comprehensible and rewrite-friendly software → More resilient society.
http://akkartik.name/akkartik-convivial-20200607.pdf
Other
1.35k stars 47 forks source link

Desugaring pass #35

Closed charles-l closed 4 years ago

charles-l commented 4 years ago

Hey Kartik -- just started writing a desugaring pass for registers. I'm planning to implement:

%reg => 3/mod/direct reg/rm32
*reg => 0/mod/indirect reg/rm32

Not sure about the other reg/mem operations, so I'll just start with these.

I might also start adding function calls if I make good progress with these.

charles-l commented 4 years ago

FYI right now registers are translated using a switch, but I'm planning to move it to a table lookup next.

akkartik commented 4 years ago

Excellent! Yeah, let's do just registers in this PR, and start a fresh one for calls. survey.subx took forever to merge. I want to get better at that sort of thing.

akkartik commented 4 years ago

I made a pretty major change yesterday, promoting SubX to the top-level directory. I've merged that into this branch, so hopefully everything will work once you git pull. You'll just need to get used to staying in the top-level directory :/

akkartik commented 4 years ago

By the way, I've been assuming you'd prefer to work on this by yourself. But let me know if you'd like me to help out!

charles-l commented 4 years ago

By the way, I've been assuming you'd prefer to work on this by yourself. But let me know if you'd like me to help out!

Absolutely -- if you want to push it forward, feel free. Right now the tests aren't great and I really do want to move it to a table lookup rather than a switch so we can expend it more easily and not bloat the binary.

akkartik commented 4 years ago

Some issues that are probably on your radar, but I figure I'll write them out anyway:

  1. We need to move the translation from:

%eax => 0

to:

%eax => 3/mod 0/rm32

  1. convert currently only converts the first word in a line.

Let me start on 2. Just requires copying over some boilerplate from other phases.

akkartik commented 4 years ago

Both issues I pointed out are addressed.

I've continued duplicating the string literals, and they're even longer now.

akkartik commented 4 years ago

There were a few wrong turns, but I was quite happy with how that last commit turned out. We seem to have a core basis set of primitives in SubX now, and features are getting easier to knock out.

Thanks for the suggestion of switching to a table-based lookup! I'd originally thought a switch wasn't too bad. But eventually I realized that things would get even more complex for the future addressing modes.

charles-l commented 4 years ago

The only problem with the table is the addressing modes with arguments. I was thinking of having the table store the arity and have a dummy word that gets replaced ($1, $2, etc) like a macro system in most assemblers.

akkartik commented 4 years ago

Hmm, I'm not following. This particular table would apply just for the reg elements in expressions of the form *(reg+disp) or *(reg+reg<<scale+disp). We'd still need to pattern match the expressions themselves. There don't seem to be enough patterns to be worth making that data-driven. Am I missing something? Could you show an example of the format you were planning to create the table with?

akkartik commented 4 years ago

Oh were you trying to put the % vs * into the table?

charles-l commented 4 years ago

Yeah, that probably makes more sense. I was thinking of doing a string replace i.e. ($1+$2) => $1/rm32 $2/disp8, but it'd be better to just do a lookup for the register, and a switch depending on the displacement size... not sure how we'll determine that when we don't have the label distance yet though...

akkartik commented 4 years ago

Oh yes, I was planning on always using /disp32 (and therefore 2/mod). That is indeed sub-optimal.

akkartik commented 4 years ago

It occurs to me that this pass needs good error handling. We've punted on that so far because I've rationalized that people use the C++ translator during development and then run the self-hosted translator at the end, when we know there are no errors.

But this pass has no other version. The buck stops here.

We need a decent error message when someone typos a register name. And we should probably support uppercase. I think I need to switch the get helpers to use exit descriptors and not just hard abort. That way we can write tests for the error messages.

akkartik commented 4 years ago

I'm still thinking about error-handling. But no reason to block on it. Let's just keep going as before for now.

akkartik commented 4 years ago

Here's what the error messages look like now:

$ echo %ecx |subx run apps/desugar
3/mod/direct 0x00000001/rm32
$ echo %ecy |subx run apps/desugar
Registers: get-slice: key not found: ecy

Not perfect, but fairly helpful. Good enough for now!

akkartik commented 4 years ago

I just sketched out how we might go about implementing indirect mode beyond *reg. It's kinda interesting because it's a tiny little compiler in microcosm with a source language that fits in a single line, a parser, an intermediate representation and a code-generator. Let me know what you think: https://github.com/akkartik/mu/pull/35/commits/67dac668e3b0aa09efc38e8646c4d4233b764488

akkartik commented 4 years ago

Hi @charles-l, I've set up the master branch to always desugar on native builds. So if you want to start building a Forth using the new syntax sugar, all you have to do to build it is:

$ ./ntranslate 0*.subx apps/forth.subx  # generates a.elf

..., etc.

akkartik commented 4 years ago

Here's what apps/factorial.subx could look like now:

== code
Entry:  # run tests if necessary, compute `factorial(5)` if not
    # initialize heap
    # . Heap = new-segment(64KB)
    # . . push args
    68/push  Heap/imm32
    68/push  0x10000/imm32/64KB
    # . . call
    e8/call  new-segment/disp32
    # . . discard args
    81 0/subop/add %esp 8/imm32

    # . prolog
    89/<- %ebp 4/r32/ESP
    # - if argc > 1 and argv[1] == "test", then return run_tests()
    # . argc > 1
    81          7/subop/compare     1/mod/*+disp8   5/rm32/EBP    .           .             .           .           0/disp8         1/imm32           # compare *EBP
    7e/jump-if-lesser-or-equal  $run-main/disp8
    # . argv[1] == "test"
    # . . push args
    68/push  "test"/imm32
    ff 6/subop/push  *(ebp+8)
    # . . call
    e8/call  kernel-string-equal?/disp32
    # . . discard args
    81 0/subop/add %esp 8/imm32
    # . check result
    3d/compare-EAX-and  1/imm32
    75/jump-if-not-equal  $run-main/disp8
    # . run-tests()
    e8/call  run-tests/disp32
    8b/copy                         0/mod/indirect  5/rm32/.disp32            .             .           0/r32/EAX   Num-test-failures/disp32          # copy *Num-test-failures to EAX
    eb/jump  $main:end/disp8
$run-main:
    # - otherwise return factorial(5)
    # . . push args
    68/push  5/imm32
    # . . call
    e8/call  factorial/disp32
    # . . discard args
    81 0/subop/add %esp 4/imm32
$main:end:
    # syscall(exit, EAX)
    89/<- %ebx 0/r32/EAX
    b8/copy-to-EAX  1/imm32/exit
    cd/syscall  0x80/imm8

factorial:  # n : int -> int/EAX
    # . prolog
    55/push-EBP
    89/<- %ebp 4/r32/ESP
    53/push-EBX
    # EAX = 1 (base case)
    b8/copy-to-EAX  1/imm32
    # if (n <= 1) return
    81 7/subop/compare *(ebp+8) 1/imm32
    7e/jump-if-<=  $factorial:end/disp8
    # EBX = n-1
    8b/-> *(ebp+8) 3/r32/EBX
    81 5/subop/subtract %ebx 1/imm32
    # EAX = factorial(n-1)
    # . . push args
    53/push-EBX
    # . . call
    e8/call  factorial/disp32
    # . . discard args
    81 0/subop/add %esp 4/imm32
    # return n * factorial(n-1)
    f7 4/subop/multiply-into-EAX *(ebp+8)
    # TODO: check for overflow
$factorial:end:
    # . epilog
    5b/pop-to-EBX
    89/<- %esp 5/r32/EBP
    5d/pop-to-EBP
    c3/return

test-factorial:
    # factorial(5)
    # . . push args
    68/push  5/imm32
    # . . call
    e8/call  factorial/disp32
    # . . discard args
    81 0/subop/add %esp 4/imm32
    # check-ints-equal(EAX, 120, msg)
    # . . push args
    68/push  "F - test-factorial"/imm32
    68/push  0x78/imm32/expected-120
    50/push-EAX
    # . . call
    e8/call  check-ints-equal/disp32
    # . . discard args
    81 0/subop/add %esp 0xc/imm32
    # end
    c3/return

There are still 2 corner cases where we can't use the syntax sugar:

  1. *ebp needs to expand to 1/mod 5/rm32 0/disp8 rather than 0/mod 5/rm32.
  2. We need to support globals like *Num-test-failures. They should expand to 0/mod 5/rm32 Num-test-failures/disp32.

(I'm going to hold off on updating factorial.subx because then emulated runs would stop working. Not sure how to deal with that.)

akkartik commented 4 years ago

One issue with using this new syntax: apps/desugar is slow in emulated mode.

charles-l commented 4 years ago

Looks good! Excited to see some sugar starting to get added to the core language :)

akkartik commented 4 years ago

esp and ebp now get correctly (if sub-optimally) desugared. I renamed desugar.subx to sigils.subx since we're going to be writing other desugaring passes next.

*disp32 mode is still not supported.

akkartik commented 4 years ago

Ok, *disp32 is done. Sigils is now complete.

Next few items on my todo list:

akkartik commented 4 years ago

Parsing function calls is slightly tricky.

One thought I had was to switch to Lisp-style function calls instead. That way calls are easy to detect: the first non-whitespace on a line is (, and there are no pesky commas to deal with. What do you think? This is nowhere near real s-expressions, of course. Zero nesting, and a line can have parens at most 2 deep (one for a call, one for a *(...) expression).

akkartik commented 4 years ago

Feels weird to use parens for two conflicting purposes. In Lisp they are never used for grouping, and in other languages they don't indicate function calls.

And yet they're required for calls in C, and Lisp special forms do have lists with parens that aren't calls.