Highish-level syntax for dependency-injected abort

akkartik commented 4 years ago

SubX currently allows one to test the exit() syscall. It does so using a dependency-injected wrapper called stop that takes an exit-descriptor as an argument. If the exit-descriptor is null, the program really exits. If it was created using tailor-exit-descriptor, stop unwinds the stack to the frame that called tailor-exit-descriptor.

But tailor-exit-descriptor is klunky. The way it currently works is, you pass in the number of args of the function that's going to get passed in the exit-descriptor, and it computes where on the stack the return address for the current stack frame is going to be, saving it to the exit-descriptor. That way all stop has to do is set ESP to the return address in the exit-descriptor and then call c3/return.
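To make the mechanism concrete, here is a toy model of it in Python rather than SubX. Everything in it (the Stack class, the "target" field, the 0x1000 starting address, the 0x42 return address) is a made-up stand-in, and the real tailor-exit-descriptor may compute its slot differently. It only illustrates the arithmetic described above.

```python
WORD = 4  # bytes per 32-bit stack slot

class Stack:
    """Toy downward-growing stack: just an ESP and a sparse memory."""
    def __init__(self, top=0x1000):
        self.esp = top
        self.mem = {}

    def push(self, value):
        self.esp -= WORD
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += WORD
        return value

def tailor(stack, ed, nargs):
    # Predict where the return address will sit once nargs words
    # and then the return address itself have been pushed.
    ed["target"] = stack.esp - nargs * WORD - WORD

def stop(stack, ed):
    # Point ESP back at the saved slot, then act like c3/return.
    stack.esp = ed["target"]
    return stack.pop()

s = Stack()
ed = {"target": None}               # hypothetical exit-descriptor
tailor(s, ed, nargs=4)              # before any args are pushed
for arg in ("z", "ed", "y", "x"):   # push all args
    s.push(arg)
s.push(0x42)                        # 'call' pushes the return address
for scratch in range(10):           # callee nests arbitrarily deep
    s.push(scratch)
resumed_at = stop(s, ed)            # unwinds straight back to the caller
```

After stop, ESP is exactly where a normal return from f would have left it, so the caller's usual cleanup of its args still works.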

If we moved to a more HLL syntax where function calls were all in a single line, we'd like to make tailor-exit-descriptor cleaner or maybe do away with it altogether.

One easier way to explain tailor-exit-descriptor is that it is equivalent to a special function call. Now for background, regular calls in SubX look like this:

push all args
call
discard args (say by adding a constant to ESP)

The special call to a function that may want to call stop looks like this:

push all the args (which must include the address to the exit-descriptor)
save ESP-4 to the exit-descriptor
call
discard args

Only the second step is new. But we now get to replace all of tailor-exit-descriptor with a single instruction (assuming the address of the exit-descriptor is always in a register).
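A quick sanity check of that claim, again as a Python sketch rather than SubX. Both function names are hypothetical. The first assumes tailoring happens before the pushes and needs the argument count, while the second follows the four-step special call above.

```python
WORD = 4  # bytes per 32-bit stack slot

def tailored_ahead(esp_before_args, nargs):
    # tailor-exit-descriptor style: needs the arg count up front
    return esp_before_args - nargs * WORD - WORD

def saved_after_pushes(esp_before_args, nargs):
    esp = esp_before_args
    for _ in range(nargs):   # push all args
        esp -= WORD
    return esp - WORD        # "save ESP-4 to the exit-descriptor"

# Both approaches pick the same slot, whatever the number of args.
same = all(tailored_ahead(0x1000, n) == saved_after_pushes(0x1000, n)
           for n in range(8))
```

The second form never needs to know how many args there are, which is why it collapses to a single instruction once the args are on the stack.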

Now my question becomes: what does a syntax for this special call look like?

If regular calls look like f(arg1, arg2, ... argn), then some possibilities:

1.

ed = tailor-exit-descriptor(...total size of args to f...)
f(arg1, arg2, ...ed, ...argn)

2.

w/exit-descriptor(ed) f(arg1, arg2, ...ed, ...argn)

I don't like either, but they do have some benefits. The first allows for a single exit-descriptor to be reused across function calls in the same stack frame (totally safe). The second is all on one line to indicate that conceptually it's all a single call.

On the other hand, the second now looks like two operations on a single line, which is confusing and potentially sets a bad precedent. So I wonder what sort of approach we may take that makes it look less like two calls in a single line.

3. Baroque:

w/exit-descriptor{|ed| f(arg1, arg2, ...ed, ...argn) }

That's a lot of grammar for just one construct.

4.

try f(arg1, arg2, ...ed, ...argn)

We'd need to somehow figure out where ed is.

5.

protect[ed] f(arg1, arg2, ...ed, ...argn)

6.

f<ed>(arg1, arg2, ...ed, ...argn)

Some other random ideas: perhaps we should try to generalize the syntax with any other operations that involve munging the stack. Maybe closures or more general exceptions or first-class continuations.

Anyways, that's my brain dump.

cc @charles-l

akkartik commented 4 years ago

Ok, I have a broader proposal: create syntax not for function calls but for stack management.

Stack management is a crucial part of the book-keeping involved in Assembly programming, and it would be great if explicit push instructions become code smells if not utterly disallowed.

Currently we use push for three kinds of things:

a) Defining local variables (which we then must remember to clean up before c3/return, because otherwise we lose our return address).
b) Calling functions with arguments (which we must then remember to clean up after the callee returns).
c) Spilling registers to be reused later.

Here's an example syntax to support all 3: create a rudimentary stack-based language for lines beginning with some special token, say {. Such lines can have two kinds of expressions:

push something (rm32, imm32) to the stack
save the value of ESP to some rm32

Later a line with a } would restore the stack to the same level as before the corresponding {.

For example:

{ 0 0 ->%ecx
...
}

This is equivalent to:

68/push 0/imm32
68/push 0/imm32
89/copy %ecx 4/ESP/r32
...
81 0/subop/add %esp, 8

Which is basically what you need to define a local variable (say a slice).

A function call:

{ %ecx "foo" %edx
e8/call foo/disp32
}

which is equivalent to the pseudocode:

push %ecx
push "foo"/imm32
push %edx
e8/call foo/disp32
81 0/subop/add %esp, 12
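A minimal sketch of how such a translation pass might work, written in Python for brevity. The desugar function is hypothetical. It copies the output notation from the examples in this comment, treats every word as one 4-byte push, and splits on whitespace, so string literals containing spaces would need real tokenizing.

```python
def desugar(open_line):
    """Translate one '{ ...' line; return (instructions, cleanup for '}')."""
    words = open_line.split()
    assert words[0] == "{"
    out, nbytes = [], 0
    for w in words[1:]:
        if w.startswith("->"):
            # save the current ESP to the named rm32
            out.append(f"89/copy {w[2:]} 4/ESP/r32")
        elif w.startswith("%"):
            out.append(f"push {w}")          # register push, pseudocode form
            nbytes += 4
        else:
            out.append(f"68/push {w}/imm32")
            nbytes += 4
    # the matching '}' gives back exactly what this line pushed
    cleanup = f"81 0/subop/add %esp, {nbytes}"
    return out, cleanup

instructions, cleanup = desugar("{ 0 0 ->%ecx")
```

For the local-variable example this yields the two pushes, the copy into %ecx, and a cleanup that adds 8 to ESP. The cleanup constant falls out of counting the pushed words, which is the book-keeping the syntax is meant to take over.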

akkartik commented 4 years ago

Hmm, this syntax is interesting, but it makes the original tailor-exit-descriptor scenario pretty terse and awfully hard to spot. For example, assuming the address to the exit descriptor is currently in ECX:

# call f(x, y, ed, z) that may call stop(ed) at some point
{ z ed y x ->*ed  # last word tailors 'ed'
e8/call f/disp32
}

The only difference between a local variable and tailoring is that % turns into *.

charles-l commented 4 years ago

> create a rudimentary stack-based language for lines beginning with some special token, say {

I really like this idea. Keeping the stack balanced is error-prone, and this gives us more control over function calls than the normal function call syntax (i.e. whether we use disp32/disp8).

In terms of syntax, I think being able to reference the stack frame/scope by name would be handy:

{|stack1| 1 2 3 {|stack2| 3 *(stack1+4) 5   call blah}}
{|stack1| 1 (stack1+12) call some-func-that-uses-ed}
{|stack1| 1 (stack1 + stack1.retaddr) call some-function-that-uses-ed} # if we calculate the return address for every stack frame, just pass the return addr

I think the stack location can then be computed from the lexical location in the file (i.e. we know how many words are on the stack, so we can determine statically what the offset is to the address).

akkartik commented 4 years ago

That is interesting, but where would these stack1 variables be stored? This may be harder than it seems at first glance.

I like how you've put the entire {...} on a single line. If it's not too hard I'd like to provide that single-line alternative. But we still need to support multiple lines between the {...}.

charles-l commented 4 years ago

I was imagining something like this:

{|stack1| 1 2 3 {|a-call| %ebx *(stack1-4) e8/call somefunc}

=>
# Stack #
| 0x01          | <- stack1 is a pointer to this
| 0x02          |
| 0x03          | 
| %ebx val      | <- a-call points here
| *((ebp+32)-4) | # we know this is ebp-32 since there are only 4 words on the stack at this point in the program

Essentially it's just a label for the stack, which I think works(?).
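One way to read "a label for the stack" is as a purely translation-time table. The Python sketch below is hypothetical. It assumes each non-label word is one 4-byte push, that offsets are measured from ESP (the diagram above uses ebp instead), and it only handles the opening part of a scope.

```python
def label_offsets(tokens):
    """Map each |label| to its byte distance above ESP after all pushes."""
    depth = 0            # words pushed so far, known at translation time
    opened_at = {}       # label -> depth when its scope opened
    for t in tokens:
        if t.startswith("{|") and t.endswith("|"):
            opened_at[t[2:-1]] = depth
        else:
            depth += 1   # every other word is one 4-byte push
    # the first word of a scope sits (depth - opened - 1) words above ESP
    return {name: (depth - d - 1) * 4 for name, d in opened_at.items()}

offsets = label_offsets("{|stack1| 1 2 3 {|a-call| %ebx".split())
```

With four words pushed, stack1 resolves to ESP+12 and a-call to ESP+0, matching the two arrows in the diagram. Nothing is stored at run time. The labels disappear once the offsets are substituted.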

akkartik commented 4 years ago

Yeah, mostly makes sense. The question in my mind is: where is stack1 allocated? Is it a purely translation-time variable? Your first example also seemed to use it in complex ways like stack1 + stack1.retaddr.

charles-l commented 4 years ago

> Is it a purely translation-time variable?

Yeah. That's what I was thinking.

stack1.size or stack1.len is probably a better name. It just becomes the length of the stack (which can be evaluated at compile time). This of course doesn't work with vararg functions and dynamic stack manipulation, but I feel like those are less common cases.

If a dynamic stack is required, I think something like this would work:


{|stack1| %esp 1 2 3 *dynamic pushes* ...
# length is required now
%esp - (*stack1) # calculate dynamically by subtracting the old esp with the current esp
}
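The run-time calculation can be sketched with plain integers (Python, not SubX; the addresses and the count of dynamic pushes are invented). One detail worth keeping in mind is that the x86 stack grows toward lower addresses, so the byte count comes out positive when the current ESP is subtracted from the saved one.

```python
WORD = 4  # bytes per 32-bit stack slot

def scope_bytes(saved_esp, current_esp):
    # The stack grows downward, so the older ESP is the larger value.
    return saved_esp - current_esp

esp = 0x1000
saved = esp                   # 'push %esp' stores ESP as it was before the push
esp -= WORD                   # the slot holding that saved value
esp -= 3 * WORD               # 1 2 3
dynamic_pushes = 5            # only known at run time
esp -= dynamic_pushes * WORD
nbytes = scope_bytes(saved, esp)
```

Here nbytes covers the saved slot, the three constants, and the dynamic pushes. A closing } could add it back to ESP, or equivalently just reload ESP from the saved value.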

akkartik commented 4 years ago

Ok, I see.

It looks like your examples have expressions on multiple lines? Maybe this is a completely new language rather than just sugar?

akkartik commented 4 years ago

I'm starting to grow less excited about this whole thread. For multiple reasons:

a) The new stack syntax adds a new gotcha to compensate for the gotcha it protects us from. You have to make sure you never exit except through the }. Otherwise the stack gets mismatched.

b) It seems to increase the reader's burden to have an additional 'language' that code in the repo may be written in. The alternative would be to treat the new syntax sugar as part of core SubX, support it in the C++ version, rewrite all our SubX code to use it, and treat any new phases as part of the core. That seems like a lot of work for unclear benefit, since the amount of progress we've made is a sort of existence proof that maybe SubX without the extra sugar isn't so bad after all.

c) Rather than attack gotchas one by one, we should just start on a new language. A memory-safe statement-oriented language implemented in SubX where each statement maps to a single x86 instruction.

In other words, I'd rather this be the next syntax:

var x : slice
...

than this:

{ 0 0 ->%ecx
...
}
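Gotcha (a) above is easy to see in a toy model (Python, with a made-up run function and placeholder values). It assumes a function body that opens a two-word scope and then either falls through the } or returns early without it.

```python
WORD = 4  # bytes per 32-bit stack slot

def run(early_exit):
    esp, mem = 0x1000, {}

    def push(value):
        nonlocal esp
        esp -= WORD
        mem[esp] = value

    push("return-address")   # pushed by the caller's call instruction
    push(0)                  # '{ 0 0' opens the scope with two words
    push(0)
    if not early_exit:
        esp += 2 * WORD      # '}' restores ESP
    return mem[esp]          # c3/return pops whatever ESP points at

normal = run(early_exit=False)
skipped = run(early_exit=True)
```

The normal path returns to "return-address". The early exit pops one of the local zeros instead, which on real hardware would mean jumping to address 0.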

charles-l commented 4 years ago

Yeah, I was looking at the syntax I came up with the other day and I realized that it attempts to solve two problems: memory labeling (poorly) and stack balancing.

With the new approach are you thinking it’ll do stack balancing automatically since it should be memory safe?

akkartik commented 4 years ago

You know, that's a good question. It's my top priority, and I think I'll have to violate my "1 instruction per line" design constraint to achieve it.

But yes, that's the plan.

akkartik commented 4 years ago

Today, though, I'm enamored with the idea of a tiny Lisp interpreter. It's not going to be the final goal, but it would just be so cool to be able to type commands at init. And it should be fairly quick. We're due for some fun.

Any fun little projects you want to try?

charles-l commented 4 years ago

> Today, though, I'm enamored with the idea of a tiny Lisp interpreter. It's not going to be the final goal, but it would just be so cool to be able to type commands at init. And should be fairly quick. We're due for some fun.

That definitely could be fun. I've not implemented Lisp in asm before (though I guess it has been done before, which might be a handy reference).

I've been interested in Forth recently (particularly because of how simple it is to implement in asm, and because it allows mixing asm code with interpreted code). I started implementing a Forth in nasm a few months back (https://git.sr.ht/~nch/onward/tree/master/onward.s), but never quite finished it. Now might be the time for me to port it to SubX and finish it off :)

akkartik commented 4 years ago

Excellent idea!

akkartik / mu

Highish-level syntax for dependency-injected abort #36