Stack Inspection - Githubissues

RossTate commented 4 years ago

My impression was that there was interest in providing stack inspection, but that interest was reserved until having a better understanding of what it would entail. So this discussion is to explore the design space a bit more and understand what considerations need to be taken into account more thoroughly.

From my perspective, the key functionality to add is

answer $dispatch_tag instr1* within instr2* end
- where dispatch tag $dispatch_tag : [ti*] -> [to*]
- and instr1* : [ti*] -> [to*]
- which executes (and inherits the type of) instr2*, during which answering any stack search for $dispatch_tag with instr1*

where there is some way to look for these answers in the current stack, such as

call_stack $dispatch_tag : [ti*] -> [to*]
- where dispatch tag $dispatch_tag : [ti*] -> [to*]
- which looks up the stack for an answer $dispatch_tag and calls it

Applications we should consider here are stack tracing, linear-memory gc-root collection and updating, and two-phase exception handling. This also pertains to continuations and stack switching, but as those are less familiar concepts and unnecessary for these applications, I suggest not using them as examples, at least initially.

@rossberg My sense was that your main concern was that you had some unifying alternative in mind. Would you mind writing it up to inform the discussion?

Edit: Jump to here and the next two comments for a better implementation-focused description of answer prompted from some good questions from @aardappel.

KronicDeth commented 4 years ago

@RossTate the Lumen Core Team is very interested in getting stack switching as it would allow us to use stack switching for Erlang Process switching for both native and WASM and means we don't need code transformations in our MLIR and LLVM code for WASM. Can you expand on how stack switching would work with this?

RossTate commented 4 years ago

@KronicDeth So I don't want to get into detail on stack switching since we'll have a detailed proposal for it up once the question of whether to do stack inspection is resolved. Nonetheless I'll give you a high-level picture of how they relate.

With first-class stacks, a stack is composed of a bunch of sub-stacks that have been attached together. For various applications, it's useful to be able to detach a certain portion of the stack from the rest of the stack (e.g. to save it for later). The question is which attachment point do you want to attach (and where is it in the current stack). You generally answer this question by doing a stack inspection of the current stack to find the right point.

That said, if I've come to understand Erlang well enough, you probably aren't interested in that aspect of first-class stacks. You want the ability to switch directly between (complete) stacks without having to do an inspection. Our proposal will provide that functionality as well (though it took a while to figure out how to get that all to be safe and respect current standards in the presence of trapping and the like).

KronicDeth commented 4 years ago

You want the ability to switch directly between (complete) stacks without having to do an inspection

So, this complete vs partial stacks is actually a little bit of annoying "feature" we had in the native implementation for a bit - for stack traces we'd really like the scheduler to show up as in the stack before the Erlang process's stack, but that's not how it worked, so if the stack switching could just switch the part of the stack below a switch point, and then on error show a stacktrace of both the scheduler and Erlang process, that would be nice.

RossTate commented 4 years ago

You'll be given the freedom to pick whichever implementation you find works best. With stack inspection, the primitives would be expressive enough that you could use the faster stack-switching mechanism but still report the stack trace for both the current Erlang process and the scheduler.

RossTate commented 4 years ago

Back to the topic of stack inspection, let's talk about implementation.

If WebAssembly were actually an assembly language, there would be no notion of functions. Instead you would just have blocks of instructions that operate on some state. Some of that state could be a stack, and a block of instructions implementing a func would be given a pointer to that stack. That pointer would initially have a frame containing the func inputs (and return address), but then the instructions would extend the stack to add locals and so on. That stack-frame pointer would then be passed around between the various blocks of instructions implementing the func.

The instructions implementing an answer would like very much like the instructions implementing a func. The only difference is that the answer is given two stack-frame pointers: one for its own inputs and one for the stack frame of the func it's nested within.

The point is that an answer only looks scary because WebAssembly bundles stack-state and code-state. Nonetheless, the semantics and implementation are both straightforward.

Of course, there's good reason WebAssembly bundles stack-state and code-state. The heterogeneous mix of types and the stateful machine-dependent packing of local variables makes it hard to express a true-assembly approach efficiently on an abstract machine, and it makes it difficult for a host to independently verify that the true-assembly code is safe. Furthermore, the stack is an interlanguage resource used to facilitate interop between WebAssembly and its embedding language, e.g. JavaScript, and even between WebAssembly modules.

But those are also the reasons why it would be problematic for WebAssembly modules to implement stack inspection with a shadow stack. C++ is able to use a shadow stack because everything its stack (currently) needs to track lives in linear memory, solving the heterogeneity problem. But most languages are garbage collected, and we want to encourage them to use the host's GC, which means their local variables will be a mix of various reference types and numeric types. This means implementing a shadow stack will involve a lot of casting, dynamic allocation, and copying. (It'd be a sad day if it turns out that using the GC proposal is slower because of the significant overhead it incurs on implementing the shadow stack.) And even after all that effort, your shadow stack won't interop well with the embedding language or other modules, and for separate composition (to WebAssembly) you'll also have to expose your shadow stack so that other modules compiled from your language can share it, breaking encapsulation.

So answer is just a way to provide a relatively common device used by many runtimes, expressed within the existing abstraction WebAssembly is built upon, that's necessary for efficient and interoperable implementation for the same reason WebAssembly's abstraction was established in the first place.

aheejin commented 4 years ago

Sorry, but I don't really understand what answer and call_stack do. Can you elaborate, or provide a simple, not-too-long example?
Are you suggesting we redesign the EH proposal from scratch using these instructions?

RossTate commented 4 years ago

Are you suggesting we redesign the EH proposal from scratch using these instructions?

Definitely not. This is about stack inspection. Although stack inspection can be used to support the first phase of two-phase exception handling, the EH proposal is focused on single-phase exception handling (including unwinding, i.e. the second phase of two-phase exception handling), and I don't intend to change that focus (beyond WebAssembly/exception-handling#123 already looking into making it forwards compatible with two-phase exception handling).

Can you elaborate, or provide a simple, not-too-long example?

Sure. I'll try to get one written up shortly (though it's late here, so possibly not until tomorrow).

aheejin commented 4 years ago

Are you suggesting we redesign the EH proposal from scratch using these instructions?

Definitely not. This is about stack inspection. Although stack inspection can be used to support the first phase of two-phase exception handling, the EH proposal is focused on single-phase exception handling (including unwinding, i.e. the second phase of two-phase exception handling), and I don't intend to change that focus (beyond WebAssembly/exception-handling#123 already looking into making it forwards compatible with two-phase exception handling).

I'm confused. If we are to implement two-phase unwinding using stack inspection while keeping the current EH proposal focused on single-phase, why making it compatible with (or extensible to) to two-phase? I thought your intention was to add two-phase on top of the current EH proposal and to make the difference between the current and the future two-phase proposal smaller.

RossTate commented 4 years ago

I thought your intention was to add two-phase on top of the current EH proposal and to make the difference between the current and the future two-phase proposal smaller.

That is my intention. For two-phase EH, stack inspection would be used to determine (while preserving the stack) which handler to use, terminating the inspection via some instruction that effectively says "unwind the stack and then transfer control to target foo". That would then kick off stack unwinding, which would use the unwind blocks you suggested for the EH proposal in WebAssembly/exception-handling#123. Thus two-phase EH is a synthesis of stack inspection (for the first phase) and the current EH proposal (for the second phase) bridged by a to-be-determined "unwind the stack and then transfer control to target foo" instruction (where what a "target foo" looks like exactly is also to be determined).

aheejin commented 4 years ago

In the first phase of EH, throw (or rethrow) is the instruction that starts the search process. But you sound like we need these separate instructions to start the first phase, which makes me confused.

"unwind the stack and then transfer control to target foo" instruction

Also I'm not sure why we would need this kind of instruction. The transition from first to second phase can be done automatically when the first phase finds a handler, and that's how most (if not all) EH schemes are implemented I think.

RossTate commented 4 years ago

Here's some C# code:

bool flag = …;
try {
    …
} catch (Exception) when (flag = !flag) {
    println(“caught first”);
} catch (Exception) when (flag = !flag) {
    println(“caught second”);
}

Here's the WebAssembly it would compile to with the answer design:

(dispatch_tag $csharp_exn : [csharp_ref] -> [])

(block $outer
    (block $first
        (block $second
            (answer $csharp_exn ;; [csharp_ref] -> []
                (local.set $flag (i32.xor 1 (local.get $flag)))
                (unwind_to_if $first (local.get $flag))
                (local.set $flag (i32.xor 1 (local.get $flag)))
                (unwind_to_if $second (local.get $flag))
                (call_stack_from $outer $csharp_exn)
            within
                …
                (br $outer)
            )
        ) ;; $second : [csharp_ref]
        (call $println “caught second”)
        (br $outer)
    ) ;; $first : [csharp_ref]
    (call $println “caught first”)
    (br $outer)
) ;; $outer : []

A C# throw e; would compile to the instruction call_stack $csharp_exn. It would not compile to a throw $event instruction. The throw $event instruction is for when you do not know where to unwind to and need to determine it dynamically by looking for catch clauses with a matching $event as you unwind the stack. But in this example the stack-inspection code has already determined the destination, so we only need to run the unwind blocks and can skip the catch blocks entirely (and consequently don't need an $event to unwind with).

aheejin commented 4 years ago

How does it know the destination stack even before it runs filters and such? What makes this exception special?

Also, other than you changed throw to call_stack and try-catch to answer-within and swapped the order of try and catch, what is the difference of this from the EH proposal? (The current EH proposal does not have a filter function or block, but let's assume we add one there. I'm talking about the overall structure) Why would we need another, whole new set of instructions for that? Also, I mentioned (in an earlier Zoom meeting) we hope to keep the order of try and catch part as is, in the order of try and then catch, because the original code is structured that way and it is easier for the toolchain. Is there any reason you keep swapping them?

RossTate commented 4 years ago

@aheejin This issue is about stack inspection as a high-level consideration. You are digging into specifics that belong in something more focused on two-phase exception handling. That could be WebAssembly/exception-handling#123, but you asked me not to talk about this there. If you want to dig into these details more, please create a separate issue.

@rossberg Please illustrate how you plan to support these applications. Your main objection seemed to be that you had some prior mechanism. However, the only one that you've presented that I can imagine might be related is continuations, but that is far too heavy handed and inefficient for these applications.

aardappel commented 4 years ago

@RossTate could we maybe have a small but complete example? Let's imagine a simple function somewhere in the middle of the calls stack (has a caller and a function it calls), does some simple computation, and provide answer code that simply calls an imaginary print function on its locals. Then maybe some code that shows how to iterate through some bounded number of these stack frames to print them.

Maybe some indication on how an engine would typically implement this (presumably instr1 are not in-line but in a separate function), what effect does code before answer have, does answer itself emit any code in the main function, etc etc.

RossTate commented 4 years ago

Let's consider single-phase exception handling. Engines conceptually implement single-phase exception handling by first compiling the function with each try instr1* catch instr2* end clause replaced with just instr1*, but with two important caveats:

they track which assembly instructions implement an instruction in instr1*
if any of those instructions are function calls, they require that the stack frame be in a particular state that the catch clause understands, e.g. the value of all local variables are stored in their corresponding position in the stack frame

Then the catch bodies are compiled. But the main function code has no reference to them. Instead, entries are added to a code-range table to indicate "if the return address I give to a callee is in range R, then exceptions for event E are handled by block B". So when some function down the stack throws an exception, the engine walks the stack, cleaning the stack up as it goes, and repeatedly comparing the return addresses on the stack with the code-range table until it finds a return address that is within a range R associated with the event E of the exception that was thrown, in which case it jumps to block B, handing B the payload of the exception.

Nearly everything is the same for answer. The differences are:

you do not unwind the stack while you look for the answer
instead of the table being an association of ranges, events, and blocks, the table is an association of ranges, dispatch tags, and answers
when you find a matching answer, instead of jumping to it, you call it, handing it the appropriate stack-frame pointer as one of its arguments

aardappel commented 4 years ago

Thanks, that's a good comparison, but it still leaves me a lot of questions:

So an answer executes on top of the stack. Does it have its own stack frame (can it have locals that are entirely contained in the answer?) At the very minimum it must have a stack for temporaries.
If it does not have its own locals, does that mean there are locals in containing function that are unused unless the answer is run?
If it receives a "pointer to the stackframe", and this is different from how stackframe access is normally implemented, does this mean that an engine must translate a local.get in a different way when its inside an answer?
If invoking the answer is driven from an engine-provided stack walker, how do you influence it? E.g. terminate the stackwalk?
Are there things you can't do from an answer (invoke more stackwalks, throw exceptions.. call functions? JS?)

RossTate commented 4 years ago

Good questions. I think I can do the first three all together:

A func has stack space for its locals and stack space for its temporaries. An answer also has stack space for its locals and stack space for its temporaries. The difference between the two is that the two stack spaces for a func are generally directly adjacent to each other, whereas the two stack spaces for an answer are separate: the temporaries are at the top/bottom of the stack, whereas the locals are in the stack-frame passed to the answer.

Are there things you can't do from an answer (invoke more stackwalks, throw exceptions.. call functions? JS?)

No restrictions. So essentially, an answer is just a func whose locals are in a preexisting stack frame.

If invoking the answer is driven from an engine-provided stack walker, how do you influence it? E.g. terminate the stackwalk?

This broadens the scope of the discussion to stack walking. More general stack walking is certainly useful (call_stack is a very specific pattern), but I'm worried about forking the discussion too much, so for now I'll refer to WebAssembly/exception-handling#105 for ideas on stack walking more generally. That said, one kind of instruction that's useful to be aware of is stack-walk-guiding instructions like stack.wall instr* end, which prevents instr* from walks further up the stack. These guiding instructions are useful for more security-oriented applications, and any full proposal for stack inspection would need to consider that part of the design space thoroughly.

aheejin commented 4 years ago

@Rosstate

@aheejin This issue is about stack inspection as a high-level consideration. You are digging into specifics that belong in something more focused on two-phase exception handling. That could be WebAssembly/exception-handling#123, but you asked me not to talk about this there. If you want to dig into these details more, please create a separate issue.

I think at least two of my three previous questions are not about EH specifics.

1.

How does it know the destination stack even before it runs filters and such? What makes this exception special?

This was the question to the sentences to your paragraph in https://github.com/WebAssembly/design/issues/1356#issuecomment-662803504:

But in this example the stack-inspection code has already determined the destination, so we only need to run the unwind blocks and can skip the catch blocks entirely (and consequently don't need an $event to unwind with).

2.

... What is the difference of this from the EH proposal? (The current EH proposal does not have a filter function or block, but let's assume we add one there. I'm talking about the overall structure) Why would we need another, whole new set of instructions for that?

This is also not about EH specifics. I simply didn't understand how the functionality of these instructions are different from the existing EH instructions (other than it has filters). I was asking what additional functionality these instructions provide.

aardappel commented 4 years ago

I personally think this would be a very versatile feature that could enable all sorts of languages features.

Though, from my understanding of @RossTate's explanation above, I think it would good to explore what potential costs would be.

A function that has an answer inside of it needs to "inspectable", which means that, if it does register allocation at all, it needs to spill these back to the exact stack slots they came from, as opposed current implementations which may choose to have a local live in a register only (and never even have a stack slot), and use whatever register caller/callee saving method is efficient.

On top of that there's the locals only used in the answer, which just make the stack usage slightly bigger, which may have indirect costs (cache efficiency).

Now such costs are minor in the case of exception handling, since the amount of functions containing a try-catch is a small fraction, but there are use cases for stack inspection where pretty much every function would need an answer, like stack trace printing or linear memory GC.

In addition to the above (minor) costs, there's the issue of code size, if every function has an answer. You'd want to have these answers share an implementation of different locals "signatures" to reduce code bloat, which you could do by having the answer directly call a function for any unique signature. But even calling such a function with all locals will produce a fair bit of code in each function, so I wonder if specifying answer functions completely external to each function could be better. Of course, that is a much more intrusive design change to the existing Wasm design.

RossTate commented 4 years ago

Good considerations. But do take into account that all of those costs would be (substantially) worse without stack inspection. That is, languages need to compile these features to WebAssembly in some way or another, and stack inspection is the most efficient, compact, and composable means to do so. (That said, I also agree that it's worthwhile to investigate if there are ways to support particularly common patterns even more compactly.)

aardappel commented 4 years ago

Absolutely agreed. While I am just guessing (until we have representative benchmarks), my bet would be that "just use a shadow stack instead" entails a huge performance hit compared to this feature. I just want to be clear what the consequences could be.

fgmccabe commented 4 years ago

@aardappel :

A function that has an answer inside of it needs to "inspectable", which means that, if it does register allocation at all, it needs to spill these back to the exact stack slots they came from, as opposed current implementations which may choose to have a local live in a register only (and never even have a stack slot), and use whatever register caller/callee saving method is efficient.

This is not quite accurate. A responder needs to know where the local variables are. E.g., if a compiler decided to put a given local variable in a register instead of a stack slot -- in a region where there is a responder -- then the compilation of that responder would simply rely on the variable location. Of course, that depends on the variable being in the same location thoughout the region (it does not need to be a stack slot, it just needs to be consistent).

aardappel commented 4 years ago

@fgmccabe yes, that makes sense. Though needing to have the same register assignment (and the same way to save them) across every call in a function is still potentially a restriction. But I'd agree that it could generally be very minor.

The ability for the engine to still do register allocation (and make choices local to a function, as opposed to a uniform layout for all) is probably one of the major things that sets this apart from a shadow stack solution.

WebAssembly / design

Stack Inspection #1356