WebAssembly / stack-switching

A repository for the stack switching proposal.
Other
134 stars 13 forks source link

Implementation characteristics of cont.bind (cf. func.bind) #87

Open conrad-watt opened 1 week ago

conrad-watt commented 1 week ago

When we previously discussed func.bind, my memory is that we ended up with a soft idea that this could be accomplished with a future separate "closure" type that has space in its representation for bound function arguments. Some of the talking points from this previous discussion also seem to also apply to cont.bind:

Have these points been discussed previously, and has a separate (or subtyped) "continuation closure" type been considered analogous to the "closure" type we considered for func.bind?

conrad-watt commented 1 week ago

I should say explicitly that thinking through the implications of a separate continuation closure type, things seem to get quite messy (e.g. with separate resume variants for each type), which leaves me looking back on our previous "function closure" thoughts with a little discomfort.

tlively commented 1 week ago

I think these are important considerations, and I'd like to know how much supporting cont.bind costs programs that do not use it.

At the same time, I think there's a reasonable justification for considering using a separate class of type for func.bind but not for cont.bind, namely that a user space implementation of func.bind necessarily needs to use something other than a function type to represent closures, but a user space implementation of cont.bind could use normal continuation types as the results of partial application. If you naively migrate those user space implementations down into the engine, you end up with a separate closure type for functions but not for continuations.

fgmccabe commented 1 week ago

The idea was always use use the underlying stack resource to store the extra 'stuff'. One consequence of this is that cont.bind would invalidate its continuation argument

conrad-watt commented 1 week ago

Ah interesting, I think this answers my question about how the unboxed representation would work (assuming the stack resource is still a separate allocation). I didn't realise cont.bind would invalidate its argument.

EDIT: I think the question about the overhead of the conditional check is still relevant, and naively storing the bound "stuff" in the stack resource might also make the check more expensive.

fgmccabe commented 1 week ago

cont.bind must invalidate its argument; otherwise you would effectively get multi-shot continuations.

fgmccabe commented 1 week ago

As for implementation cost of cont.bind; a 'natural' strategy would be to use the underlying stack resource. This is similar to how argument spilling for a normal call would work. I wanted to use right-to-left binding because then it would be nearly exactly how argument spilling would work. But, with left-to-right, is 'similar to' but not 'the same as' argument spilling. Apart from the additional costs of copying registers, this particular part (register allocation) of the WebAssembly engine is probably the most complex single bit of code.

frank-emrich commented 1 week ago

Just checking in to see if I understand other people's ideas correctly.

"Calling convention" for stack switching

I'm assuming that when switching stacks, the ideal "calling convention" for passing payloads to the destination continuation/stack is the same as the actual calling convention used for Wasm function calls on the current platform. I'm only talking about the aspect of a calling convention here that dictates what arguments go in what registers, and how stack slots are potentially used for the remaining arguments.

The idea is that when we switch to a fresh stack that is about to start execution a Wasm function, the arguments used for switching can be passed on to that function without needing to reshuffle registers (or stack slots). Thus, by always using the function calling convention for switching stacks, we don't need to dynamically check if the destination continuation is a fresh one (about to execute a function from the start), or one that was previously suspended: The payload passing would always work the same anyway.

Does that align with what other people had in mind? I believe that @fgmccabe's ideas about cont.bind are built around this idea, correct?

Where to collect cont.bind payloads

@fgmccabe, it seems like you have a pretty concrete idea of how to lay out memory efficiently for cont.bind. Here's my understanding of your idea, please let me know if that's indeed what you have in mind.

Let's use

$ct1 ~~ cont [i64 i64 i64 i64 i64 i64 i32 i32] -> []
$ct2 ~~ cont [i64 i64 i64 i64                ] -> []

as examples, assume we are using the x64 System V calling convention and cont.bind works right-to-left.

When we create a continuation of type $ct1, we know that the calling convention dictates that the first six integer arguments (i64 here) are passed in registers, and the last two of them (i32 here) are passed on the stack.

Thus, your plan is that when we create a continuation of type $ct1, we reserve a memory region within the continuation's actual stack space to be used for writing cont.bind arguments. This region is located so that it overlaps with the memory used for the stack slots for passing the two i32 values (per the calling convention). Thus, cont.bind would write the i32 arguments exactly where they are needed for passing them as the 7th and 8th argument on the stack.

More concretely, using an example:

(func $test (param $c1 (ref $ct1))
  (local $c2 (ref $ct2))

  ;; Integers 5 6 7 8 are written on c1's stack.
  (cont.bind $ct1 $ct2 (i32.const 8) (i32.const 7) (i64.const 6) (i64.const 5) (local.get $c1))
  (local.set $c2)

  ;; We load values 5 and 6 from the payloads buffer/stack in order to pass them in registers. 
  ;; Values 1 to 4 go in registers immediately, 
  ;; This yields the following:
  ;; rdi: 1, rsi: 2, rdx: 3, rcx: 4, r8: 5, r9: 6
  ;; Integers 7 and 8 are passed as the 7th and 8th argument on the stack.
  ;; We do not need to move them, cont.bind has automatically stored them at the right addresses.
  (resume $ct2 (i64.const 1) (i64.const 2) (i64.const 3) (i64.const 4) (local.get $c2))
)
fgmccabe commented 1 week ago

I havent completely parsed your register assignments; but, yes, that is what I had in mind. Using the normal function call mechanism for stack switching where at all possible. In fact, we already use basic argument handling for JSPI. Note that the actual stack switch is quite a bit more complex than a function call.

frank-emrich commented 1 week ago

Thanks for clarifying!

Note that the actual stack switch is quite a bit more complex than a function call.

Yes, I should have made it more clear that I was only focusing on the payload passing aspect here and ignoring everything else.

Regarding your idea of letting cont.bind write directly into the stack slots used for argument passing whenever possible: My main concern is that I believe that when using your idea, a cont.bind instruction given some arguments cannot necessarily write these arguments to adjacent slots in the payload buffer.

As an example, let's add even more arguments that will be passed on the stack (again assuming a variant of cont.bind that works right-to-left):

$ct1 ~~ cont [i64 i64 i64 i64 i64 i64 i32 i32 f64 f64 i32 i32] -> []
$ct2 ~~ cont [i64 i64 i64 i64 i64 i64                        ] -> []

If you resume a continuation of type $ct1, System V would dictate that all the six i64 and two f64 values are passed in registers, while the i32 values must be passed on the stack. In particular, you need 4 consecutive stack slots containing the i32 values.

That means that if you do (cont.bind $ct1 $ct2), you would bind 6 arguments, but they must not be written to memory in the order they appear in the type (i.e., i32 i32 f64 f64 i32 i32) to make your idea work. Instead, we must detect that the four i32 values must appear consecutively, in order to create the stack slots for argument passing without further shuffling.

To me, it looks like cont.bind would need to decide for each argument which of two buffers to add it to: One containing arguments that will be passed in stack slots, carefully laid out in the way the stack arguments will be passed, and another one for payloads that will be passed in registers.

I will have to think about whether all of this logic could be encoded statically into the translation of a particular cont.bind instruction, or if this juggling of two separate buffers would require any runtime information.

In any case, it seems to me that making this work is quite intricate, but only applies if you use cont.bind on a function that has so many arguments that some of the are passed on the stack.

titzer commented 1 week ago

I'd like to know how much supporting cont.bind costs programs that do not use it.

AFAICT a continuation that has no bound arguments can be invoked without any register shuffling and the (I assume uncommon case) of argument shuffling can be supported by the stack resource storing an extra code pointer. In addition to the buffer management considerations above, the fastest way to go from the resume site to the first continuation instruction is to go through a stub customized to the number of bound arguments. So a cont.bind switches the code pointer (i.e. resume point) in the stack from one state to the next, with the initial state being the actual IP (or a small stub for an inter-instance continuation).

So if the top-level consideration is whether we should remove cont.bind purely because of the overhead it might impose on non-bound continuations, I think the answer "it saves one code pointer in the stack resource". I don't think it inherently costs any machine instructions on the invocation path.