Open conrad-watt opened 1 week ago
I should say explicitly that thinking through the implications of a separate continuation closure type, things seem to get quite messy (e.g. with separate resume variants for each type), which leaves me looking back on our previous "function closure" thoughts with a little discomfort.
I think these are important considerations, and I'd like to know how much supporting cont.bind
costs programs that do not use it.
At the same time, I think there's a reasonable justification for considering using a separate class of type for func.bind
but not for cont.bind
, namely that a user space implementation of func.bind
necessarily needs to use something other than a function type to represent closures, but a user space implementation of cont.bind
could use normal continuation types as the results of partial application. If you naively migrate those user space implementations down into the engine, you end up with a separate closure type for functions but not for continuations.
The idea was always use use the underlying stack resource to store the extra 'stuff'. One consequence of this is that cont.bind would invalidate its continuation argument
Ah interesting, I think this answers my question about how the unboxed representation would work (assuming the stack resource is still a separate allocation). I didn't realise cont.bind
would invalidate its argument.
EDIT: I think the question about the overhead of the conditional check is still relevant, and naively storing the bound "stuff" in the stack resource might also make the check more expensive.
cont.bind must invalidate its argument; otherwise you would effectively get multi-shot continuations.
As for implementation cost of cont.bind; a 'natural' strategy would be to use the underlying stack resource. This is similar to how argument spilling for a normal call would work. I wanted to use right-to-left binding because then it would be nearly exactly how argument spilling would work. But, with left-to-right, is 'similar to' but not 'the same as' argument spilling. Apart from the additional costs of copying registers, this particular part (register allocation) of the WebAssembly engine is probably the most complex single bit of code.
Just checking in to see if I understand other people's ideas correctly.
I'm assuming that when switching stacks, the ideal "calling convention" for passing payloads to the destination continuation/stack is the same as the actual calling convention used for Wasm function calls on the current platform. I'm only talking about the aspect of a calling convention here that dictates what arguments go in what registers, and how stack slots are potentially used for the remaining arguments.
The idea is that when we switch to a fresh stack that is about to start execution a Wasm function, the arguments used for switching can be passed on to that function without needing to reshuffle registers (or stack slots). Thus, by always using the function calling convention for switching stacks, we don't need to dynamically check if the destination continuation is a fresh one (about to execute a function from the start), or one that was previously suspended: The payload passing would always work the same anyway.
Does that align with what other people had in mind? I believe that @fgmccabe's ideas about cont.bind
are built around this idea, correct?
cont.bind
payloads@fgmccabe, it seems like you have a pretty concrete idea of how to lay out memory efficiently for cont.bind
. Here's my understanding of your idea, please let me know if that's indeed what you have in mind.
Let's use
$ct1 ~~ cont [i64 i64 i64 i64 i64 i64 i32 i32] -> []
$ct2 ~~ cont [i64 i64 i64 i64 ] -> []
as examples, assume we are using the x64 System V calling convention and cont.bind
works right-to-left.
When we create a continuation of type $ct1
, we know that the calling convention dictates that the first six integer arguments (i64
here) are passed in registers, and the last two of them (i32
here) are passed on the stack.
Thus, your plan is that when we create a continuation of type $ct1
, we reserve a memory region within the continuation's actual stack space to be used for writing cont.bind
arguments. This region is located so that it overlaps with the memory used for the stack slots for passing the two i32
values (per the calling convention). Thus, cont.bind
would write the i32
arguments exactly where they are needed for passing them as the 7th and 8th argument on the stack.
More concretely, using an example:
(func $test (param $c1 (ref $ct1))
(local $c2 (ref $ct2))
;; Integers 5 6 7 8 are written on c1's stack.
(cont.bind $ct1 $ct2 (i32.const 8) (i32.const 7) (i64.const 6) (i64.const 5) (local.get $c1))
(local.set $c2)
;; We load values 5 and 6 from the payloads buffer/stack in order to pass them in registers.
;; Values 1 to 4 go in registers immediately,
;; This yields the following:
;; rdi: 1, rsi: 2, rdx: 3, rcx: 4, r8: 5, r9: 6
;; Integers 7 and 8 are passed as the 7th and 8th argument on the stack.
;; We do not need to move them, cont.bind has automatically stored them at the right addresses.
(resume $ct2 (i64.const 1) (i64.const 2) (i64.const 3) (i64.const 4) (local.get $c2))
)
I havent completely parsed your register assignments; but, yes, that is what I had in mind. Using the normal function call mechanism for stack switching where at all possible. In fact, we already use basic argument handling for JSPI. Note that the actual stack switch is quite a bit more complex than a function call.
Thanks for clarifying!
Note that the actual stack switch is quite a bit more complex than a function call.
Yes, I should have made it more clear that I was only focusing on the payload passing aspect here and ignoring everything else.
Regarding your idea of letting cont.bind
write directly into the stack slots used for argument passing whenever possible: My main concern is that I believe that when using your idea, a cont.bind
instruction given some arguments cannot necessarily write these arguments to adjacent slots in the payload buffer.
As an example, let's add even more arguments that will be passed on the stack (again assuming a variant of cont.bind
that works right-to-left):
$ct1 ~~ cont [i64 i64 i64 i64 i64 i64 i32 i32 f64 f64 i32 i32] -> []
$ct2 ~~ cont [i64 i64 i64 i64 i64 i64 ] -> []
If you resume
a continuation of type $ct1
, System V would dictate that all the six i64
and two f64
values are passed in registers, while the i32
values must be passed on the stack. In particular, you need 4 consecutive stack slots containing the i32
values.
That means that if you do (cont.bind $ct1 $ct2)
, you would bind 6 arguments, but they must not be written to memory in the order they appear in the type (i.e., i32 i32 f64 f64 i32 i32
) to make your idea work. Instead, we must detect that the four i32
values must appear consecutively, in order to create the stack slots for argument passing without further shuffling.
To me, it looks like cont.bind
would need to decide for each argument which of two buffers to add it to: One containing arguments that will be passed in stack slots, carefully laid out in the way the stack arguments will be passed, and another one for payloads that will be passed in registers.
I will have to think about whether all of this logic could be encoded statically into the translation of a particular cont.bind
instruction, or if this juggling of two separate buffers would require any runtime information.
In any case, it seems to me that making this work is quite intricate, but only applies if you use cont.bind
on a function that has so many arguments that some of the are passed on the stack.
I'd like to know how much supporting
cont.bind
costs programs that do not use it.
AFAICT a continuation that has no bound arguments can be invoked without any register shuffling and the (I assume uncommon case) of argument shuffling can be supported by the stack resource storing an extra code pointer. In addition to the buffer management considerations above, the fastest way to go from the resume
site to the first continuation instruction is to go through a stub customized to the number of bound arguments. So a cont.bind
switches the code pointer (i.e. resume point) in the stack from one state to the next, with the initial state being the actual IP (or a small stub for an inter-instance continuation).
So if the top-level consideration is whether we should remove cont.bind
purely because of the overhead it might impose on non-bound continuations, I think the answer "it saves one code pointer in the stack resource". I don't think it inherently costs any machine instructions on the invocation path.
When we previously discussed
func.bind
, my memory is that we ended up with a soft idea that this could be accomplished with a future separate "closure" type that has space in its representation for bound function arguments. Some of the talking points from this previous discussion also seem to also apply tocont.bind
:cont.bind
resume
requires an additional conditional check to work out whether any bound arguments exist, penalisingresume
on "unbound" continuationsHave these points been discussed previously, and has a separate (or subtyped) "continuation closure" type been considered analogous to the "closure" type we considered for
func.bind
?