Closed npmccallum closed 5 years ago
@frozencemetery @steveeJ @puiterwijk Want to review? Want to invite others?
@vathpela ?
I explained off-list that these coroutines have to be !Send. On GNU/Linux, it is only possible to resume a stackful coroutine on the thread on which it was resumed.
LLVM will need to be told that the setjmp
equivalent can return twice or more, for correctness, otherwise the LLVM optimizers will introduce bugs. There could be problems with Rust's Drop
implementation as well.
Actual setjmp
is quite slow, but is not necessary if the compiler is told about the stack switch.
I explained off-list that these coroutines have to be !Send. On GNU/Linux, it is only possible to resume a stackful coroutine on the thread on which it was resumed.
Agreed. See #12.
LLVM will need to be told that the
setjmp
equivalent can return twice or more, for correctness, otherwise the LLVM optimizers will introduce bugs.
I'm not sure what you're referring to here. In this case, setjmp()
is an LLVM intrinsic. So LLVM knows how to handle it returning twice. See: https://github.com/enarx/frenetic/blob/master/src/jump.ll#L23
There could be problems with Rust's
Drop
implementation as well.
I'm not sure what problem with Drop
you're referring to. In this implementation, we resume the child on coroutine cancellation. This resumption returns Err(Cancelled)
from Control::pause()
and the Control
instance is consumed. This means that no further yields can happen. The function needs to return normally. This causes the stack to unwind exactly as normal code does. I don't see any problem with Drop
. Am I missing something?
Actual
setjmp
is quite slow, but is not necessary if the compiler is told about the stack switch.
I'm not sure what you mean here. Suspending and resuming execution needs to save the relevant register state. This is precisely what setjmp()
does. I'm not sure how this isn't needed.
One very strange bug I hit is #8. If the coroutine stack is allocate on the parent stack, everything blows up. Anyone have any suggestions?
llvm.eh.sjlj.setjmp
is not setjmp
, just like __builtin_setjmp
is not setjmp
. LLVM likely optimizes it with the knowledge that it is only used as part of SJLJ exception handling. I doubt you can use either setjmp
from an alwaysinline
function because a non-local return to a function frame that does not exist is undefined. (alwaysinline
functions still have frames.)
Declaring jump_into
nounwind
is likely wrong because all the function does is unwinding.
llvm.eh.sjlj.setjmp
is notsetjmp
, just like__builtin_setjmp
is notsetjmp
. LLVM likely optimizes it with the knowledge that it is only used as part of SJLJ exception handling. I doubt you can use eithersetjmp
from analwaysinline
function because a non-local return to a function frame that does not exist is undefined. (alwaysinline
functions still have frames.)
I was using setjmp
as shorthand. I always intend llvm.eh.sjlj.setjmp
.
While this objection is certainly possible, the current assembly generated on x86 and x86_64 appears correct to me. If we ever run into a problem, we will just drop alwaysinline
and manually copy the code into the respective functions. In this case, jump_save
exists only for code reuse. I'm happy to sacrifice that for correctness if necessary.
However, I don't think LLVM optimizes this just for SJLJ. llvm.eh.sjlj.setjmp
is how clang implements __builtin_setjmp
, which is general purpose. So I don't think there is a lot of wiggle room here. And I've validated the assembly by hand (which only means that I think it is correct, not that it actually is).
Declaring
jump_into
nounwind
is likely wrong because all the function does is unwinding.
I'm not convinced, but I'm listening... With more details, I'm happy to remove it.
You seem to assume that __builtin_setjmp
and setjmp
are substantially the same. They just happen to have similar names, but they handled differently by the compiler, beyond the differences in jump buffer size.
With an optimizing compiler, you cannot look at assembler output for a few cases and conclude that the general case is fine.
@steveeJ Want to review? Want to invite others?
My review wouldn't add anything substantial here. @sunfishcode maybe?
I feel like I don't understand the problem that's you're attempting to solve to provide a good review. In particular, I'm unclear about "Doesn't need assembly" - at some level everything needs assembly, so at what level do you intend not to need assembly? Similarly, I don't know the rationale for needing this to be in rust - without that, it seems like you could use some of the existing solutions.
@frozencemetery See below.
I feel like I don't understand the problem that's you're attempting to solve to provide a good review.
We have to convert from callbacks to a state machine on top of repos we don't control. So we need to suspend execution inside the callback and change state.
In particular, I'm unclear about "Doesn't need assembly" - at some level everything needs assembly, so at what level do you intend not to need assembly?
This is shorthand for "we don't want to have to write separate code for every platform."
Similarly, I don't know the rationale for needing this to be in rust - without that, it seems like you could use some of the existing solutions.
We are shooting for a pure Rust microkernel. This improves auditability substantially.
You seem to assume that
__builtin_setjmp
andsetjmp
are substantially the same. They just happen to have similar names, but they handled differently by the compiler, beyond the differences in jump buffer size.
They both do roughly the same thing. The main differences are the invariants:
__builtin_setjmp
modifies the stack__builtin_setjmp
generates assembly while setjmp
links against libcThese differences are well defined. So I'm not sure what your point is here.
With an optimizing compiler, you cannot look at assembler output for a few cases and conclude that the general case is fine.
We agree completely. So I'm not sure what we disagree about. __builtin_setjmp
are very similar and their behaviors are well defined. If LLVM optimization breaks is own intrinsics, that's an LLVM bug.
We really need some reviews of our approach. If you want to review the approach, please glance at our open issues first so you can see what we are aware of.
Our goal is to create a stackful Coroutine library for Rust that:
Generator
traits for native integration.std
.The reasoning for this is that we need to use it in our μkernel.
Our basic approach is to use LLVM IR to generate stack switching and
setjmp()
/longjmp()
as raw CPU instructions. This offloads architecture-specific code to LLVM. The remaining code is all Rust.We create a coroutine from a stack (passed as a slice) and a closure. We use
jump_init()
to set the new stack andsetjmp()
in the parent. Then we create the rest of our structures on the stack in the child before jumping back to the parent. Now, the coroutine is setup.If the user of the API doesn't call
Generator::resume()
until completion, when the coroutine is dropped we resume the coroutine with theCancelled
error. This means that the coroutine must clean up immediately. We do this to avoid stack unwinding (so childDrop
is handled correctly).