enarx-archive / frenetic

A library for stackful coroutines with no OS dependencies.
Apache License 2.0
9 stars 3 forks source link

Gather architectural reviews #5

Closed npmccallum closed 5 years ago

npmccallum commented 5 years ago

We really need some reviews of our approach. If you want to review the approach, please glance at our open issues first so you can see what we are aware of.

Our goal is to create a stackful Coroutine library for Rust that:

  1. Uses the Rust Generator traits for native integration.
  2. Uses no system calls.
  3. Can be built without std.
  4. Doesn't need assembly.

The reasoning for this is that we need to use it in our μkernel.

Our basic approach is to use LLVM IR to generate stack switching and setjmp()/longjmp() as raw CPU instructions. This offloads architecture-specific code to LLVM. The remaining code is all Rust.

We create a coroutine from a stack (passed as a slice) and a closure. We use jump_init() to set the new stack and setjmp() in the parent. Then we create the rest of our structures on the stack in the child before jumping back to the parent. Now, the coroutine is setup.

If the user of the API doesn't call Generator::resume() until completion, when the coroutine is dropped we resume the coroutine with the Cancelled error. This means that the coroutine must clean up immediately. We do this to avoid stack unwinding (so child Drop is handled correctly).

npmccallum commented 5 years ago

@frozencemetery @steveeJ @puiterwijk Want to review? Want to invite others?

npmccallum commented 5 years ago

@vathpela ?

fweimer commented 5 years ago

I explained off-list that these coroutines have to be !Send. On GNU/Linux, it is only possible to resume a stackful coroutine on the thread on which it was resumed.

LLVM will need to be told that the setjmp equivalent can return twice or more, for correctness, otherwise the LLVM optimizers will introduce bugs. There could be problems with Rust's Drop implementation as well.

Actual setjmp is quite slow, but is not necessary if the compiler is told about the stack switch.

npmccallum commented 5 years ago

I explained off-list that these coroutines have to be !Send. On GNU/Linux, it is only possible to resume a stackful coroutine on the thread on which it was resumed.

Agreed. See #12.

npmccallum commented 5 years ago

LLVM will need to be told that the setjmp equivalent can return twice or more, for correctness, otherwise the LLVM optimizers will introduce bugs.

I'm not sure what you're referring to here. In this case, setjmp() is an LLVM intrinsic. So LLVM knows how to handle it returning twice. See: https://github.com/enarx/frenetic/blob/master/src/jump.ll#L23

There could be problems with Rust's Drop implementation as well.

I'm not sure what problem with Drop you're referring to. In this implementation, we resume the child on coroutine cancellation. This resumption returns Err(Cancelled) from Control::pause() and the Control instance is consumed. This means that no further yields can happen. The function needs to return normally. This causes the stack to unwind exactly as normal code does. I don't see any problem with Drop. Am I missing something?

Actual setjmp is quite slow, but is not necessary if the compiler is told about the stack switch.

I'm not sure what you mean here. Suspending and resuming execution needs to save the relevant register state. This is precisely what setjmp() does. I'm not sure how this isn't needed.

npmccallum commented 5 years ago

One very strange bug I hit is #8. If the coroutine stack is allocate on the parent stack, everything blows up. Anyone have any suggestions?

fweimer commented 5 years ago

llvm.eh.sjlj.setjmp is not setjmp, just like __builtin_setjmp is not setjmp. LLVM likely optimizes it with the knowledge that it is only used as part of SJLJ exception handling. I doubt you can use either setjmp from an alwaysinline function because a non-local return to a function frame that does not exist is undefined. (alwaysinline functions still have frames.)

Declaring jump_into nounwind is likely wrong because all the function does is unwinding.

npmccallum commented 5 years ago

llvm.eh.sjlj.setjmp is not setjmp, just like __builtin_setjmp is not setjmp. LLVM likely optimizes it with the knowledge that it is only used as part of SJLJ exception handling. I doubt you can use either setjmp from an alwaysinline function because a non-local return to a function frame that does not exist is undefined. (alwaysinline functions still have frames.)

I was using setjmp as shorthand. I always intend llvm.eh.sjlj.setjmp.

While this objection is certainly possible, the current assembly generated on x86 and x86_64 appears correct to me. If we ever run into a problem, we will just drop alwaysinline and manually copy the code into the respective functions. In this case, jump_save exists only for code reuse. I'm happy to sacrifice that for correctness if necessary.

However, I don't think LLVM optimizes this just for SJLJ. llvm.eh.sjlj.setjmp is how clang implements __builtin_setjmp, which is general purpose. So I don't think there is a lot of wiggle room here. And I've validated the assembly by hand (which only means that I think it is correct, not that it actually is).

Declaring jump_into nounwind is likely wrong because all the function does is unwinding.

I'm not convinced, but I'm listening... With more details, I'm happy to remove it.

fweimer commented 5 years ago

You seem to assume that __builtin_setjmp and setjmp are substantially the same. They just happen to have similar names, but they handled differently by the compiler, beyond the differences in jump buffer size.

With an optimizing compiler, you cannot look at assembler output for a few cases and conclude that the general case is fine.

steveej commented 5 years ago

@steveeJ Want to review? Want to invite others?

My review wouldn't add anything substantial here. @sunfishcode maybe?

frozencemetery commented 5 years ago

I feel like I don't understand the problem that's you're attempting to solve to provide a good review. In particular, I'm unclear about "Doesn't need assembly" - at some level everything needs assembly, so at what level do you intend not to need assembly? Similarly, I don't know the rationale for needing this to be in rust - without that, it seems like you could use some of the existing solutions.

npmccallum commented 5 years ago

@frozencemetery See below.

I feel like I don't understand the problem that's you're attempting to solve to provide a good review.

We have to convert from callbacks to a state machine on top of repos we don't control. So we need to suspend execution inside the callback and change state.

In particular, I'm unclear about "Doesn't need assembly" - at some level everything needs assembly, so at what level do you intend not to need assembly?

This is shorthand for "we don't want to have to write separate code for every platform."

Similarly, I don't know the rationale for needing this to be in rust - without that, it seems like you could use some of the existing solutions.

We are shooting for a pure Rust microkernel. This improves auditability substantially.

npmccallum commented 5 years ago

You seem to assume that __builtin_setjmp and setjmp are substantially the same. They just happen to have similar names, but they handled differently by the compiler, beyond the differences in jump buffer size.

They both do roughly the same thing. The main differences are the invariants:

  1. __builtin_setjmp modifies the stack
  2. __builtin_setjmp generates assembly while setjmp links against libc

These differences are well defined. So I'm not sure what your point is here.

With an optimizing compiler, you cannot look at assembler output for a few cases and conclude that the general case is fine.

We agree completely. So I'm not sure what we disagree about. __builtin_setjmp are very similar and their behaviors are well defined. If LLVM optimization breaks is own intrinsics, that's an LLVM bug.