Async usercall interface for SGX enclaves

vn971 commented 1 year ago

Entering and exiting an SGX enclave is costly in terms of performance. It is much more efficient to keep executing inside the enclave and communicate with the enclave-runner by passing messages. The tokio runtime can be used for such asynchronous communication. This PR provides very basic support for this in EDP, but changes to mio and tokio still need to be upstreamed. These changes are fully backwards compatible; your existing enclaves will continue to run as expected.
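As a rough illustration of the message-passing idea only, here is a minimal sketch that models the "enclave" and the "runner" as two ordinary threads joined by two queues. It uses only the Rust standard library; it is not the EDP usercall ABI and not the interface added by this PR. The names `Request`, `Reply` and `run_demo` are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical request type: a stand-in, NOT the EDP usercall ABI.
#[derive(Debug)]
enum Request {
    /// Ask the runner to "write" some bytes; `id` lets us match the reply.
    Write { id: u64, data: Vec<u8> },
    /// Tell the runner that no more requests will follow.
    Shutdown,
}

/// Hypothetical reply: the id of the request and the number of bytes handled.
#[derive(Debug, PartialEq)]
struct Reply {
    id: u64,
    written: usize,
}

/// Runs a toy "enclave" and "runner" as two threads joined by two queues.
/// Returns the replies in the order the enclave side received them.
fn run_demo(payloads: Vec<Vec<u8>>) -> Vec<Reply> {
    let (req_tx, req_rx) = mpsc::channel::<Request>();
    let (rep_tx, rep_rx) = mpsc::channel::<Reply>();

    // "Runner" side: drains the request queue and pushes replies back.
    let runner = thread::spawn(move || {
        for req in req_rx {
            match req {
                Request::Write { id, data } => {
                    // A real runner would perform the actual I/O here.
                    rep_tx.send(Reply { id, written: data.len() }).unwrap();
                }
                Request::Shutdown => break,
            }
        }
    });

    // "Enclave" side: submit every request first, without waiting ...
    let count = payloads.len();
    for (i, data) in payloads.into_iter().enumerate() {
        req_tx.send(Request::Write { id: i as u64, data }).unwrap();
    }
    req_tx.send(Request::Shutdown).unwrap();

    // ... then collect the replies as they arrive.
    let replies: Vec<Reply> = rep_rx.iter().take(count).collect();
    runner.join().unwrap();
    replies
}
```

Calling `run_demo(vec![b"hello".to_vec(), b"sgx".to_vec()])` submits both requests before reading any reply. The point of the sketch is that the submitting side never has to stop and hand over control for each individual request.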

Credits for this PR go to Mohsen (https://github.com/fortanix/rust-sgx/pull/404) and YxC (https://github.com/fortanix/rust-sgx/pull/441).

This commit is an attempt to have the async-usercalls finally merged into the main codebase (master branch).

raoulstrackx commented 11 months ago

Added one last comment related to MakeSend and ticket #530, but will approve merge once this passes further testing.

DragonDev1906 commented 11 months ago

Short Questions (I have not read all the changes):

What exactly is meant by "async usercall interface"? As far as I can tell, the enclave_runner side already uses futures and thus allows making UsercallExtensions with async functions. Does this change only affect the internals of rust-sgx (i.e. not visible from the enclave or runner), does it change the runner interface, or does it change how the enclave code can interact with the outside?
Are there examples of how the async usercall interface is used (from the application developer side, if it doesn't just affect internals)?

At the moment I'm not sure what exactly is meant by "async usercall interface".

raoulstrackx commented 11 months ago

Good question @DragonDev1906, I've updated the description of this PR to make things clearer. Let me know if you still have questions. This PR doesn't have examples, but we'll add some once the changes to mio and tokio have been upstreamed and things are easier to use.

DragonDev1906 commented 11 months ago

Nice, I've had a few issues with dependencies that rely on tokio with the net feature, which made it impossible to use them. Thank you for the clarification.

I do have two more questions (though I'm not sure if this is the right place to ask them). At the moment I only have sync code in the enclave, with a custom runner (using tokio and handling TLS termination where I don't need it in the enclave) responsible for pushing data received from other systems to the enclave. Basically I'm just sending a continuous list of commands with data and processing any results returned from the enclave.

Will enclaves without async code, where the runner doesn't have to wait for the enclave to finish before sending the next command, benefit from this change? I think it might be a good idea not to use async code in the enclave (application code) to lower the complexity, at least if the problem can be converted to a list of commands to execute, but please correct me if I'm wrong on that part.
The second one is a bit less related to async usercalls, but since you've asked if I had more questions ... I'm currently contemplating which implementation is best for the situation described above. The goal is high throughput: I'm getting a medium amount of data but also need some computation on it (mainly hashing and signature verification), so I might even end up compute-bound. The options are:
1) Communicate via TCP (no custom runner needed), likely slow because it needs to go into kernel space.
2) Communicate via the existing Usercall Extensions (hence the question of whether there will be performance benefits in this situation).
3) Communicate via the async usercall interface (unless that only makes sense when the enclave itself runs async code).
4) Use the enclave in library mode. At the moment I have no idea how to estimate the performance of this approach, as it basically means no async at all (as far as I can tell) and having to wait for the previous call to finish before continuing. It may save on serialization and deserialization (unless that's just done automatically), but I think it gives less flexibility than a (buffered) TCP or (async) usercall extension.

I plan to test the throughput of those options, but perhaps you already have some experience or suggestions as to which option may be the slowest or most inefficient, especially regarding library mode and whether such a system would even benefit from the async usercall interface changes. (It could also be useful to have such a comparison of communication options somewhere in the docs.)

(so many questions, sorry)

raoulstrackx commented 11 months ago

No worries @DragonDev1906

Will enclaves without async code, where the runner doesn't have to wait for the enclave to finish before sending the next command benefit from this change?

No, without changes to your code, this PR has no impact on you.

... which implementation is best for the above stated situation... i. Communicate via TCP (no custom runner needed), likely slow because it needs to go into kernel space

If you use the changes in this PR to build an async enclave, your code will be a bit more readable. The biggest change would be that you don't need to enter/exit the enclave to request new commands or return responses. If the enclave is computationally expensive, the performance benefit of that may be minimal. Async code works best when it no longer blocks on I/O, but can do something useful while it waits for some event. Based on your description, you may already be doing that with a custom runner.

ii. Communicate via the existing Usercall Extensions

See previous answer

iii. Communicate via the async usercall interface (unless that only makes sense when the enclave itself runs async code).

Yes, that only makes sense if the enclave runs async code.

iv. Use the enclave in library mode.

That seems unrelated to whether you write sync or async code.

DragonDev1906 commented 11 months ago

Biggest change would be that you don't need to enter/exit the enclave to request new commands/return responses.

Just to see if I understood that correctly: The changes in this PR (when using the new async interface) are going to mean that multiple usercalls can/will be batched into a single ECALL (enter/exit), with the ability to use async code to send multiple usercalls without waiting for the response. But there still needs to be at least one ECALL (for the entire batch) before the runner can process the usercall, and the same for the way back, correct?

Just some info if you're interested, @raoulstrackx:

Based on your description, you may already be doing that with a custom runner.

Yeah, my enclave is not waiting for any responses for requests sent out (that's handled outside the enclave) and only blocks while trying to read new commands (currently via TCP) or writing results (also via TCP), but new commands don't depend on previous results unless something goes wrong.

If you use the changes in this PR to build an async enclave, your code will be a bit more readable. [...] Based on your description, you may already be doing that with a custom runner.

I've thought about implementing it in an "enclave requests the data and waits for the response" way, where async usercalls would likely be a big performance benefit and/or be a lot more readable. My conclusion was that there is a rather big trade-off:

If the enclave does the requests directly, without an intermediary, it needs to terminate TLS. That is necessary for most use cases, but for me the data integrity is provided in the data itself, using hashes, merkle trees and signatures, so TLS termination in the enclave only added complexity.
If I have a simple intermediary that just strips TLS and the enclave sends requests to get some data out (typical async model), the requesting logic would be simpler, as the runner wouldn't have to know what data is needed next, but it hides things a malicious runner could do. Additionally, if there is a bug in the enclave code (e.g. requesting the wrong data), the enclave code would need to be updated, not just the runner code (in our case updating the runner code is a lot easier).
With the approach I've now chosen (which I hopefully won't regret), the runner provides the data and the enclave just checks its validity (and whether anything is missing). The runner clearly has the ability to decide the order of the data/commands (which it could kind of do anyway, but not as easily), and a change to the order only requires updating the runner code. The enclave code is simpler: it doesn't need much code for network communication, it doesn't need async code, and its execution can be deterministic given the input (hard to do with async), which makes auditing the enclave code easier. That comes at the cost of additional complexity in the code generating the commands to run, and thus a bigger chance of the entire system stopping until the runner is updated. The main disadvantage is having to split the processing logic from the data fetching.
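To make the "runner provides the data, enclave only validates it" approach a bit more concrete, here is a minimal sketch of what the enclave-side processing could look like as a pure function over a list of commands. Everything in it is an assumption for illustration: `Command`, `Outcome` and `process` are hypothetical names, and the toy FNV-1a checksum only stands in for the real hashing and signature verification, which it cannot replace.

```rust
/// Toy 64-bit FNV-1a checksum. It is NOT cryptographically secure and only
/// stands in for the real hash / signature check an enclave would perform.
fn fnv1a(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for byte in data {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
    }
    hash
}

/// Hypothetical command pushed into the enclave by the runner.
struct Command {
    payload: Vec<u8>,
    /// Digest the runner claims for `payload`.
    claimed_digest: u64,
}

/// Hypothetical result returned to the runner for each command.
#[derive(Debug, PartialEq)]
enum Outcome {
    Accepted { len: usize },
    Rejected,
}

/// Processes the commands strictly in the order given. The output depends
/// only on the input, so the same command list always yields the same result.
fn process(commands: &[Command]) -> Vec<Outcome> {
    commands
        .iter()
        .map(|cmd| {
            if fnv1a(&cmd.payload) == cmd.claimed_digest {
                Outcome::Accepted { len: cmd.payload.len() }
            } else {
                // The enclave does not trust the runner: bad input is refused.
                Outcome::Rejected
            }
        })
        .collect()
}
```

Because `process` has no I/O and no hidden state, replaying the same command list reproduces the same outcomes, which is the property that makes this style easier to audit.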

I'm not yet sure if this architecture is going to bite me at some point. It's good to know that there will be an efficient way to implement it in an "enclave asks for data" way, should the need arise because a complete separation of fetching and logic gets too difficult.

raoulstrackx commented 11 months ago

@DragonDev1906 sorry I forgot to reply to your comment.

The changes in this PR (when using the new async interface) are going to mean that multiple usercalls can/will be batched into a single ECALL (enter/exit),

Strictly speaking: yes, but I think you misunderstood a bit how EDP is expected to be used. The idea is to run an entire application in the enclave. So the single ecall you refer to comes from the enclave-runner calling the enclave for the very first time. This eventually leads to the enclave calling your main function within its boundaries. Then all usercalls can be done asynchronously from within the enclave. See also the enclave execution lifecycle.

For questions/comments not specifically related to this PR, let's switch to the #rust-sgx channel in the Runtime-Encryption Slack workspace.

fortanix / rust-sgx

Async usercall interface for SGX enclaves #515