delta-incubator / delta-kernel-rs

A native Delta implementation for integration with any query engine
Apache License 2.0
110 stars 29 forks source link

FFI is not prod-safe until `c_unwind` RFC stabilizes and we can use it #113

Open ryan-johnson-databricks opened 7 months ago

ryan-johnson-databricks commented 7 months ago

For FFI to really be prod-ready, we need the rust c_unwind RFC to stabilize (see https://github.com/rust-lang/rust/issues/115285 and https://github.com/rust-lang/rust/pull/116088), and then we must update FFI code to take advantage of it. Otherwise, exceptions thrown from C++ code, or panic! originating in rust code, have undefined behavior when unwinding across an FFI boundary.

Context

C++ and Rust take completely different approaches to exception handling.

Rust generally prefers methods to return an error Result when something expected and recoverable goes wrong, and to panic! if the problem is unexpected or unrecoverable. In turn, panic! may be implemented by stack unwinding, if the runtime is configured with panic=unwind.

Meanwhile, C++ relies heavily on exceptions, and on stack unwinding to run destructors as uncaught exceptions propagate. Exceptions can be thrown from just about anywhere (including the standard library) unless extreme care is taken to avoid them.

Because kernel-rs relies heavily on callbacks, and often on double-trampoline callbacks, we must expect cases where an exception thrown from C++ might propagate uncaught through rust code and back into C++ where it is caught and handled. Or for a rust panic to propagate uncaught through C++ code and back to rust code protected by a catch_unwind block.

Unfortunately, any kind of stack unwinding across an FFI boundary (whether C++ -> Rust or Rust -> C++) is undefined behavior today in Rust. The c_unwind RFC proposes to address this by allowing exceptions thrown by extern "c-unwind" fn to safely propagate -- uncaught -- through rust code and back into C++, and for panics in extern "rust-unwind" fn to safely propagate -- also uncaught -- through C++ code and back into rust. Attempting to catch a foreign exception would still produce undefined behavior, but we don't need that capability.

scovich commented 2 months ago

The stabilization PR https://github.com/rust-lang/rust/pull/116088 finally merged and https://github.com/rust-lang/rust/issues/74990 has closed. Now we just need to wait for the next rust release to ship. Among other things, this means duckdb-delta can just throw an exception directly when kernel asks it to allocate an error, which would simplify a lot of FFI code.

nicklan commented 1 month ago

As a subtask we sould protect all engine entry points with catch_unwind calls, so panics don't escape into engine. We need the c-unwind protection because otherwise a panic that propagates from kernel to engine and back would be UB. (from here)