dtolnay / cxx

Safe interop between Rust and C++
https://cxx.rs
Apache License 2.0

Does the core safety claim need rephrasing? #1364

Open dabrahams opened 3 months ago

dabrahams commented 3 months ago

From the README:

The core safety claim under this new model is that auditing just the C++ side would be sufficient to catch all problems, i.e. the Rust side can be 100% safe.

I really haven't looked at this in depth so I might be missing something, but it seems to me that given a binding system that exposes unsafe functions to Rust code, the code that uses these unsafe functions has to be audited for correctness.

Some ground assumptions I'm making:

  1. By “problems” you mean “potential sources of undefined behavior.”
  2. “The Rust side can be 100% safe” means that the Rust code can be said to have no undefined behavior by construction, i.e. without any auditing.

dtolnay commented 3 months ago

Arbitrary unsafe functions are not exposed to Rust. The FFI has a list of the C++ functions that need to be reviewed, like this:

https://github.com/dtolnay/cxx/blob/1822e22523fd02ee2655ad686241385fef56025e/demo/src/main.rs#L18-L25

and I can examine just those 4 C++ functions to confirm that they are indeed safe to call from Rust (i.e. a Rust function with the same signature and behavior would be a safe function, not an unsafe one). After that, it's guaranteed that if any source of undefined behavior appears in the program, it's either from (1) a Rust unsafe block unrelated to the FFI (if any), or (2) C++ code unrelated to the FFI. I can audit the Rust unsafe blocks in my program as if C++ wasn't involved, and I can audit all my C++ code the same as if Rust wasn't involved, and convince myself that the program is free of UB.

This is in contrast with traditional C–Rust FFI using bindgen, which always involves unsafe code on the Rust side of the FFI, like this: https://github.com/rust-lang/git2-rs/blob/git2-0.19.0/src/repo.rs#L395-L401.

dabrahams commented 3 months ago

OK, sure, but only a subset of C++ functions can be exposed to Rust as safe. Are you saying CXX doesn't allow other C++ functions (e.g. vector<int>::pop_back()) to be exposed to Rust? How could it possibly prevent that?

(I don't know Rust very well; I assumed unsafe extern "C++" {...} meant that the Rust declarations within {...} were treated as unsafe by Rust. Was that wrong?)

dtolnay commented 3 months ago

Whether a function is safe to call from Rust is determined by whether the fn is unsafe, not the surrounding block. This function would be unsafe to call from Rust:

extern "C++" {
    unsafe fn pop_back(self: Pin<&mut CxxVector<c_int>>);
}

See https://cxx.rs/extern-c++.html#functions-and-member-functions.
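
As a rough pure-Rust analogy (no CXX involved, and both function names below are hypothetical), the same rule holds for ordinary Rust functions: the `unsafe` keyword on the fn itself decides whether a call site needs an `unsafe` block. A minimal sketch:

```rust
/// Hypothetical function with a precondition the compiler cannot check:
/// `s` must be non-empty. Because it is an `unsafe fn`, callers must opt in.
unsafe fn first_unchecked(s: &[i32]) -> i32 {
    // SAFETY: the caller promises that `s` has at least one element.
    unsafe { *s.get_unchecked(0) }
}

/// Hypothetical function with no preconditions: callable from safe code.
fn first_or_zero(s: &[i32]) -> i32 {
    s.first().copied().unwrap_or(0)
}

fn demo() -> (i32, i32) {
    let data = [7, 8, 9];
    // A safe fn needs no ceremony at the call site.
    let a = first_or_zero(&data);
    // let b = first_unchecked(&data); // rejected by the compiler: needs `unsafe`
    // SAFETY: `data` has three elements, so the precondition holds.
    let b = unsafe { first_unchecked(&data) };
    (a, b)
}
```

The call to `first_unchecked` is the only place a reviewer has to check a precondition by hand; the call to `first_or_zero` needs no such review.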

dabrahams commented 3 months ago

Ok thanks. But what about the other questions?

dtolnay commented 3 months ago

Are you saying CXX doesn't allow other C++ functions (e.g. vector<int>::pop_back()) to be exposed to Rust? How could it possibly prevent that?

CXX does allow other C++ functions to be exposed to Rust, by writing unsafe fn in the list of functions in the FFI module as in https://github.com/dtolnay/cxx/issues/1364#issuecomment-2243755729. unsafe fn C++ functions can only be called from unsafe Rust code. Safe C++ functions can be called from safe Rust code. CXX does not prevent FFIs from being expressed in terms of unsafe functions. It does prevent the need to do so, by making it possible for real-world elaborate FFIs to be expressed using safe functions.

In general, if you have some arbitrary C++ function to call whose behavior is not safe, you're not going to call it without writing either unsafe Rust code or (unsafe) C++ code. In your example of pop_back the unsafety can go either on the C++ side:

// C++
template <typename T>
bool checked_pop_back(std::vector<T>& vec) {
  if (vec.empty()) {
    return false;
  } else {
    vec.pop_back();
    return true;
  }
}

// Rust
unsafe extern "C++" {
    #[cxx_name = "checked_pop_back"]
    fn checked_pop_back_int(vec: Pin<&mut CxxVector<c_int>>) -> bool;
}

assert!(ffi::checked_pop_back_int(vec));

or the Rust side:

// Rust
extern "C++" {
    unsafe fn pop_back(vec: Pin<&mut CxxVector<c_int>>);
}

fn checked_pop_back(vec: Pin<&mut CxxVector<c_int>>) {
    assert!(!vec.is_empty());
    unsafe { ffi::pop_back(vec) }
}

(From experience with >300 CXX-based libraries at Meta, the bindings tend to consist overwhelmingly of safe functions, like this. This makes them accessible to engineers with limited Rust experience. We've had engineers in their first and second week with Rust successfully write bindings that would have been challenging for me to produce without CXX as a Rust expert and C++ "knowledgeable".)
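
The two placements of the emptiness check can also be sketched in plain Rust, with `Vec<i32>` standing in for `CxxVector<c_int>`. This is only an analogy under that assumption: `pop_back_unchecked`, `try_pop_back`, and `checked_pop_back` are hypothetical names, not CXX APIs.

```rust
/// Stand-in for the raw C++ `pop_back`: calling it on an empty vector is
/// undefined behavior, so it is declared as an `unsafe fn`.
unsafe fn pop_back_unchecked(v: &mut Vec<i32>) {
    // SAFETY: the caller guarantees `v` is non-empty, so the new length
    // is within the initialized part of the buffer.
    unsafe { v.set_len(v.len() - 1) }
}

/// Analogue of the C++-side fix: the callee performs its own check and
/// reports the outcome, so the signature can be safe.
fn try_pop_back(v: &mut Vec<i32>) -> bool {
    v.pop().is_some()
}

/// Analogue of the Rust-side fix: a safe wrapper that establishes the
/// precondition before entering the `unsafe` block.
fn checked_pop_back(v: &mut Vec<i32>) {
    assert!(!v.is_empty(), "pop_back on an empty vector");
    // SAFETY: the assertion above guarantees `v` is non-empty.
    unsafe { pop_back_unchecked(v) }
}

fn demo() -> (bool, bool, usize) {
    let mut v = vec![1, 2];
    checked_pop_back(&mut v); // leaves [1]
    let first = try_pop_back(&mut v); // removes the last element
    let second = try_pop_back(&mut v); // nothing left to remove
    (first, second, v.len())
}
```

In both variants the code that needs auditing is confined to one small function. Callers of `try_pop_back` and `checked_pop_back` contain no `unsafe` at all.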

I assumed unsafe extern "C++" {...} meant that the Rust declarations within {...} were treated as unsafe by Rust. Was that wrong?

The assumption is wrong -- unsafe extern "C++" { fn ... } declares a function that is safe to call. That is an unsafe thing to claim about a function, hence the unsafe keyword. Conversely, extern "C++" { unsafe fn ... } declares a function that is unsafe to call, and making that claim is not itself unsafe.

dabrahams commented 3 months ago

It does prevent the need to do so, by making it possible for real-world elaborate FFIs to be expressed using safe functions.

s/real-world/some real-world/ or even s/real-world/many real-world/ if you want to make that claim.

(From experience with >300 CXX-based libraries at Meta, the bindings tend to consist overwhelmingly of safe functions, like this.

Yes, any API using pure value semantics in C++ (to the degree C++ can express it) will always be like that. It's great that so much of Meta's C++ code is written that way.

This makes them accessible to engineers with limited Rust experience. We've had engineers in their first and second week with Rust successfully write bindings that would have been challenging for me to produce without CXX as a Rust expert and C++ "knowledgeable".)

Not AT ALL questioning the usefulness of this work. I think it's absolutely fantastic. However, you do yourself a disservice by not being precise about the safety claims. It would be easy, for example, for one of those beginning engineers to read your README and reach the wrong conclusions about which things are safe. Also, someone more experienced like me but with less patience will look at your claim, realize it's impossible as stated, and dismiss CXX prematurely.

Of course some of the Rust code (the code that declares certain things to be safe at the very least!) needs to be vetted. It would be better to make the claims more accurate, and it would be helpful to characterize the properties of a C++ API that can be declared safe in Rust. FWIW.