dtolnay / cxx

Safe interop between Rust and C++
https://cxx.rs
Apache License 2.0

Passing std::string by value when calling Rust -> C++ #250

Closed: adetaylor closed this issue 4 years ago

adetaylor commented 4 years ago

cxx cannot currently cope with:

#[cxx::bridge]
mod ffi {
    extern "C" {
        fn HandleString(foo: CxxString) -> bool;
    }
}

It says:

error[cxxbridge]: passing C++ string by value is not supported

What do you think about adding support for this? We need this if we're to call many of our existing C++ APIs without having to write wrappers.

To me the best option would seem to be to implicitly turn a CxxString into a &CxxString when calling from Rust to C++, then construct a new std::string from it within the generated .cc code.

That seems straightforward (unless I'm missing some complexity, which I might be!). It does introduce slightly more marshalling/unmarshalling than we currently do. Perhaps you consider this a role for the higher-level code generator discussed in #228 and #239, but I'd probably claim that this is useful enough to be handy for all cxx consumers?

Further enhancements:

dtolnay commented 4 years ago

> To me the best option would seem to be to implicitly turn a CxxString into a &CxxString when calling from Rust to C++, then construct a new std::string from it within the generated .cc code.
>
> That seems straightforward (unless I'm missing some complexity, which I might be!).

The non-straightforward part is whether a std::string by value can even legally exist in Rust. If it can, it's straightforward to support, which would make string arguments and structs containing strings both work.

The problem is with hypothetical std::string implementations that lay out as follows to take advantage of not needing a branch for dereferences of the data pointer:

long strings             short strings
   +---+                     +---+
   |ptr to heap              |ptr to data
   +---+                     +---+
   |length                   |data
   +---+                     |   |
   |capacity                 |   |
   +---+                     +---+

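A std::string by value in Rust could only be allowed in environments that either do not do SSO or do not do it this way. Rust moves a value by copying its bytes and never runs a C++ move constructor, so the internal pointer of a short string would be left pointing at the old location after any move.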
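
To make the concern concrete, here is a minimal Rust-only sketch of the short-string layout drawn above. `SsoModel` and its methods are hypothetical names invented for this illustration and are not taken from any real std::string implementation or from cxx. The only point is that a bitwise copy, which is what a Rust move amounts to, leaves the internal pointer aimed at the old object.

```rust
/// Hypothetical model of the "short string" layout: a data pointer that
/// points into the object's own inline buffer.
#[derive(Clone, Copy)]
pub struct SsoModel {
    ptr: *const u8,
    buf: [u8; 16],
}

impl SsoModel {
    /// Build a value whose pointer is not set up yet.
    /// Panics if `bytes` is longer than the 16-byte inline buffer.
    pub fn new(bytes: &[u8]) -> Self {
        let mut buf = [0u8; 16];
        buf[..bytes.len()].copy_from_slice(bytes);
        SsoModel { ptr: std::ptr::null(), buf }
    }

    /// What a C++ constructor would do in place: point `ptr` at our own buffer.
    pub fn fix_up(&mut self) {
        self.ptr = self.buf.as_ptr();
    }

    /// True if `ptr` still points at this object's own buffer.
    pub fn is_self_consistent(&self) -> bool {
        self.ptr == self.buf.as_ptr()
    }
}

/// Returns (consistent before the copy, consistent after the copy,
/// copy still points at the original's buffer).
pub fn move_breaks_internal_pointer() -> (bool, bool, bool) {
    let mut original = SsoModel::new(b"hi");
    original.fix_up();
    let before = original.is_self_consistent();

    // A Rust move is a plain byte copy. The type is `Copy` here so that both
    // values stay alive and are guaranteed to sit at different addresses.
    let moved = original;
    let after = moved.is_self_consistent();

    // The copied pointer still refers to `original`'s buffer, not its own.
    let stale = moved.ptr == original.buf.as_ptr();
    (before, after, stale)
}
```

Nothing here dereferences the stale pointer; the sketch only compares addresses. A real std::string with this layout would read through that pointer, which is why it could not be held by value in Rust.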

Do you know if:

  1. A std::string implementation that has an internal pointer is allowed by the standard?
  2. This is ever a thing that real standard library implementations do?
  3. Your specific standard library does it?

adetaylor commented 4 years ago

Yes. So far as I know that is indeed something that our standard library does (and even if it didn't, I wouldn't want to guarantee that it wouldn't in future).

I suppose, however, I was thinking of a CxxString as being little more than a handle to a C++-side object which is allocated, freed and manipulated solely from C++ code (and every operation on the Rust type actually simply calls through to some C++ code). So, the Rust-side CxxString would secretly just be a pointer to a C++-side std::string. I can't see any other safe way to do it without becoming dependent on implementation details of the C++ standard library.

(Thinking out loud as you can tell!)

dtolnay commented 4 years ago

All the binding types are exactly what exists in the other language, they are not handles.

We used to use handles in my work codebase and it was a bad experience. It falls apart when you want to call a method that takes &'a HANDLE in one language but all you have is &'a ACTUAL in the other language; you end up not being able to come up with a handle with the appropriate lifetime, and need to start inventing mutually incompatible Foo and FooRef and FooMut versions of all types.

In cxx the equivalent of a handle to a std::string is UniquePtr<CxxString>, which is allowed to be passed by value.
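
A rough way to see why the UniquePtr form can be passed by value: only the owning pointer moves, while the pointee stays where C++ put it. The sketch below is Rust-only and uses `Box` as a stand-in for `UniquePtr` and a hypothetical `SsoModel` type as a stand-in for a std::string with an internal pointer. Neither is how cxx actually implements these types.

```rust
/// Hypothetical stand-in for a std::string whose data pointer points into
/// its own inline buffer.
pub struct SsoModel {
    ptr: *const u8,
    buf: [u8; 16],
}

impl SsoModel {
    /// Allocate on the heap first, then set the internal pointer in place,
    /// so the object never moves after the pointer is established.
    /// Panics if `bytes` is longer than the 16-byte inline buffer.
    pub fn new_boxed(bytes: &[u8]) -> Box<SsoModel> {
        let mut buf = [0u8; 16];
        buf[..bytes.len()].copy_from_slice(bytes);
        let mut boxed = Box::new(SsoModel { ptr: std::ptr::null(), buf });
        boxed.ptr = boxed.buf.as_ptr();
        boxed
    }

    /// True if `ptr` still points at this object's own buffer.
    pub fn is_self_consistent(&self) -> bool {
        self.ptr == self.buf.as_ptr()
    }
}

/// Takes the owning handle by value, the way a by-value UniquePtr argument would.
pub fn take_by_value(s: Box<SsoModel>) -> bool {
    s.is_self_consistent()
}

/// Moves the handle around several times and reports whether the pointee
/// is still self-consistent at the end.
pub fn handle_survives_moves() -> bool {
    let a = SsoModel::new_boxed(b"hi");
    let b = a; // moves only the pointer-sized handle
    let v = vec![b]; // moves the handle into a Vec
    let c = v.into_iter().next().unwrap(); // and back out again
    take_by_value(c)
}
```

Each move copies the handle, not the heap object, so the internal pointer stays valid throughout.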

adetaylor commented 4 years ago

OK. Yes, I know they're not handles right now, but if we assume it's impossible to represent std::string in Rust then I figured that a handle might be better than nothing.

But yes. Your point on lifetimes and experience using them is appreciated, and I was worried it might head in that direction too. I'll do some thinking. It's very desirable for us to be able to call existing C++ APIs from Rust even if they take a std::string by value.

(One solution I already discounted is to use Pin, since although it's designed for self-referential structs, the whole point is that I want to be able to pass std::strings by value.)

> In cxx the equivalent of a handle to a std::string is UniquePtr<CxxString>, which is allowed to be passed by value.

Maybe the hypothetical higher-level code generator always generates a C++ wrapper function which takes a UniquePtr<CxxString> from Rust if the original C++ function took a std::string. This is roughly what I was originally driving at, but more explicit.

dtolnay commented 4 years ago

> Maybe the hypothetical higher-level code generator always generates a C++ wrapper function which takes a UniquePtr<CxxString> from Rust if the original C++ function took a std::string.

That could work! A step further in that direction to make it seamless would be:

extern "C" {
    fn TakeString(s: CxxString);
}

// becomes callable as:
fn TakeString(s: impl IntoCxxString);

where we have impls to make it work for &str, String, &CxxString, UniquePtr<CxxString>, etc. This would sort of imitate the implicit conversions or implicit copy construction you would get in C++ callers while still being able to move in UniquePtr<CxxString> if the caller has one available.

TakeString("...");
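
For readers who want to see the shape of that idea, here is a self-contained sketch that compiles without cxx. Everything in it is an assumption made for illustration: `FakeCxxString` stands in for a C++ std::string, `FakeUniquePtr` (a `Box`) stands in for UniquePtr<CxxString>, and the `IntoCxxString` trait and its `into_cxx_string` method are only modeled on the name proposed above, not an existing cxx API.

```rust
/// Hypothetical stand-in for a std::string that lives behind an owning pointer.
#[derive(Debug, PartialEq)]
pub struct FakeCxxString {
    bytes: Vec<u8>,
}

/// Hypothetical stand-in for UniquePtr<CxxString>.
pub type FakeUniquePtr = Box<FakeCxxString>;

/// Hypothetical conversion trait, named after the one proposed in the thread.
pub trait IntoCxxString {
    fn into_cxx_string(self) -> FakeUniquePtr;
}

// Rust string slices are copied into a new string.
impl<'a> IntoCxxString for &'a str {
    fn into_cxx_string(self) -> FakeUniquePtr {
        Box::new(FakeCxxString { bytes: self.as_bytes().to_vec() })
    }
}

// Owned Rust strings hand over their bytes.
impl IntoCxxString for String {
    fn into_cxx_string(self) -> FakeUniquePtr {
        Box::new(FakeCxxString { bytes: self.into_bytes() })
    }
}

// A borrowed string is copied, imitating C++ copy construction.
impl<'a> IntoCxxString for &'a FakeCxxString {
    fn into_cxx_string(self) -> FakeUniquePtr {
        Box::new(FakeCxxString { bytes: self.bytes.clone() })
    }
}

// An owning pointer is moved straight through with no copy.
impl IntoCxxString for FakeUniquePtr {
    fn into_cxx_string(self) -> FakeUniquePtr {
        self
    }
}

/// Models the generated wrapper. It returns the byte length only so that the
/// example has something observable to check.
pub fn take_string(s: impl IntoCxxString) -> usize {
    let owned = s.into_cxx_string();
    owned.bytes.len()
}
```

With these impls, `take_string("...")`, `take_string(String::from("..."))`, `take_string(&some_string)` and `take_string(some_unique_ptr)` all type-check, and only the last one avoids a copy.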