Binding Unmodified Rust

fzyzcjy / flutter_rust_bridge

Flutter/Dart <-> Rust binding generator, feature-rich, but seamless and simple.

https://fzyzcjy.github.io/flutter_rust_bridge/

MIT License

4.22k stars 290 forks source link

Binding Unmodified Rust #1132

Closed coder0xff closed 10 months ago

coder0xff commented 1 year ago

Moving the discussion from #1116 as @fzyzcjy suggested. Here's a copy of the proposal:

Dart users don't expect to deal with ownership, lifetimes, and indirection. On the other hand, Rust exposes all of these for function arguments and returns. It's also nonsensical for Dart to borrow (& and &mut) a non-'static Rust object because the borrow checker doesn't work across language boundaries. (It is sensible for Rust to borrow a Rust value owned by Dart.) The Dart representation of a Rust object should encapsulate as much of the Rust memory model as possible to simulate the Dart memory model. For aspects that can't or shouldn't be encapsulated, we can leverage Dart constructs like implements to abstract them.

Arguments to Rust Functions

In Rust, we can't call a function with an argument that will take ownership if the caller does not have ownership. This is one example of the Rust memory model that can't be hidden from Dart. Here's the complete list:

No derivation of ownership from indirection
No derivation of mut indirection from const indirection
No derivation of exclusive ownership from shared ownership, or vice-a-versa
No derivation of Rc from Arc, or vice-a-versa

It's also worth pointing out some of the derivations that can and should be hidden from Dart:

Derivation of indirection from ownership
Derivation of const indirection from mut indirection
Derivation of mut double indirection upon mut or const indirection
Derivation of const double indirection upon const indirection

To abuse Worth Syntax Notation, I define five equivalence classes:

OWNED   = Box<OWNED>
        | T

MUT_IND = *mut T
        | &mut T
        | *mut MUT_IND
        | &mut MUT_IND

IND     = *const T
        | &T
        | *const IND
        | &IND
        | *mut IND
        | &mut IND

RC      = Rc<T>

ARC     = Arc<T>

T is any valid Rust type name. Any type constructor that can be derived from the same production is in the same equivalence class. For example, &mut *const T and &T can be derived from IND and are in the same equivalence class. Type constructors in the same equivalence class are interchangeable to pass an argument to a function. If we have a &T, we can invoke a function that wants a &mut *const T and vice versa. Generally, it's not possible to move across equivalence classes, except in the following scenarios, which I've dubbed substitutions:

OWNED   = IND
        | MUT_IND

RC      = IND
        | MUT_IND

ARC     = IND
        | MUT_IND

MUT_IND = IND

To explain by way of example, Box<T> is in the OWNED class, and we can compute &T in the IND class from it. Therefore, OWNED can be substituted for IND. (By the way, I spent some time investigating OWNED = RC and OWNED = ARC, but there's no way to make fake reference counters without causing memory leaks.)

If T implements Clone, then these additional substitutions are possible:

IND     = OWNED
MUT_IND = OWNED
RC      = OWNED
ARC     = OWNED

These can probably be piled into the "unexpected behavior" that @Desdaemon mentioned. Dart users expect pass by reference, not pass by value. These types should instead expose a clone method in Dart. I must point out that for values that can be serialized, the semantics of flutter_rust_bridge today is to pass them by value. We'll have to think about whether that should change.

Here's a more concrete example of the equivalence classes:

dump_file version 1

/// dump a File, taking ownership and destroying the File when done
pub fn dump_file(file: std::fs::File) {
    println!("{}", file.read_to_string());
}

dump_file version 2

/// dump a Box<File>, taking ownership and destroying the File when done
pub fn dump_file(file: Box<std::fs::File>) {
    println!("{}", file.read_to_string());
}

Dart needn't care about whether the argument is Boxed. All it needs to know is it owns a File, it will lose ownership when it calls dump_file, and the interop layer will take care of the rest. (Maybe !Sized: T needs to be its own class too) One final wrinkle: what if the Dart user wants to interact directly with pointers or reference counters? I said that T above is a type name. The good news is that we could tuck a pointer or ref counter into that T, and the rules still work, so long as the "no derivation of mut indirection from const indirection" rule isn't violated. If this capability is added, it would be follow-up work.

Equivalence Class Abstraction

In the same way FrbOpaque is inherited to make Rust-type-specific instantiations, five new Dart classes will be introduced that will be inherited to make Rust-type-specific instantiations. (Possible ten if we want "NoSend" variants for T: !Send. (12 for !Sized, maybe.))

Dart Base Type	Dart Derived Type Generated for Rust Type `MyType`	Notes
FrbInd	FrbIndMyType	`fn f(&self, ...)` method bindings
FrbMutInd	FrbMutIndMyType	`implements FrbIndMyType`, `fn f(&mut self, ...)` method bindings
FrbOwned	FrbOwnedMyType	`implements FrbMutIndMyType`, `fn f(self, ...)` method bindings
FrbRc	FrbRcMyType	`implements FrbMutIndMyType`
FrbArc	FrbArcMyType	`implements FrbMutIndMyType`

As in the RustOpaque example, the var keyword keeps the type name from the user. (I am concerned that the different collections of method bindings (see Notes above) could confuse the user if the type name is hidden. "Why can't I use this method?" Maybe we can abuse Darts @Deprecated annotation to notify the user?) The implements relations are modeled after the aforementioned substitution rules. They enable Dart bindings of Rust functions to parameterize with the least derived type but be called with the more derived type due to Liskov Substitution Principle (LSP). For example:

void do_a_thing(FrbIndirectionMyType value) { /* send to Rust */ }
FrbOwnedMyType make_a_value() { /* get from Rust */ }

var value = make_a_value();
do_a_thing(value); // OK, because FrbOwnedMyType implements FrbIndMyType

It's also theoretically possible for a binding to return, for example, a FrbOwnedMyType, despite having a declared return type of FrbIndirectionMyType due to the LSP, but that's probably not useful in practice and diverges from the Rust memory model anyway.

Return Values from Rust Functions

The equivalence classes are helpful here as well, with a tweak. Dart cannot generally borrow, except for &'static and &'static mut, so replace all occurrences of & with &'static. To send a return value to Dart, in order:

Convert to the first variant of the equivalence class. E.g., &'static mut T becomes *mut T, and T becomes Box<T>.
Serialize and transfer to Dart, including ownership, based on the type constructor:
- *const T - serialize into FrbInd
- *mut T - serialize into FrbMutInd
- Box - Box::into_raw and serialize into FrbOwned
- Arc - If Arc::try_unwrap works then change to Box, otherwise mem::forget, serialize *const Arc into FrbArc (like FrbOpaque)
- Rc - If Rc::try_unwrap works then change to Box, otherwise mem::forget, serialize *const Rc + thread::ThreadID into FrbRc

Again, if T implements Clone, it's feasible to work with borrowed return values, but the same pass-by-value concerns apply.

Rust-Function Valued Returns from Rust Functions

The feasibility of this was shown in https://github.com/fzyzcjy/flutter_rust_bridge/discussions/986. On the Dart side, it'll be an object that implements invocation, forwarding arguments to the Rust function.

Dart-Function Valued Arguments to Rust Functions

Similar conceptually to Rust callback functions but requires a bit more.

pub fn callback_caller(callback: Fn(String) -> i64) {
    assert(callback("1337") == 1337);
}

int dartCallback(String str) {
  return int.tryParse(str);
}

This can work similarly to StreamSink with the addition of multiplexing. On the Rust side, the Fn is made from a closure. If this is done right, we can even let the user hang themselves with callback hell. xD

Marshaling

Marshaling exposes the members of an unserializable Rust object to Dart. For example, a Rust std::fs::File object cannot be serialized. Nonetheless, a Dart class like the following is possible:

class StdFsFile {
  final Platform _platform;
  final RustIndirection _native;

  int write(UInt8List buf) {
    _platform.inner.std_fs_file_write(_native, buf);
  }
  ...  
}

It's Getting Late

This is pretty long already, so at the very least, it's time for a first round of discussion, especially since I'm still new to Rust and flutter_rust_bridge (how many months can I keep saying that?). Please let me know what you think, if I need to add some details, if something is wrong, etc. Cheers!

coder0xff commented 1 year ago

@fzyzcjy said

The proposal is exciting - especially, I love how we can call Dart callbacks from Rust and vise versa! IMHO this is quite useful, because closure (or "function as first-class citizen") is a commonly used language feature for modern languages.

To be honest, I am a little bit confused why we need FrbInd/FrbMutInd/..., etc. We know frb currently passes (copy or move) everything by value (except for Opaque) between Dart and Rust. Therefore, for example, no matter what Rust semantics say (e.g. owned, indirection, ...), we still create a brand new object (or move - if you are using ZeroCopyBuffer) in the Dart side. Thus, I am not very sure the use of FrbInd/FrbMutInd/...

They're for unserializable objects - like what's done with RustOpaque. The motivation is that we don't want:

pub fn my_function(foo: RustOpaque<Bar>)

because that's a Rust function written for flutter_rust_bridge. We want

pub fn my_function(foo: Bar)

a generalized Rust function that can be called from Dart. We need different varieties because of the (at least) 5 equivalence classes. Also take a look where I said, "Dart users expect pass by reference, not pass by value."

fzyzcjy commented 1 year ago

I see. So if I understand correctly, those FrbInd/FrbMutInd/etc are there, because we want to avoid writing RustOpaque in api.rs? Then still a bit confused: seems that we do not need FrbInd/... to avoid RustOpaque. Instead, just do some more code generation at bridge_generated.rs seems enough.

coder0xff commented 1 year ago

Say we have this Rust

struct Unserializable { ... }
pub fn put_owned(x: Unserializable) { ... }
pub fn get_owned() -> Unserializable { ... }
pub fn put_arc(x: Arc<Unserializable>) { ... }
pub fn get_arc() -> Arc<Unserializable> { ... }

These functions aren't interchangeable.

set_owned(get_owned()); // OK
set_arc(get_arc());     // OK
put_owned(get_arc());   // Error, can't pass Arc as owned
put_arc(get_owned());   // Error, can't pass owned as Arc

Because we want to hide Arc from Dart, we can imagine our Dart bindings like this:

class Unserializable { ... }
void put_owned(Unserializable x) { ... }
Unserializable get_owned() { ... }
void put_arc(Unserializable x) { ... }
Unserializable get_arc() { ... }

Here's the problem:

set_owned(get_owned()); // OK
set_arc(get_arc());     // OK
put_owned(get_arc());   // Invalid, but no Dart compiler error
put_arc(get_owned());   // Invalid, but no Dart compiler error

We can get Dart to produce compile errors for the invalid invocations if the binding looks like this instead:

class FrbOwnedUnserializable { ... }
class FrbArcUnserializable { ... }
void put_owned(FrbOwnedUnserializable x) { ... }
FrbOwnedUnserializable get_owned() { ... }
void put_arc(FrbArcUnserializable x) { ... }
FrbArcUnserializable get_arc() { ... }

And then the same Dart test code:

put_owned(get_owned()); // OK
put_arc(get_arc());     // OK
put_owned(get_arc());   // Dart compilation fails successfully
put_arc(get_owned());   // Dart compilation fails successfully

The Dart type system simulates the Rust memory model. To explain the use of the Liskov Substitution Principle, let's start with some new Rust code:

struct Unserializable { ... }
pub fn put_owned(x: Unserializable) { ... }
pub fn get_owned() -> Unserializable { ... }
pub fn put_ref(x: &Unserializable) { ... }
pub fn get_ref() -> &'static Unserializable { ... }

put_owned(get_owned()); // OK
put_ref(get_ref());     // OK
put_owned(get_ref());   // Error, can't pass ref as owned
put_ref(&get_owned());  // OK, we can get a ref from an owned

So in Dart we write

class FrbIndUnserializable { ... }
class FrbOwnedUnserializable implements FrbIndUnserializable { ... }
void put_owned(FrbOwnedUnserializable x) { ... }
FrbOwnedUnserializable get_owned() { ... }
void put_ref(FrbIndUnserializable x) { ... }
FrbIndUnserializable get_ref() { ... }

and calling in Dart

put_owned(get_owned()); // OK
put_ref(get_ref());     // OK
put_owned(get_ref());   // Dart compilation fails successfully
put_ref(get_owned());   // OK, because FrbOwnedUnserializable `implements` FrbIndUnserializable

fzyzcjy commented 1 year ago

I see - these types can generate compiler errors in Dart for bad types. Interesting type "hack"s!

However, I am not sure, how can we pass (move) a owned non-serializable object? Currently, we only support Arc (via Opaque).

coder0xff commented 1 year ago

These types are similar to but distinct from Opaque. For Onwed on the Rust side, some pseudocode:


fn make_frb_owned(value: Box<T>, channel: etc) {
    // Unbox value, keeping it on the heap and manually managing lifetime
    let ptr: *const T = Box::into_raw(value);
    // Send the pointer to Dart
    channel.serialize_and_transmit(ptr);
}

fzyzcjy commented 1 year ago

I see - the owned value can be created by leaking a raw pointer.

Then, I wonder how we should use the owned value? My guess is like (pseudocode):

void useOwnedValue(FrbOwnedUnserializable x) {
  if (x.movedAway) throw Exception;
  x.movedAway = true;
  call_rust_code_via_ffi(x.ptr);
}

class FrbOwnedUnserializable {
  final int ptr;
  bool movedAway = false;
}

Note that my naive method above has a problem: If the call via ffi somehow fails, there is memory leak. But it is at least better than use-after-free. IIRC, I have seen such dilemma in Opaque's implementation (not remember exactly though).

fzyzcjy commented 1 year ago

Also cc @Desdaemon @ShaunPettiN who discussed in the previous discussions (in case this new issue does not appear on your inbox)

coder0xff commented 1 year ago

Yeah, I had suspected that RustOpaque could leak too, because it owns an Arc. There are ways to clean up such objects, but that might be a separate feature.

Anyway to answer your question, your example code is how it works, yes.

fzyzcjy commented 1 year ago

I see. So this proposal is quite interesting! If I understand correctly, it can be broken into a few (mutually related) parts:

Allow Dart/Rust closures to be used in Rust/Dart respectively
Generating method and field accessors for the Dart types
Implement owned (i.e. non-Arc) Rust values
Create class hierarchy in Dart to simulate Rust's type system, which is like a generalization and enhancement of the FrbOpaque

It would be great to separate the implementation into at least a few PRs (e.g. the ones above), because the larger the PR is, the harder to implement and review ;)

As a side remark: For owned values, we should be extra careful, e.g. need to be Send if we cannot guarantee it is on the same thread

fzyzcjy commented 1 year ago

IMHO the proposal looks great to me (except that the Dart type system looks a bit complicated - would be even greater to have a simpler type system or other ways). But before getting started, maybe we should hear more voices: @Desdaemon what do you think about it?

coder0xff commented 1 year ago

Send is a concern, which is why I mention possibly needing even more Dart types in the proposal, but thinking about it now I don't think that's the solution. Rc isn't send either, and the way I suggested dealing with that was serializing the thread ID alongside the pointer. The thread ID can hopefully help with Send and Sync, but I haven't hashed out the details.

fzyzcjy commented 1 year ago

IIRC the existing Opaque already has some things about Send and thread-id like that.

Also cc @rogurotus who implemented the great opaque type, and @Desdaemon who implemented the initial parts of the great opaque system :)

coder0xff commented 1 year ago

Another part in addition to the three you mentioned is generating method and field accessors for the Dart types. See the Notes column in the original table

fzyzcjy commented 1 year ago

Edited :)

Desdaemon commented 1 year ago

I just have one question: how should we accommodate for the aliasing rules? Since if I understand this correctly FrbMutInd types implement FrbInd, meaning Dart users are allowed to pass a &mut T everywhere a &T is expected, without the borrow checker to guide them. This wouldn't be an issue if our design is similar to how RefCell does it, i.e. panic on aliases, but regardless of our approach it is imperative we do not introduce possibilities for UB on the FFI border.

On a separate note, Dart consumers of these functions will be forced to pay attention to limitations introduced by the borrow checker and that creates some friction, as did my attempt to enable methods taking self. But that's a trade-off the library authors will have to make, not us.

Lastly, I think *const T and *mut T are still pretty half-baked in Rust and I wouldn't count them in the same indirection classes you have mentioned. I feel that they have a closer relationship to ffi.Pointer than a reference, and that's how we should interpret them.

fzyzcjy commented 1 year ago

I agree. Making raw T/&T/&mut T will introduce a lot of difficulties and edge cases - that's partially why our current Opaque uses a naive Arc. The performance is definitely a bit worse, but given that our generated code + Dart's indirection code already have performance penalty, I guess Arc vs raw pointers will not be a big difference. So, maybe we can firstly implement the parts unrelated to these T/&T/&mutT.?

Desdaemon commented 1 year ago

I agree. Making raw T/&T/&mut T will introduce a lot of difficulties and edge cases - that's partially why our current Opaque uses a naive Arc. The performance is definitely a bit worse, but given that our generated code + Dart's indirection code already have performance penalty, I guess Arc vs raw pointers will not be a big difference. So, maybe we can firstly implement the parts unrelated to these T/&T/&mutT.?

No, I don't have any issues with the proposal itself, it's just that we're moving into fundamental Rust types that are unrepresentable in the Dart/C world, all that means is that some extra scrutiny is required. Yes, the promise is to write straightforward Rust code, but it does mean it won't be as straightforward in other languages that don't share Rust's vocabulary.

fzyzcjy commented 1 year ago

To be more specific, I am not against the proposal; instead, I am just pointing out some possible challenges of the T/&T/&mutT part, while it is still quite good!

Btw some random thoughts: Even if Arc vs raw pointer does not make a performance difference, the T/&T/&mutT may be still quite useful, because it may enable future optimization when the other overhead has been eliminated. In addition, without benchmarks we do not know yet whether Arc vs raw T/&T/&mutT is much slower or non-visibily slower.

coder0xff commented 1 year ago

@Desdaemon, Rust type conversions handle the type aliasing rules, which I've neglected in the proposal so far. The best way to explain and verify feasibility was to... well, just write it - the below is self-contained, working code for ease of evaluation. There's still some work to be done, but I think it's getting there.

pub trait DartSafe: std::panic::UnwindSafe + std::panic::RefUnwindSafe {}
impl<T: std::panic::UnwindSafe + std::panic::RefUnwindSafe> DartSafe for T {}

/// Sometimes needed to prevent error[E0117}:only traits defined in the current
/// crate can be implemented for arbitrary types
trait Alias<T>: Sized {
    fn alias(self) -> T;
}

type RustInd<T> = *const T;
/// When receiving this, you must take ownership
type RustOwned<T> = *mut T;
/// You must not take ownership from this pointer
type RustMutIndOrOwned<T> = *mut T;
type RustRc<T> = RustOwned<(std::rc::Rc<T>, std::thread::ThreadId)>;
// type RustMutRc<T: ?Sized + DartSafe> = RustRc<std::cell::RefCell<T>>;
type RustArc<T> = RustOwned<std::sync::Arc<T>>;
// type RustMutArc<T: ?Sized + DartSafe> = RustArc<std::cell::RefCell<T>>;

// FrbInd -> &'static T
impl<T: ?Sized + DartSafe> Alias<&'static T> for RustInd<T> {
    fn alias(self) -> &'static T {
        unsafe { &*self as &T }
    }
}

// FrbInd -> *const T 
impl<T: ?Sized + DartSafe> Alias<*const T> for RustInd<T> {
    fn alias(self) -> *const T {
        self
    }
}

/// FrbMutInd -> &'static T
/// FrbOwned -> &'static T
impl<T: ?Sized + DartSafe> Alias<&'static T> for RustMutIndOrOwned<T> {
    fn alias(self) -> &'static T {
        unsafe { &*self as &T }
    }
}

/// FrbMutInd -> *const T
/// FrbOwned -> *const T
impl<T: ?Sized + DartSafe> Alias<*const T> for RustMutIndOrOwned<T> {
    fn alias(self) -> *const T {
        self as *const T
    }
}

/// FrbMutInd -> &'static mut T
/// FrbOwned -> &'static mut T
impl<T: ?Sized + DartSafe> Alias<&'static mut T> for RustMutIndOrOwned<T> {
    fn alias(self) -> &'static mut T {
        unsafe { &mut *self as &mut T }
    }
}

/// FrbMutInd -> *mut T
/// FrbOwned -> *mut T
impl<T: ?Sized + DartSafe> Alias<*mut T> for RustMutIndOrOwned<T> {
    fn alias(self) -> *mut T {
        self
    }
}

/// FrbOwned -> Box<T>  (ownership transferred)
impl<T: ?Sized + DartSafe> Alias<Box<T>> for RustOwned<T> {
    fn alias(self) -> Box<T> {
        unsafe { Box::from_raw(self) }
    }
}

/// FrbOwned -> T  (ownership transferred)
impl<T: Sized + DartSafe> Alias<T> for RustOwned<T> {
    fn alias(self) -> T {
        *unsafe { Box::from_raw(self) }
    }
}

trait ThreadCheck {
    fn thread_affinity(&self) -> std::thread::ThreadId;

    fn check(&self) {
        let this_thread_id = std::thread::current().id();
        if self.thread_affinity() != this_thread_id {
            panic!("wrong thread!");
        }
    }
}

impl<T: ?Sized + DartSafe> ThreadCheck for RustRc<T> {
    fn thread_affinity(&self) -> std::thread::ThreadId {
        unsafe { (*(*self)).1 }
    }
}

/// FrbOwned::into_rc, the Rc takes ownership
impl<T: ?Sized + DartSafe> Alias<RustRc<T>> for RustOwned<T> {
    fn alias(self) -> RustRc<T> {
        Box::into_raw(Box::new((
            std::rc::Rc::<T>::from(unsafe { Box::from_raw(self) }),
            std::thread::current().id(),
        )))
    }
}

/// FrbRc<T> -> &T
impl<'a, T: ?Sized + DartSafe> Alias<&'a T> for RustRc<T> {
    fn alias(self) -> &'a T {
        // assert!(std::thread::current()::thread_id() == self.1)
        unsafe { (*self).0.as_ref() }
    }
}

/// FrbRc<T> -> *const T
impl<T: ?Sized + DartSafe> Alias<*const T> for RustRc<T> {
    fn alias(self) -> *const T {
        // assert!(std::thread::current()::thread_id() == self.1)
        unsafe { (*self).0.as_ref() as *const T }
    }
}

// /// FrbRc<RefCell<T>> -> &mut T
// impl<'a, T: ?Sized + DartSafe> Alias<&'a mut T> for &'a RustMutRc<T> {
//     fn alias(self) -> &'a mut T {
//         Alias::<&'a std::cell::RefCell<T>>::alias(self).get_mut()
//     }
// }

// /// FrbRc<RefCell<T>> -> *mut T
// impl<T: ?Sized + DartSafe> Alias<*mut T> for &RustMutRc<T> {
//     fn alias(self) -> *mut T {
//         Alias::<&std::cell::RefCell<T>>::alias(&self).get_mut() as *mut T
//     }
// }

// FrbOwned::into_arc, the Arc takes ownership
impl<T: ?Sized + DartSafe> Alias<RustArc<T>> for RustOwned<T> {
    fn alias(self) -> RustArc<T> {
        Box::into_raw(Box::new(std::sync::Arc::<T>::from(unsafe { Box::from_raw(self) })))
    }
}

/// FrbRc<T> -> &T
impl<'a, T: ?Sized + DartSafe> Alias<&'a T> for RustArc<T> {
    fn alias(self) -> &'a T {
        // assert!(std::thread::current()::thread_id() == self.1)
        unsafe { (*self).as_ref() }
    }
}

/// FrbRc<T> -> *const T
impl<T: ?Sized + DartSafe> Alias<*const T> for RustArc<T> {
    fn alias(self) -> *const T {
        // assert!(std::thread::current()::thread_id() == self.1)
        unsafe { (*self).as_ref() as *const T }
    }
}

// /// FrbRc<RefCell<T>> -> &mut T
// impl<'a, T: ?Sized + DartSafe> Alias<&'a mut T> for &'a RustMutRc<T> {
//     fn alias(self) -> &'a mut T {
//         Alias::<&'a std::cell::RefCell<T>>::alias(self).get_mut()
//     }
// }

// /// FrbRc<RefCell<T>> -> *mut T
// impl<T: ?Sized + DartSafe> Alias<*mut T> for &RustMutRc<T> {
//     fn alias(self) -> *mut T {
//         Alias::<&std::cell::RefCell<T>>::alias(&self).get_mut() as *mut T
//     }
// }

fn add_and_print(a: &i32, b: &i32) {
    println!("{}", a + b);
}

fn main() {
    let mut owned = 1 as i32;
    let mutable = &mut owned;
    let immutable = mutable as &i32;

    let frb_owned = Box::into_raw(Box::new(2 as i32));
    let mut mutable2 = 4 as i32;
    let frb_mut_ind = &mut mutable2 as *mut i32;
    let frb_ind = immutable as *const i32;
    let rc: std::rc::Rc<i32> = std::rc::Rc::<i32>::new(8);
    let frb_rc = Box::into_raw(Box::new((rc, std::thread::current().id())));
    let arc = std::sync::Arc::<i32>::new(16);
    let frb_arc = Box::into_raw(Box::new(arc));

    // f takes two &i32 args, but lets call it with a *bunch* of things
    add_and_print(frb_owned.alias(), frb_mut_ind.alias());
    add_and_print(frb_ind.alias(), frb_rc.alias());
    add_and_print(frb_arc.alias(), frb_owned.alias());
}

Desdaemon commented 1 year ago

Suppose we have this Rust function:

pub fn inc_and_print(left: &mut i32, right: &i32) {
    left += 1;
    println!("{left} != {right}");
}

Which this proposal would translate into Dart as:

void incAndPrint(FrbMutIndI32 left, FrbInd32 right);

Then a Dart consumer of this library may use it as such:

FrbMutIndI32 mutableValue; // obtained elsewhere
// aliasing rules violated:
// mutableValue is borrowed at the same time mutably and immutably
// Dart doesn't complain since FrbMutInd is covariant with FrbInd
incAndPrint(mutableValue, mutableValue);

If our Rust-side FrbInd were instead type aliases to e.g. RefCell, then attempts to .borrow or .borrow_mut would panic on the FFI border, which is totally fine since we should have mechanisms to catch panics as early as then. This is not the only approach, but the easiest one to demonstrate for now.

My point is, without any of Rust's guardrails in Dart, interior mutability is an acceptable compromise to prevent rustc miscompiling due to aggressive optimizations.

coder0xff commented 1 year ago

Thank you for bringing that up. We're looking at where the rubber meets pavement now, and there are difficult but essential philosophical questions about the interactions between memory models.

When first writing the proposal, my instinct was that any function returning a & or &mut could not be bound to Dart because there's no way to rule out access after free. For that reason, I also thought &'static and &'static mut were acceptable - I hadn't considered the aliasing and mutability rule yet. Also, without thinking about it enough, I assumed that *mut and *const let us throw all the considerations about lifetime, aliasing, and mutability out the window, and a pointer was a "get out of jail for a price" card. Honestly, I still don't know the right way to think about Rust pointers, though from @Desdaemon's comment about them being half-baked, maybe nobody does. The reality is that if we are to ensure that no Rust invariants are broken by Dart, including erring on the side of caution for pointers, then Dart can interact with a Rust value in only one way: ownership. That's what RefCell requires. It simplifies things a lot because RustOpaque already does this. We can go forward with the call-back functions and the generation of method and field accessors for RustOpaque, and that would be of great benefit. But IMHO, that would be underwhelming. This may be putting the cart (pointers) before the horse (refs), but examing the invariants behind pointers seems like the easiest way to broach the subject of undefined behavior (UB) between languages and the implications of how we map invariants from one language onto another.

I'm going to write several programs below, combining Rust and C. Each will have the same intended behavior, setting a byte with the dereference of two simultaneous mutable indirections. I'll change just one aspect at a time, and each subsequent program will be more offensive to Rust sensibilities.

Pure C

This program's behavior is well-defined.

#include <stdio.h>

int main() {
    char value = 0;
    char* ptr_a = &value;
    char* ptr_b = &value;
    *ptr_a = 'a';
    *ptr_b = 'b';
    printf("%c\n", *ptr_a); // prints b
}

Two `uint8_t*` to a `u8`

This time the byte is a Rust value, not a C value. We make two mutable indirections to a Rust value, but only in C and not escaping to Rust.

#include <stdio.h>

void algorithm(char* ptr_a) {
    char* ptr_b = ptr_a; // double mutable indirection to Rust value
    *ptr_a = 'a';
    *ptr_b = 'b';
    printf("%c\n", *ptr_a); // prints b
}

extern "C" {
    fn algorithm(ptr_a: *mut u8);
}

pub fn main() {
    let mut value = 0 as u8;
    unsafe { algorithm(&mut value as *mut u8) };
}

Two `uint8_t*` to a `u8`, Dereferenced in Rust

Now Rust will do the dereferencing.

#include <stdio.h>

extern void put(char* ptr, char);

void algorithm(char* ptr_a) {
    char* ptr_b = ptr_a; // double mutable indirection
    put(ptr_a, 'a');
    put(ptr_b, 'b');
    printf("%c\n", *ptr_a); // prints b
}

extern "C" { fn algorithm(ptr_a: *mut u8); }

#[no_mangle]
unsafe extern "C" fn put(ptr: *mut u8, value: u8) {
    *ptr = value;
}

pub fn main() {
    let mut value = 0 as u8;
    unsafe { algorithm(&mut value as *mut u8) };
}

Two `uint8_t*` to a `u8`, Dereferenced Simultaneously in Rust

This version makes Rust simultaneously hold two mutable references to a single location. This places us in the vicinity of the incAndPrint example.

#include <stdio.h>

extern void put(char* ptr_a, char value_a, char* ptr_b, char value_b);

void algorithm(char* ptr_a) {
    char* ptr_b = ptr_a; // double mutable indirection
    put(ptr_a, 'a', ptr_b, 'b');
    printf("%c\n", *ptr_a); // hard to know
}

extern "C" { fn algorithm(ptr_a: *mut u8); }

#[no_mangle]
unsafe extern "C" fn put(ptr_a: *mut u8, value_a: u8, ptr_b: *mut u8, value_b: u8) {
    let ref_a = &mut *ptr_a;
    let ref_b = &mut *ptr_b;
    *ref_a = value_a;
    *ref_b = value_b;
}

pub fn main() {
    let mut value = 0 as u8;
    unsafe { algorithm(&mut value as *mut u8) };
}

Pure Rust

And just for the sake of completion, however unsettling.

pub fn main() {
    unsafe {
        let mut value = 0 as u8;
        let ref_a = &mut *(&mut value as *mut u8);
        let ref_b = &mut *(&mut value as *mut u8);
        *ref_a = 5;
        *ref_b = 6;
        println!("{}", ref_a); // who can say
    }
}

We started out with well-defined behavior and ended with undefined behavior. Somewhere along the way we crossed over to the dark side. The first question is, which one? I can see a case being made for any one of them. The second question is, how does this impact us in practice? I tested all these programs, and they work, if incidentally.

In the second example, we create two mutable indirections. If the borrow checker is an algebra that applies to the whole program, then this example escapes that algebra. Yet, the second example will be a working program for any correct compiler. (I suspect three as well, but four could reorder.) Corollary: The borrow checker helps us avoid logical errors, but there are programs that are valid despite eluding the borrow checker. In such a case, it's undefined behavior only because the borrow checker can't prove that it's not.

Where does that leave us? Obviously, the borrow checker is a cornerstone of Rust and a big part of why Rust is so amazing. The borrow checker is responsible for keeping us from shooting ourselves in the foot. However, I would argue that, in the interest of interop, maybe the user can take on some of that responsibility. Maybe lifetimes too. That said, C is a different language from Dart, and we should consider that in deciding how much responsibility is the right amount.

coder0xff commented 1 year ago

After sleeping on it, we could make the default behavior ownership-based interaction only, and provide an escape hatch for users who are willing to take more responsibility. It would enable the generation of the other equivalence classes. Documentation needs to detail the risks of using the escape hatch. What would an escape hatch look like? It could be a toggle, but maybe that's too blunt an instrument. The CLI today accepts file-by-file source and destination information. Perhaps that can be extended with a "--unsafe-file". An even more granular option would be "--unsafe-type" or "--unsafe-function". I'm just spitballing, so please share your ideas.

Some other random thoughts:

Dart code like RustMutInd<MyType> also keeps coming to mind and it seems intriguing, but I'm not sure how it would work.
For Dart-owned values, the generated Dart code could call .dispose on the objects the user passes in for a move.
Dart-owned values could also transparently be promoted from T to Arc with another layer of abstraction, but it's also irreversable.

fzyzcjy commented 1 year ago

I think the way looks good - then we are keeping a simple and intuitive developer interface to the users (by default). Then it is easy to learn and use the library without the need to learn a lot. As for the escape hatch: Or we can utilize #[frb(...)] annotations? Or wrapper types just like how we did for ZeroCopyBuffer<...> and Opaque<...>.

coder0xff commented 1 year ago

For code that we own and write for dart consumption, I agree that annotations make sense. For third party libraries, command line options are unintrusive. But why choose? Let's do both.

coder0xff commented 1 year ago

Thank you @fzyzcjy. And thank you @Desdaemon. You guys have done great work for the community in putting this project together and getting it to where it is today. I appreciate you giving me the opportunity to work with you and contribute to enhancing it. Seriously, you've been really gracious and supportive.

fzyzcjy commented 1 year ago

You are welcome. Also thank you @coder0xff - you have proposed interesting new features, happy to discuss with you, and looking forward to the PRs!

coder0xff commented 1 year ago

Here's some Dart code. FrbDescriptor will be passed back and forth between Dart and Rust. When passed to Rust, Rust code will look at the usage field to deserialize to the correct type constructor, *const, Arc, etc. Type safety of the (implied) T works the same way as FrbOpaque, with derived classes and Rust functions specific to each T. After deserialization, the aliasing rules in https://github.com/fzyzcjy/flutter_rust_bridge/issues/1132#issuecomment-1474822609 perform argument forwarding to the target function.

/// Generalizes pointers to Rust objects. Treat it as a black box in Dart.
@ffi.Packed(1)
class FrbDescriptor extends ffi.Struct {
  // Rust fat pointer serialized as pointer and metadata. Implementation
  // details may change; treat as unstable black boxes.
  external PlatformPointer pointer;
  external PlatformPointer metadata;

  /// The pointers originating Rust thread, for asserting Send and Sync usage.
  @ffi.Uint64()
  external int threadId;

  /// 1 - pointer to borrowed
  /// 2 - pointer to mutably borrowed
  /// 3 - pointer to owned
  /// 4 - pointer to owned Rc
  /// 5 - pointer to owned Arc
  @ffi.Uint8()
  external int usage;

  /// The Rust type of the pointee implements Sync
  @ffi.Bool()
  external bool sync;

  /// The Rust type of the pointee implements Send
  @ffi.Bool()
  external bool send;
}

typedef DropOwnedFnType = void Function(FrbDescriptor);

abstract class FrbInd {
  FrbDescriptor? _accessor;

  FrbInd(this._accessor);
}

abstract class FrbMutInd extends FrbInd {
  FrbMutInd(FrbDescriptor _accessor) : super(_accessor);
}

abstract class FrbOwned extends FrbMutInd {
  FrbOwned(FrbDescriptor _accessor) : super(_accessor);

  /// A native finalizer rust opaque type.
  /// Is static for each frb api class instance.
  OpaqueTypeFinalizer get staticFinalizer;

  /// Rust type specific drop function.
  ///
  /// This function should never be called manually.
  DropOwnedFnType get dropOwnedFn;

  /// Call Rust destructors on the backing memory of this pointer.
  ///
  /// This function should be run at least once during the lifetime of the
  /// program, and can be run many times.
  ///
  /// When passed into a Rust function, Rust enacts *shared ownership*,
  /// if this pointer is shared with Rust when [dispose] is called,
  /// ownership is fully transferred to Rust else this pointer is cleared.
  void dispose() {
    if (_accessor != null) {
      var accessor = _accessor;
      _accessor = null;

      staticFinalizer.detach(this);
      dropOwnedFn(accessor!);
    }
  }

  FrbDescriptor consume() {
    if (_accessor == null) {
      throw FlutterRustBridgeValueMovedException();
    }
    var accessor = _accessor;
    _accessor = null;

    staticFinalizer.detach(this);
    return accessor!;
  }
}

typedef DropRcFnType = void Function(FrbDescriptor);

/// Like FrbOwned, except the Rust object is an Rc
abstract class FrbRc extends FrbMutInd {
  FrbRc(FrbDescriptor accessor) : super(accessor);

  /// A native finalizer rust opaque type.
  /// Is static for each frb api class instance.
  OpaqueTypeFinalizer get staticFinalizer;

  /// Rust type specific drop function.
  ///
  /// This function should never be called manually.
  DropRcFnType get dropRcFn;

  /// Call Rust destructors on the backing memory of this pointer.
  ///
  /// This function should be run at least once during the lifetime of the
  /// program, and can be run many times.
  ///
  /// When passed into a Rust function, Rust enacts *shared ownership*,
  /// if this pointer is shared with Rust when [dispose] is called,
  /// ownership is fully transferred to Rust else this pointer is cleared.
  void dispose() {
    if (_accessor != null) {
      var accessor = _accessor;
      _accessor = null;

      staticFinalizer.detach(this);
      dropRcFn(accessor!);
    }
  }
}

typedef DropArcFnType = void Function(FrbDescriptor);

/// Like FrbOwned, except the Rust object is an Arc
abstract class FrbArc extends FrbMutInd {
  FrbArc(FrbDescriptor accessor) : super(accessor);

  /// A native finalizer rust opaque type.
  /// Is static for each frb api class instance.
  OpaqueTypeFinalizer get staticFinalizer;

  /// Rust type specific drop function.
  ///
  /// This function should never be called manually.
  DropArcFnType get dropArcFn;

  /// Call Rust destructors on the backing memory of this pointer.
  ///
  /// This function should be run at least once during the lifetime of the
  /// program, and can be run many times.
  ///
  /// When passed into a Rust function, Rust enacts *shared ownership*,
  /// if this pointer is shared with Rust when [dispose] is called,
  /// ownership is fully transferred to Rust else this pointer is cleared.
  void dispose() {
    if (_accessor != null) {
      var accessor = _accessor;
      _accessor = null;

      staticFinalizer.detach(this);
      dropArcFn(accessor!);
    }
  }
}

Edited because I made an error

fzyzcjy commented 1 year ago

That's interesting!

Firstly, some uber-nits: things like external bool sync; can be known at compile time, so no need to be a real field.

As for the design, to be honest, I am quite busy recently so cannot fully load this interesting technical huge proposal into my mind and process it ;) So, it may be helpful to clearly state what the proposal is going to solve, probably with examples. For example, imagine "what are we going to tell the future users in the future releases?" I do remember something like "arbitrary pointers can be passed back and forth" and (exciting to me) arbitrary closures etc, but it would be even greater to have a bit more explicit statement. Anyway in the future we will have to write it down to tell users!

Btw I can see why Googlers in https://github.com/flutter/flutter/issues/101227 says it is not super easy to follow that long thread I proposed - I am now having similar feelings ;)

Desdaemon commented 1 year ago

How about something simpler? This proposal does give us the ability to differentiate between shared references and owned values (partly, because the existence of Cow and similar types complicate things.) So it is still fine to represent shared references as *const T or Option<NonNull<T>> where T: DartSafe and allocated on the heap, and since it's raw pointers Rust is more conservative and will not optimize out reads.

However, mutable references and owned types cannot afford concurrent modifications, and will be wrapped in a RwLock<&mut T> or some manner of Cell, and implicitly unwrapped by us, bubbling any runtime errors up as Dart exceptions. One can think of it as an invariant to be upheld by the Dart caller, like how Java has ConcurrentModificationException so everyone understands the limitations of iterating.

// left is understood to never point to right, and vice versa
pub fn do_stuff(left: &mut i32, right: &i32);

FrbMutInd value;
api.doStuff(left: value, right: value); // throws e.g. BorrowingException
FrbInd another;
api.doStuff(left: value, right: another); // no problem

coder0xff commented 1 year ago

@Desdaemon, good point, we can do borrow checking amongst arguments when invoking a Rust function from Dart. It would be helpful to identify the requirements of the encapsulation, which I named "descriptor", which is not very creative but also doesn't need to be. Rust values and indirections to values are encapsulated by a descriptor to

maintain state across language boundaries
provide an abstraction of Rust semantics while in Dart
and safely transform type constructors in Rust to facilitate function invocation

To that end, we require that a descriptor,

must be plain old data that can cross language boundaries (ffi.Struct)
must be able to reproduce exactly one of the following:
- an owned value
- an immutable indirection
- a mutable indirection
- an owned Rc
- an owned Arc
may not be directly instantiated, and is sub-typed for each T, providing all capabilities for the sub-types. (The subtypes may not extend the descriptor.)
when sub-typed, checks for violations of !Send and !Sync using a saved originating-thread ID.
has a value-equality operator (not of the value, but of the descriptor itself)
if representing an owned value, or a uniquely owned Rc or Arc, may pass the value to a callee that will consume the value, invalidating the descriptor.
if representing an (shared) owned value or mutable indirection, may, in the course of a single Rust function invocation, either
- produce many const references with lifetimes not exceeding the invocation, or
- produce one mutable reference with a lifetime not exceeding the invocation
if representing an immutable indirection, may produce many const references with a lifetime not exceeding the invocation.

Looking first at the following requirements:

must be plain old data that can cross language boundaries (ffi.Struct)
must be able to reproduce exactly one of the following:
- an owned value
- an immutable indirection
- a mutable indirection
- an owned Rc
- an owned Arc

I feel a tagged union is best suited, which is where the usage field comes into play.

Looking next at these requirements:

has a value-equality operator
if representing an (shared) owned value or mutable indirection, can either
- produce many const references with lifetimes not exceeding the invocation, or
- produce one mutable reference with a lifetime not exceeding the invocation

We can do something like this:

// Rust code

pub fn my_function(a: &mut i32, b: &mut i32) { ... }

// generated
pub fn my_function_wrapper(a: RustDescriptor, b: RustDescriptor) {
    my_function(a.into(), b.into());
}

// Dart code

// generated 
void my_function_binding(FrbDescriptor a, FrbDescriptor b) {
    assert a != b; // do borrow checking
    my_function_wrapper(a, b);
}

That is, we can statically generate borrow checking code as part of the bridge. That would have the benefit of raising in Dart before even reaching Rust. I don't know if this is simpler like you had hoped. But I also don't see how any of the requirements could be removed. Let me know if I'm overlooking something.

coder0xff commented 1 year ago

Personal update: Just moved house, bought new PC, have to RMA parts. I haven't forgotten and will resume work in the coming weeks. I'll leave this comment for a short time and then delete so as to not pollute the thread. ✌🏻

fzyzcjy commented 1 year ago

@coder0xff Take your time! It's OK to leave this comment and not delete it. Sorry to hear the chore housework (if I understand correctly) and hope you get a new good house and PC!

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 1 year ago

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new issue.

fzyzcjy commented 1 year ago

(I should have unlocked it in addition to reopening it, not realized stale bot did the locking)

coder0xff commented 1 year ago

I won't be able to continue working on this, at least until my circumstances change. Thanks for your upkeep.

fzyzcjy commented 1 year ago

@coder0xff It's totally OK, no worries - Ideas, even though not become mature code, are themselves quite valuable, because it is interesting, can inspire future people for more ideas, real code, etc!

stale[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 9 months ago

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new issue.

fzyzcjy / flutter_rust_bridge

Binding Unmodified Rust #1132

Arguments to Rust Functions

dump_file version 1

dump_file version 2

Equivalence Class Abstraction

Return Values from Rust Functions

Rust-Function Valued Returns from Rust Functions

Dart-Function Valued Arguments to Rust Functions

Marshaling

It's Getting Late

Pure C

Two uint8_t* to a u8

Two uint8_t* to a u8, Dereferenced in Rust

Two uint8_t* to a u8, Dereferenced Simultaneously in Rust

Pure Rust

Two `uint8_t*` to a `u8`

Two `uint8_t*` to a `u8`, Dereferenced in Rust

Two `uint8_t*` to a `u8`, Dereferenced Simultaneously in Rust