extendr / extendr

R extension library for rust designed to be familiar to R users.
https://extendr.github.io
MIT License
435 stars 43 forks source link

Conversion trait architecture #498

Open multimeric opened 1 year ago

multimeric commented 1 year ago

Extendr is based around the core concept that extendr-annotated functions are passed Robj from R, and need to return Robj to R. The abstraction comes from the fact that we allow the user to specify as arguments any type that can be converted from a Robj (FromRobj), or return any type that can be converted to a Robj (ToRobj).

Although everyone is agreed that the conversion traits we use to convert to and from Robj should be fallible (ie return a Result), there has been some disagreement over whether we should use vanilla TryFrom as the conversion trait as we do now when #[extendr(use_try_from = true)] is set, or if we should continue to use the old FromRobj/IntoRobj traits and just modify them to have a fallible method.

From discussions mostly on discord, there seem to be 3 main advantages of using a custom trait (FromRobj/IntoRobj):

  1. We get to allocate one 1 blanket implementation to the conversion trait, whereas TryFrom wouldn't allow this
  2. A custom trait allows us to use custom associated types, although I can't think of a use case for this in extendr (input welcome)
  3. A custom trait would allow us to make the trait generic, with a "marker type" as described in https://github.com/extendr/extendr/issues/440#issuecomment-1448873888. This may help resolve the issue described in that thread, whereby third party crates would have trouble "extendrizing" existing types.

As I see it, the benefits of using TryFrom instead are:

This is an important issue because it impacts the core design of extendr. In particular https://github.com/extendr/extendr/issues/336, which is generally agreed to be a requirement for our 1.0 release.

Any feedback is welcome on this topic.

multimeric commented 1 year ago

@yutannihilation and @andy-thomason I'd love some input if you have time. This is been discussed a bit on discord and it seems like moving to a custom, faillible trait everywhere is the best option, but I want everyone's input before I implement this.

yutannihilation commented 1 year ago

Thanks for inviting me. But, please ignore me on deciding the direction on this. I think I'm in a different direction and I don't want to disturb you. I think we should first research carefully on how other frameworks provide the ergonomics around conversion, but, considering Bevy chooses this way, it should be reasonable.

multimeric commented 1 year ago

One downside I've identified of having a conversion trait is that it's difficult to write different implementations for e.g. an owned value, a reference, a mutable reference etc. With TryFrom you just say impl TryFrom<&T> for Robj whereas with IntoRobj some thought might be needed to work this out. One solution might be to define IntoRobj as into_robj(&self) -> impl Borrow<Robj> which is implemented for free for &Robj, Robj, Box<Robj> etc.

andy-thomason commented 1 year ago

I hope this helps:

Robj itself is a synonym for SEXP and has the same memory footprint (ie. a pointer).

Making a Robj is expensive, however, as we need to protect the data from the GC and so using &Robj is preferred when just accessing the data. We can do this with VECSXP, for example, where &Robj can refer to the SEXP in the list with no overhead.

All the Robj derived types are also synonyms of SEXP which makes for a lighter weight casting experience.

Note the &Robj doesn't by itself hold ownership of the underlying SEXP, but the lifetime rules of Rust should ensure that there is a Robj somewhere or at least a root object that is known to R.

I would favour deprecating FromRobj. Rust keeps promising trait specialisation which would allow us to do blanket implementations of TryFrom<&T>. TryFrom<T> makes little sense as we can't use Rust-allocated objects as components of R objects and moving the object will always make a copy.

multimeric commented 1 year ago

@andy-thomason

Making a Robj is expensive, however, as we need to protect the data from the GC and so using &Robj is preferred when just accessing the data.

I thought cloning a Robj was fine because it just cloned the pointer in Rust, without allocating more R memory?

I would favour deprecating FromRobj.

But FromRobj is defined as taking a &robj which is ideal right? If we make it return a Result<>, what else is wrong with FromRobj? Also, what do you think about the benefits of owning the trait so we can customize it? TryFrom doesn't have these benefits

Rust keeps promising trait specialisation which would allow us to do blanket implementations of TryFrom<&T>. TryFrom makes little sense as we can't use Rust-allocated objects as components of R objects and moving the object will always make a copy.

I'm confused. When we have a Rust-allocated T, we have to make a copy into R memory almost always right? So we need to define conversions on T. From there, yes we could get a blanket implementation over &T via specialization, but also if we owned the trait we could define into_robj<S>(rust_object: S) -> Robj where S : Borrow<T> to allow conversions from &T and Box<T> for free, without waiting for specialization.

andy-thomason commented 1 year ago

I thought cloning a Robj was fine because it just cloned the pointer in Rust, without allocating more R memory?

It does clone the pointer, but it also keeps a copy of the SEXP in a vector which is visible to R. This way if R does a garbage collect, the SEXP is not freed.

The SEXPs are also reference counted in Extendr, so removing the last reference removes the SEXP from the list.

See https://github.com/extendr/extendr/blob/master/extendr-api/src/ownership.rs#L37-L40

The problem with R_PreserveObject is that it is neither reference counted nor efficient. Although our hashmap implementation is not ideal either.

Both TryFrom<&Robj> and FromRobj need to clone the Robj. The exception is TryFrom<Robj> which can 'steal' the underlying Robj without increasing the ref count. Either way, I think we are cloning the Robj whenever a parameter comes from R. This way if we take ownership of the Robj in our own data structure that persists after the call, the SEXP won't get freed by R.

You can test this by hooking the R destructor, dropping the Robj and observing the destruction of the SEXP.

The best way IMHO of creating an R array is to use an exact size iterator. The elements of the vector will be pre-allocated and copied to the R vector without an extra Rust vector allocation.

Caveat, though, exact size iterators can lie!

multimeric commented 1 year ago

Wow okay I wasn't aware that we had our own protection mechanism. What is the advantage of this over just using R's native PROTECT mechanism? Also, if every Robj is protected by virtue of being in a list, does that mean that we never need to use PROTECT in extendr code? We had a recent issue relating to this: https://github.com/extendr/extendr/pull/540

Both TryFrom<&Robj> and FromRobj need to clone the Robj. The exception is TryFrom which can 'steal' the underlying Robj without increasing the ref count. Either way, I think we are cloning the Robj whenever a parameter comes from R.

What are the implications for conversion then? Surely the Robj's lifetime is the R function call, and everything in extendr can just use references to it? If yes, isn't FromRobj good because it encourages people to implement the conversion on a reference which prevents the need for cloning? Why would we want people implementing TryFrom<Robj> at all?

And in the other direction I think another benefit of having our own IntoRobj trait is that it provides the "one true path" for having your type converted to R:

yutannihilation commented 1 year ago

What is the advantage of this over just using R's native PROTECT mechanism?

In my understanding, these two are not what you can compare. PROTECT is to prevent the objects from getting accidentally GC-ed during the function call. The object must be UNPROTECT-ed before the end of the function. If you want to keep protecting the objects, you have to get it referenced from another SEXP under the protection of R_PreserveObject(). WRE and references in https://github.com/extendr/extendr/issues/468 might help.

does that mean that we never need to use PROTECT in extendr code?

In my understanding, no. You have to protect any SEXP during the function call.

multimeric commented 1 year ago

Thanks @yutannihilation, I'll discuss this more in #468 because I feel we're getting off track.