Open multimeric opened 1 year ago
@yutannihilation and @andy-thomason I'd love some input if you have time. This is been discussed a bit on discord and it seems like moving to a custom, faillible trait everywhere is the best option, but I want everyone's input before I implement this.
Thanks for inviting me. But, please ignore me on deciding the direction on this. I think I'm in a different direction and I don't want to disturb you. I think we should first research carefully on how other frameworks provide the ergonomics around conversion, but, considering Bevy chooses this way, it should be reasonable.
One downside I've identified of having a conversion trait is that it's difficult to write different implementations for e.g. an owned value, a reference, a mutable reference etc. With TryFrom you just say impl TryFrom<&T> for Robj
whereas with IntoRobj
some thought might be needed to work this out. One solution might be to define IntoRobj
as into_robj(&self) -> impl Borrow<Robj>
which is implemented for free for &Robj
, Robj
, Box<Robj>
etc.
I hope this helps:
Robj itself is a synonym for SEXP and has the same memory footprint (ie. a pointer).
Making a Robj is expensive, however, as we need to protect the data from the GC and so using &Robj is preferred when just accessing the data. We can do this with VECSXP, for example, where &Robj can refer to the SEXP in the list with no overhead.
All the Robj derived types are also synonyms of SEXP which makes for a lighter weight casting experience.
Note the &Robj doesn't by itself hold ownership of the underlying SEXP, but the lifetime rules of Rust should ensure that there is a Robj somewhere or at least a root object that is known to R.
I would favour deprecating FromRobj. Rust keeps promising trait specialisation which would
allow us to do blanket implementations of TryFrom<&T>
. TryFrom<T>
makes little sense as we can't
use Rust-allocated objects as components of R objects and moving the object will always make a copy.
@andy-thomason
Making a Robj is expensive, however, as we need to protect the data from the GC and so using &Robj is preferred when just accessing the data.
I thought cloning a Robj
was fine because it just cloned the pointer in Rust, without allocating more R memory?
I would favour deprecating FromRobj.
But FromRobj
is defined as taking a &robj
which is ideal right? If we make it return a Result<>
, what else is wrong with FromRobj
? Also, what do you think about the benefits of owning the trait so we can customize it? TryFrom
doesn't have these benefits
Rust keeps promising trait specialisation which would allow us to do blanket implementations of TryFrom<&T>. TryFrom
makes little sense as we can't use Rust-allocated objects as components of R objects and moving the object will always make a copy.
I'm confused. When we have a Rust-allocated T
, we have to make a copy into R memory almost always right? So we need to define conversions on T
. From there, yes we could get a blanket implementation over &T
via specialization, but also if we owned the trait we could define into_robj<S>(rust_object: S) -> Robj where S : Borrow<T>
to allow conversions from &T
and Box<T>
for free, without waiting for specialization.
I thought cloning a Robj was fine because it just cloned the pointer in Rust, without allocating more R memory?
It does clone the pointer, but it also keeps a copy of the SEXP in a vector which is visible to R. This way if R does a garbage collect, the SEXP is not freed.
The SEXPs are also reference counted in Extendr, so removing the last reference removes the SEXP from the list.
See https://github.com/extendr/extendr/blob/master/extendr-api/src/ownership.rs#L37-L40
The problem with R_PreserveObject
is that it is neither reference counted nor efficient. Although
our hashmap implementation is not ideal either.
Both TryFrom<&Robj>
and FromRobj
need to clone the Robj. The exception is TryFrom<Robj>
which
can 'steal' the underlying Robj without increasing the ref count. Either way, I think we are cloning
the Robj whenever a parameter comes from R. This way if we take ownership of the Robj in our
own data structure that persists after the call, the SEXP won't get freed by R.
You can test this by hooking the R destructor, dropping the Robj and observing the destruction of the SEXP.
The best way IMHO of creating an R array is to use an exact size iterator. The elements of the vector will be pre-allocated and copied to the R vector without an extra Rust vector allocation.
Caveat, though, exact size iterators can lie!
Wow okay I wasn't aware that we had our own protection mechanism. What is the advantage of this over just using R's native PROTECT mechanism? Also, if every Robj is protected by virtue of being in a list, does that mean that we never need to use PROTECT in extendr code? We had a recent issue relating to this: https://github.com/extendr/extendr/pull/540
Both TryFrom<&Robj> and FromRobj need to clone the Robj. The exception is TryFrom
which can 'steal' the underlying Robj without increasing the ref count. Either way, I think we are cloning the Robj whenever a parameter comes from R.
What are the implications for conversion then? Surely the Robj's lifetime is the R function call, and everything in extendr can just use references to it? If yes, isn't FromRobj
good because it encourages people to implement the conversion on a reference which prevents the need for cloning? Why would we want people implementing TryFrom<Robj>
at all?
And in the other direction I think another benefit of having our own IntoRobj
trait is that it provides the "one true path" for having your type converted to R:
&T
which minimizes copies&T
we can actually modify the #[extendr]
macro to accept any return value T where <T as Borrow> : IntoRobj
. This gives us conversions from owned, boxed, etc objects for free, and without the confusion we have when implementing all these conversions ourselvesWhat is the advantage of this over just using R's native PROTECT mechanism?
In my understanding, these two are not what you can compare. PROTECT
is to prevent the objects from getting accidentally GC-ed during the function call. The object must be UNPROTECT
-ed before the end of the function. If you want to keep protecting the objects, you have to get it referenced from another SEXP under the protection of R_PreserveObject()
. WRE and references in https://github.com/extendr/extendr/issues/468 might help.
does that mean that we never need to use PROTECT in extendr code?
In my understanding, no. You have to protect any SEXP during the function call.
Thanks @yutannihilation, I'll discuss this more in #468 because I feel we're getting off track.
Extendr is based around the core concept that
extendr
-annotated functions are passedRobj
from R, and need to returnRobj
to R. The abstraction comes from the fact that we allow the user to specify as arguments any type that can be converted from a Robj (FromRobj
), or return any type that can be converted to aRobj
(ToRobj
).Although everyone is agreed that the conversion traits we use to convert to and from
Robj
should be fallible (ie return aResult
), there has been some disagreement over whether we should use vanillaTryFrom
as the conversion trait as we do now when#[extendr(use_try_from = true)]
is set, or if we should continue to use the oldFromRobj
/IntoRobj
traits and just modify them to have a fallible method.From discussions mostly on discord, there seem to be 3 main advantages of using a custom trait (
FromRobj
/IntoRobj
):TryFrom
wouldn't allow thisAs I see it, the benefits of using
TryFrom
instead are:This is an important issue because it impacts the core design of extendr. In particular https://github.com/extendr/extendr/issues/336, which is generally agreed to be a requirement for our 1.0 release.
Any feedback is welcome on this topic.