dgkf / R

An experimental reimagining of R
https://dgkf.github.io/R
GNU General Public License v3.0

Object shared references architecture #127

Open dgkf opened 4 months ago

dgkf commented 4 months ago

This is a meta-issue to cover a few different discussions on mutability patterns.

#112 and https://github.com/dgkf/R/pull/123#discussion_r1558402942 have me feeling like now might be a good time to think more holistically about how objects are referenced and mutated. While getting this project off the ground, clones were used generously, and it's time to be more thoughtful about minimizing them.

Overall, I think we're on the right path right now, so if you're already up to date on those discussions, this is probably more of a summary than any novel plan. I'll collect the ideas that are leading us to where we are:

Environments of shared objects

Environments should be a collection of Rc<Obj>, allowing multiple references to exist to the same data. A trivial example like:

y <- x + x + x + x + x

should never copy the data in x.
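
A minimal Rust sketch of this idea, under stated assumptions: the Obj enum, the Env struct and the add function below are simplified, hypothetical stand-ins, not the project's actual types. Looking up x hands out another Rc handle to the same allocation, so evaluating x + x reads the data twice without copying it.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical stand-in for the interpreter's object type.
#[derive(Debug)]
enum Obj {
    Double(Vec<f64>),
}

// Hypothetical environment: names map to shared, reference-counted objects.
struct Env {
    values: HashMap<String, Rc<Obj>>,
}

impl Env {
    fn get(&self, name: &str) -> Option<Rc<Obj>> {
        // Rc::clone only increments the reference count; the Vec is not copied.
        self.values.get(name).map(Rc::clone)
    }
}

// Element-wise addition that only reads its operands through references.
fn add(lhs: &Obj, rhs: &Obj) -> Obj {
    match (lhs, rhs) {
        (Obj::Double(l), Obj::Double(r)) => {
            Obj::Double(l.iter().zip(r).map(|(a, b)| a + b).collect())
        }
    }
}

fn main() {
    let mut env = Env { values: HashMap::new() };
    env.values
        .insert("x".to_string(), Rc::new(Obj::Double(vec![1.0, 2.0, 3.0])));

    // y <- x + x: both lookups point at the same allocation.
    let a = env.get("x").unwrap();
    let b = env.get("x").unwrap();
    assert!(Rc::ptr_eq(&a, &b));

    let y = add(&a, &b);
    env.values.insert("y".to_string(), Rc::new(y));
    println!("{:?}", env.get("y"));
}
```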

Mutable objects

Mutable objects should, by default, use Rc::make_mut, providing copy-on-write to objects whose mutability does not change the internal representation of the data.

In the following example:

x <- y <- c(1, 2, 3)
x[2:3][[1]] <- x[[1]] + x[[3]]

First, the right-hand side is evaluated using x[[1]] and x[[3]], producing a result using only references to x. The left-hand side then needs to be made into a mutable reference before producing a subset and updating the value using assignment.

I anticipate that the way subsetting operators work might need to change to accommodate this. Perhaps Rc::make_mut needs to be called when x is first accessed, given that it is on the left-hand side of the assignment.
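
The copy-on-write behaviour of Rc::make_mut for this example can be sketched in plain Rust. This is only an illustration: it uses a bare Rc<Vec<f64>> instead of the interpreter's object types, and the function name assign_example is made up for the sketch.

```rust
use std::rc::Rc;

// Mirrors: x <- y <- c(1, 2, 3); x[2:3][[1]] <- x[[1]] + x[[3]]
fn assign_example() -> (Rc<Vec<f64>>, Rc<Vec<f64>>) {
    // x and y start out sharing one allocation.
    let mut x: Rc<Vec<f64>> = Rc::new(vec![1.0, 2.0, 3.0]);
    let y = Rc::clone(&x);

    // Right-hand side: read-only access through the shared reference.
    let rhs = x[0] + x[2];

    // Left-hand side: x is shared with y, so make_mut clones the Vec first
    // and then returns a mutable reference to the fresh copy.
    let x_mut = Rc::make_mut(&mut x);
    // x[2:3][[1]] is the second element of x (index 1, zero-based).
    x_mut[1] = rhs;

    (x, y)
}

fn main() {
    let (x, y) = assign_example();
    assert_eq!(*x, vec![1.0, 4.0, 3.0]); // x was updated
    assert_eq!(*y, vec![1.0, 2.0, 3.0]); // y still sees the original data
    assert!(!Rc::ptr_eq(&x, &y)); // the two names no longer share storage
}
```

Had y not existed, make_mut would have found a reference count of one and mutated the vector in place without cloning.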

Mutable object representations

And finally, when an object has multiple representations and its representation needs to change (for instance, materializing a range into a vector), these representations should be behind a RefCell<RepType<_>>:

x <- 1:100
x[[3]] <- 10  # can no longer be represented as a range, materialized as vector
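
A rough sketch of how a representation change behind a RefCell could look. The RepType enum here is a simplified, non-generic assumption with only two variants, and IntVec, materialize, set and get are hypothetical names used for illustration, not the project's API.

```rust
use std::cell::RefCell;

// Hypothetical, simplified representation type (inclusive integer range
// or a materialized vector).
#[derive(Debug)]
enum RepType {
    Range { start: i64, end: i64 },
    Vector(Vec<i64>),
}

struct IntVec {
    rep: RefCell<RepType>,
}

impl IntVec {
    fn from_range(start: i64, end: i64) -> Self {
        IntVec { rep: RefCell::new(RepType::Range { start, end }) }
    }

    // Swap a lazy range for a concrete vector. Thanks to the RefCell this
    // works through a shared reference (&self).
    fn materialize(&self) {
        let mut rep = self.rep.borrow_mut();
        let materialized = match &*rep {
            RepType::Range { start, end } => Some((*start..=*end).collect::<Vec<i64>>()),
            RepType::Vector(_) => None,
        };
        if let Some(values) = materialized {
            *rep = RepType::Vector(values);
        }
    }

    // x[[i]] <- value, with a one-based index as in R.
    fn set(&self, i: usize, value: i64) {
        self.materialize();
        if let RepType::Vector(values) = &mut *self.rep.borrow_mut() {
            values[i - 1] = value;
        }
    }

    // x[[i]], with a one-based index; a range computes the element on demand.
    fn get(&self, i: usize) -> i64 {
        match &*self.rep.borrow() {
            RepType::Range { start, .. } => start + (i as i64 - 1),
            RepType::Vector(values) => values[i - 1],
        }
    }

    fn is_range(&self) -> bool {
        matches!(&*self.rep.borrow(), RepType::Range { .. })
    }
}

fn main() {
    let x = IntVec::from_range(1, 100); // x <- 1:100
    assert!(x.is_range());
    x.set(3, 10); // x[[3]] <- 10
    assert!(!x.is_range());
    assert_eq!(x.get(3), 10);
}
```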

sebffischer commented 4 months ago

I think there is some interplay between the low-level Rc for the RepType::Subset(Rc<Vec<T>>, ...) and the Rc on the object-level (Rc<Obj>) that we need to pay attention to. The structure of the objects stored in the environment will look something like Rc<Obj::Vector(Vector::Double(RefCell<RepType::Subset(Rc<Double>, Subsets)>))>. Consider the example below.

x <- [1, 2, 3]
y <- x
x[[1]] <- 99

Then line 2 must not only increment the reference count of the outer Rc (Rc<Obj<...>>) but also increment the reference count of the inner Rc. Otherwise the assignment on line 3 might believe that x holds the only reference to the underlying data and modify it in place, even though y still shares that data.

So if we have a structure like Obj(Rc<ObjType>), I think we need to implement a custom Obj::clone method which traverses the object's internal data and clones all the reference counted objects (in this case Rc<Double> and the Subsets).
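
To make the hazard concrete, here is a small Rust sketch. The Obj struct, its field layout and the share method are hypothetical simplifications of the structure described above (an inner Rc behind a RefCell), not the project's real types. Cloning only the outer Rc leaves the inner count at one, so a copy-on-write check on the inner Rc wrongly mutates in place; a custom clone that increments the inner count avoids this.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical layout: the outer Rc<Obj> lives in the environment, while the
// object keeps its buffer behind an inner Rc inside a RefCell.
struct Obj {
    data: RefCell<Rc<Vec<f64>>>,
}

impl Obj {
    fn new(values: Vec<f64>) -> Self {
        Obj { data: RefCell::new(Rc::new(values)) }
    }

    // Hypothetical custom clone: a new Obj that shares the buffer and
    // increments the inner reference count.
    fn share(&self) -> Obj {
        Obj { data: RefCell::new(Rc::clone(&*self.data.borrow())) }
    }

    // Element assignment (zero-based); copy-on-write is decided by the
    // inner reference count only.
    fn set(&self, i: usize, value: f64) {
        let mut inner = self.data.borrow_mut();
        Rc::make_mut(&mut *inner)[i] = value;
    }

    fn get(&self, i: usize) -> f64 {
        let inner = self.data.borrow();
        (**inner)[i]
    }
}

fn main() {
    // Sharing only the outer Rc: the inner count stays at 1.
    let x = Rc::new(Obj::new(vec![1.0, 2.0, 3.0]));
    let y = Rc::clone(&x); // y <- x
    assert_eq!(Rc::strong_count(&*x.data.borrow()), 1);
    x.set(0, 99.0); // mutates in place
    assert_eq!(y.get(0), 99.0); // y changed too, which is not what we want

    // Sharing through the custom clone: the inner count is 2, so set copies.
    let x = Obj::new(vec![1.0, 2.0, 3.0]);
    let y = x.share(); // y <- x
    x.set(0, 99.0);
    assert_eq!(x.get(0), 99.0);
    assert_eq!(y.get(0), 1.0); // y keeps the original data
}
```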

dgkf commented 4 months ago

> So if we have a structure like Obj(Rc<ObjType>), I think we need to implement a custom Obj::clone method which traverses the object's internal data and clones all the reference counted objects (in this case Rc<Double> and the Subsets).

Yes, absolutely. Though I would suggest that instead of clone, we should have a custom make_mut. In most cases, Obj::make_mut can just defer to Rc::make_mut, but for some objects where we want some internal optimizations, we may need to traverse the structure so that internal Rcs are handled as well.
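
One possible shape for such a make_mut, sketched under assumptions: ObjType here has just two illustrative variants, and the set helper is hypothetical. The outer Rc::make_mut clones ObjType when the object is shared (the derived Clone only increments the inner Rc's count), and the vector variant then unshares its inner buffer as well.

```rust
use std::rc::Rc;

// Hypothetical, simplified object type with one variant holding an inner Rc.
#[derive(Clone, Debug)]
enum ObjType {
    Scalar(f64),
    Vector(Rc<Vec<f64>>),
}

#[derive(Clone, Debug)]
struct Obj(Rc<ObjType>);

impl Obj {
    fn make_mut(&mut self) -> &mut ObjType {
        // Unshare the outer layer. If another Obj points at the same ObjType,
        // it is cloned, which for Vector merely increments the inner count.
        let inner = Rc::make_mut(&mut self.0);
        // Unshare the inner buffer: the Vec is copied only if some other
        // object still holds it.
        if let ObjType::Vector(data) = &mut *inner {
            let _ = Rc::make_mut(data);
        }
        inner
    }

    // Element assignment (zero-based), built on make_mut.
    fn set(&mut self, i: usize, value: f64) {
        match self.make_mut() {
            ObjType::Vector(data) => {
                // The buffer is unique after make_mut, so get_mut succeeds.
                Rc::get_mut(data).expect("buffer is unique")[i] = value;
            }
            ObjType::Scalar(x) => *x = value,
        }
    }
}

fn main() {
    let mut x = Obj(Rc::new(ObjType::Vector(Rc::new(vec![1.0, 2.0, 3.0]))));
    let y = x.clone(); // y <- x, shares the outer Rc
    x.set(0, 99.0); // x[[1]] <- 99
    println!("x = {:?}, y = {:?}", x, y);

    let mut s = Obj(Rc::new(ObjType::Scalar(1.0)));
    s.set(0, 2.0); // scalars simply defer to Rc::make_mut
    println!("s = {:?}", s);
}
```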

sebffischer commented 3 months ago

> Convert Environment<RefCell<HashMap<_, Obj>>> to Environment<RefCell<HashMap<_, Rc<Obj>>>>

I think for the future Obj::Scalar, we might not want the Rc, at least not for double, logical and integer. For Vector, the internal Rcs also obviate the need for the Rc<Obj>.