Open dgkf opened 4 months ago
I think there is some interplay between the low-level `Rc` for the `RepType::Subset(Rc<Vec<T>>, ...)` and the `Rc` on the object level (`Rc<Obj>`) that we need to pay attention to.
The structure of the objects stored in the environment will look something like `Rc<Obj::Vector(Vector::Double(RefCell<RepType::Subset(Rc<Double>, Subsets)>))>`.
Consider the example below.
x <- [1, 2, 3]
y <- x
x[[1]] <- 99
Then line 2 must not only increment the reference count of the outer `Rc` (`Rc<Obj<...>>`) but also increment the reference count of the inner `Rc`. Otherwise, the assignment on line 3 might conclude that it holds the only reference to the underlying data and modify it in place, even though `y` still refers to the same data.
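To make the hazard concrete, here is a minimal sketch in plain Rust. The `Obj` struct, its `data` field and the `set` helper are hypothetical stand-ins for the interpreter's types, not its actual API; the point is only that `Rc::make_mut` on the inner `Rc` consults the inner count, which an outer-only alias never touches.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical stand-in for
// Obj::Vector(Vector::Double(RefCell<RepType::Subset(Rc<Double>, Subsets)>)).
struct Obj {
    data: RefCell<Rc<Vec<f64>>>,
}

/// Assigns `value` at `index`, copying the buffer only if the *inner* Rc is shared.
fn set(obj: &Obj, index: usize, value: f64) {
    let mut inner = obj.data.borrow_mut();
    Rc::make_mut(&mut *inner)[index] = value;
}

/// `y <- x; x[[1]] <- 99` where the alias clones only the outer Rc.
/// Returns the first element as seen through x and through y.
fn alias_outer_only() -> (f64, f64) {
    let x = Rc::new(Obj {
        data: RefCell::new(Rc::new(vec![1.0, 2.0, 3.0])),
    });
    let y = Rc::clone(&x); // outer count is 2, inner count is still 1
    set(&x, 0, 99.0); // inner Rc looks unique, so the write happens in place
    let result = (x.data.borrow()[0], y.data.borrow()[0]);
    result
}

/// The same program where the alias also clones the inner Rc.
fn alias_outer_and_inner() -> (f64, f64) {
    let x = Obj {
        data: RefCell::new(Rc::new(vec![1.0, 2.0, 3.0])),
    };
    // Hypothetical "deep" alias: a new Obj that shares the inner buffer (inner count 2).
    let y = Obj {
        data: RefCell::new(Rc::clone(&*x.data.borrow())),
    };
    set(&x, 0, 99.0); // inner Rc is shared, so make_mut copies before writing
    let result = (x.data.borrow()[0], y.data.borrow()[0]);
    result
}
```

In the first function the write leaks into `y`; in the second, `y` keeps its original value because the inner count reflects the second reference.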
So if we have a structure like `Obj(Rc<ObjType>)`, I think we need to implement a custom `Obj::clone` method which traverses the object's internal data and clones all the reference-counted objects (in this case `Rc<Double>` and the `Subsets`).
Yes, absolutely. Though I would suggest that instead of `clone`, we should have a custom `make_mut`. In most cases, `Obj::make_mut` can just defer to `Rc::make_mut`, but for some objects where we want internal optimizations we may need to traverse the structure so that the internal `Rc`s are handled as well.
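One way such a `make_mut` could look is sketched below. The two-variant `ObjType`, the `values` helper and `cow_demo` are hypothetical simplifications for illustration; only `Rc::make_mut` and `Rc::get_mut` are real standard-library calls.

```rust
use std::rc::Rc;

// Hypothetical, simplified object; the real type has many more variants.
#[derive(Clone, Debug)]
enum ObjType {
    Scalar(f64),
    // A vector that keeps its own reference-counted buffer.
    Vector(Rc<Vec<f64>>),
}

#[derive(Clone, Debug)]
struct Obj(Rc<ObjType>);

impl Obj {
    /// Un-shares the outer Rc and then any interior Rc, so the caller can
    /// mutate without affecting other references to the same data.
    fn make_mut(&mut self) -> &mut ObjType {
        // Most variants need nothing more than this call.
        let inner = Rc::make_mut(&mut self.0);
        // Variants with an interior Rc need that buffer un-shared as well.
        if let ObjType::Vector(buf) = &mut *inner {
            Rc::make_mut(buf);
        }
        inner
    }
}

/// Copies the values out of an object for inspection.
fn values(obj: &Obj) -> Vec<f64> {
    match &*obj.0 {
        ObjType::Scalar(v) => vec![*v],
        ObjType::Vector(buf) => buf.to_vec(),
    }
}

/// `y <- x; x[[1]] <- 99`, returning the contents of x and y afterwards.
fn cow_demo() -> (Vec<f64>, Vec<f64>) {
    let mut x = Obj(Rc::new(ObjType::Vector(Rc::new(vec![1.0, 2.0, 3.0]))));
    let y = x.clone(); // clones only the outer Rc
    if let ObjType::Vector(buf) = x.make_mut() {
        // After make_mut the buffer is uniquely owned, so get_mut succeeds.
        Rc::get_mut(buf).expect("buffer is unique after make_mut")[0] = 99.0;
    }
    (values(&x), values(&y))
}
```

Because the derived `Clone` for `ObjType` bumps the inner count when the outer `Rc` is un-shared, the second `Rc::make_mut` sees the sharing and copies the buffer, leaving `y` untouched.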
Convert `Environment<RefCell<HashMap<_, Obj>>>` to `Environment<RefCell<HashMap<_, Rc<Obj>>>>`
I think for the future `Obj::Scalar`, we might not want the `Rc`, at least not for double, logical and integer.
Also, for `Vector`, the internal `Rc`s also obviate the need for the `Rc<Obj>`.
This is a meta-issue to cover a few different discussions on mutability patterns.
#112 and https://github.com/dgkf/R/pull/123#discussion_r1558402942 have me feeling like now might be a good time to think more holistically about how objects are referenced and mutated. While getting this project off the ground, clones were used generously, and it's time to be more thoughtful about minimizing them.
Overall, I think we're on the right path right now, so if you're already up to date on those discussions, this is probably more of a summary than any novel plan. I'll collect the ideas that are leading us to where we are:
Environments of shared objects
Environments should be a collection of `Rc<Obj>`, allowing multiple references to exist to the same data. A trivial example like:
should never copy the data in `x`.
Convert `Environment<RefCell<HashMap<_, Obj>>>` to `Environment<RefCell<HashMap<_, Rc<Obj>>>>`
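As a rough illustration of the target shape, the sketch below stores `Rc<Obj>` values in a map behind a `RefCell`. The `Environment` struct, its `values` field, the `insert`/`get` methods and the single-variant `Obj` are all assumptions made for the example, and the aliasing statement is assumed to be something like `y <- x`.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical, simplified stand-ins for the interpreter's types.
#[derive(Debug)]
enum Obj {
    Double(Vec<f64>),
}

#[derive(Default)]
struct Environment {
    values: RefCell<HashMap<String, Rc<Obj>>>,
}

impl Environment {
    /// Binds `name` to an already reference-counted object.
    fn insert(&self, name: &str, value: Rc<Obj>) {
        self.values.borrow_mut().insert(name.to_string(), value);
    }

    /// Looks up `name`, handing back another reference rather than a copy.
    fn get(&self, name: &str) -> Option<Rc<Obj>> {
        self.values.borrow().get(name).cloned()
    }
}

/// Evaluates `x <- [1, 2, 3]; y <- x` and reports whether both names
/// point at the very same allocation.
fn alias_shares_data() -> bool {
    let env = Environment::default();
    env.insert("x", Rc::new(Obj::Double(vec![1.0, 2.0, 3.0])));

    // `y <- x`: only the reference count changes, the Vec is not copied.
    let x = env.get("x").expect("x is bound");
    env.insert("y", Rc::clone(&x));

    let y = env.get("y").expect("y is bound");
    Rc::ptr_eq(&x, &y)
}
```

`Option::cloned` on an `Option<&Rc<Obj>>` clones the `Rc`, not the `Obj`, which is why lookups stay cheap here.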
mutable objects
Mutable objects should, by default, use `Rc::make_mut`, providing copy-on-write to objects whose mutability does not change the internal representation of the data.
In the following example:
First, the right-hand side is evaluated using `x[[1]]` and `x[[3]]`, producing a result using only references to `x`. The left-hand side needs to first be made into a mutable reference before producing a subset and updating the value using assignment.
I anticipate that the way subsetting operators work might need to change to accommodate this; perhaps `Rc::make_mut` needs to be called when `x` is first accessed, given that it is on the left-hand side of the assignment.
`Rep`s to make use of `Rc` and `Rc::make_mut` for managing mutability (#112)
mutable object representations
And finally, when an object has multiple representations, and its representation needs to change (for instance, materializing a range into a vector), these representations should be behind a `RefCell<RepType<_>>`.
`Vector` (#112)
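The representation swap described above could be sketched as follows. The `Range` and `Values` variants, the `Vector` struct and its methods are hypothetical and much simpler than the real `RepType`; the sketch only shows why a `RefCell` is useful here, namely that the representation can be replaced through a shared `&self`.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical representations; the real RepType is generic and has other variants.
#[derive(Clone, Debug)]
enum RepType {
    // A lazy half-open range start..end that owns no buffer.
    Range { start: i64, end: i64 },
    // Concrete values in a reference-counted buffer.
    Values(Rc<Vec<i64>>),
}

struct Vector {
    rep: RefCell<RepType>,
}

impl Vector {
    /// Replaces a lazy range with a concrete buffer; does nothing if the
    /// vector is already materialized.
    fn materialize(&self) {
        let mut rep = self.rep.borrow_mut();
        let materialized = match &*rep {
            RepType::Range { start, end } => {
                RepType::Values(Rc::new((*start..*end).collect::<Vec<i64>>()))
            }
            RepType::Values(_) => return,
        };
        *rep = materialized;
    }

    /// Writes one element, changing representation first if needed and
    /// copying the buffer only if it is shared.
    fn set(&self, index: usize, value: i64) {
        self.materialize();
        if let RepType::Values(buf) = &mut *self.rep.borrow_mut() {
            Rc::make_mut(buf)[index] = value;
        }
    }

    /// Copies the current contents out, whatever the representation.
    fn to_vec(&self) -> Vec<i64> {
        let rep = self.rep.borrow();
        match &*rep {
            RepType::Range { start, end } => (*start..*end).collect(),
            RepType::Values(buf) => buf.to_vec(),
        }
    }
}

/// Starts from a lazy range, assigns one element, and reports the contents
/// together with whether the representation is now a concrete buffer.
fn materialize_demo() -> (Vec<i64>, bool) {
    let v = Vector {
        rep: RefCell::new(RepType::Range { start: 1, end: 4 }),
    };
    v.set(0, 99);
    let is_values = matches!(*v.rep.borrow(), RepType::Values(_));
    (v.to_vec(), is_values)
}
```

Note that `set` releases the `borrow_mut` taken inside `materialize` before borrowing again, which avoids a runtime double-borrow panic from the `RefCell`.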