Open sebffischer opened 7 months ago
I agree with the sentiment, but from R's perspective, this is totally within "well-defined" behavior.
R works quite differently from this repo's implementation when it comes to subset assignment. It's horrendously inefficient with many intermediate vectors. For R, this is perfectly fine and I imagine intentional. It's elegant in how generalized its fn<-
syntax is, but comes at a large cost for allocation. Let's break it down (slightly tweaking the example so the process is more evident):
trace(`[<-`, quote(print(sys.call(-4L))))
x[c(1, 1)][c(1, 2, 3)] <- c(99, 100)
The following is messy and hard to read, but effectively each [
produces an intermediate temporary values with the appropriate size, then assigns it to the next vector (which might be the next temporary vector).
*tmp1* <- vector(<length of subset>, <mode of x>)
[<-(*tmp1*, c(1, 2, 3), c(99, 100))
*tmp1* <- c(99, 100)[<length of subset>]
(= c(99, 100, 99)
)*tmp2* <- vector(<length of subset>, <mode of *tmp1*>)
[<-(*tmp2*, c(1, 1), *tmp1*)
*tmp2* <- *tmp1*[1:length(<subset>)]
(= c(99, 100)
)x[[1]] <- *tmp2*[[1]]
x[[1]] <- *tmp2*[[2]]
This is also exacerbated in list
assignment in R. Assigning to list$a$b$c$d
creates (I think) n^2 intermediate lists that copy their respective subset.
All of that is to say. I agree, we should be well defined, but "well defined" may include repeated indices at times - We can focus on that within the individual alt-rep assignment semantics.
Uff, that seems expensive, but I agree! Checking for such cases would probably also be quite the overhead! Should I close this issue?
This is just a copy of this comment https://github.com/dgkf/R/pull/112#issuecomment-2054041873 as I don't want it to get lost in case the PR gets merged.
R has some edge-cases where subset-assignment has weird (and I image undefined) behavior:
I think we should only allow vector subsets in assignment calls where the behavior is well-defined. From the currently supported subsets, I think
Subset::Mask
andSubset::Range
have this property, i.e. they can never produce a view where the same data is referenced via different indices of the resulting view.