aquasync opened this issue 2 years ago
Thanks for reporting. You've found an interesting way to get R to return the result in an ALTREP wrapper. This is an optimisation where R avoids allocating memory until it is needed (e.g. when you type 1:64, R doesn't actually allocate 64 integers immediately). In this case, R produces a wrapper that just points to the other column. As you have noticed, touching the column in any way causes it to be expanded, and the effect disappears. Because data.table circumvents many R mechanisms to achieve its efficiency, it goes to some lengths to catch these cases, but it currently misses this one.
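The compact-sequence optimisation mentioned above can be observed directly. Below is a minimal sketch that assumes R 3.5 or later, the release that introduced ALTREP. It uses .Internal(inspect()), an undocumented internal debugging function whose output format may change between R versions.

```r
# Sketch, assuming R >= 3.5 (ALTREP). .Internal(inspect()) is an undocumented
# internal debugging tool; its printed format is not a stable interface.
x <- 1:64

# Capture the inspector's printout for the freshly created sequence.
info <- capture.output(.Internal(inspect(x)))
print(info)

# A compact integer sequence stores only its start and length rather than
# 64 integers; the inspector labels such an unexpanded object "compact".
is_compact <- any(grepl("compact", info, fixed = TRUE))
print(is_compact)
```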
Wow, thanks @tlapak, incredibly quick diagnosis. I would never have expected ALTREP to be the issue either.
The code below triggers the bug for me (note that the assignment to col2 changes the value of col1!):

And my sessionInfo() output:

Basically, it looks like col1 and col2 end up pointing at the same vector, so that := modifies them both. My guess is that the vector is shared but the reference counts are off, so := thinks it is safe to modify it in place. It is not 100% clear to me whether the underlying bug is in base R or in data.table.

When trying to put together a minimal repro, I noticed a few different changes that make this bug disappear:
- Simply printing the data.table between the col1 and col2 assignments makes the issue go away.
- It only manifests when the number of rows is at least 64. Perhaps that is a threshold at which some sort of copy-on-write optimization kicks in somewhere?
- The problem also seems to be related to the coalesce function used here, even though it has no effect in this example. For example, replacing it with coalesce = function(x, ...) x avoids the issue. It seems as though base R is doing something odd with [<- when given an all-FALSE logical subset; maybe the result is the same object but no longer marked as shared? Note that assigning to col1 after coalesce does not affect col2, only vice versa. Alternatively, returning x[] from coalesce bypasses the erroneous sharing, either by forcing a copy or by bumping the reference count.
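The x[] workaround from the last observation can be sketched as follows. The issue's actual coalesce function is not shown, so coalesce_sketch is a hypothetical stand-in. Like the one described, it uses [<- with a logical subset, which is all FALSE when x has no NAs, and it returns x[] instead of x. Whether x[] copies the vector or only changes its reference count is a base R internal detail. The sketch relies only on x[] returning an equal vector.

```r
# Hypothetical stand-in for the coalesce() used in the issue (its code is not shown).
# Replaces NA entries of x with the corresponding entries of y.
coalesce_sketch <- function(x, y) {
  stopifnot(length(y) == length(x))
  na <- is.na(x)      # logical mask; all FALSE when x contains no NAs
  x[na] <- y[na]      # `[<-` with a logical subset, as described in the issue
  x[]                 # return x[] rather than x: the workaround described above
}

# No NAs: the mask is all FALSE and the values come back unchanged.
a <- coalesce_sketch(c(1, 2, 3), c(9, 9, 9))
# One NA: it is filled from y.
b <- coalesce_sketch(c(1, NA, 3), c(9, 9, 9))
print(a)
print(b)
```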