Open huzefaKhalil opened 1 year ago
Can you reproduce the problem if you assign using :=? The index is probably invalid after <-.
:= works fine.
dt[select, "col"] <- FALSE
works fine as well.
Only dt[select]$col <- FALSE
gives an incorrect result without an error.
dt[select]$col <- FALSE
This is bad practice, and we need an error to stop it if possible; otherwise we just need to document this behavior in the don'ts wiki.
Nah, it should just work since DT is an extension of DF
I think the data.frame syntax would be
dt$col[select] <- FALSE
Dived deeper into this problem. The real problem is the subsequent call to [<-.data.table,
which evaluates isub = ID %in% sample(dt[Event == TRUE]$ID, nToChange).
However, before "[<-" is called, ID %in% sample(dt[Event == TRUE]$ID, nToChange)
will already have been evaluated once to compute the value argument of "[<-", so the second evaluation draws a new random sample.
tl;dr: we effectively run eval(expression(sample(10)))
twice, which (now unsurprisingly) gives two different random samples. I do not see how we could fix or improve this besides triggering a warning when we spot a sample
in i,
but of course this applies to every non-deterministic function evaluation.
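The double evaluation described above can be reproduced without any randomness. The following is a minimal sketch, not code from the thread: it uses a base data.frame to stay dependency-free, and pick() and counter are hypothetical names introduced only to count how often the index expression is evaluated.

```r
# Hypothetical helper: returns fixed row indices and counts its own calls.
counter <- 0L
pick <- function() {
  counter <<- counter + 1L
  c(1L, 2L)
}

df <- data.frame(id = 1:5, flag = TRUE)

# Nested replacement: R expands this so that pick() appears both in the
# getter `*tmp*`[pick(), ] and in the final `[<-` call.
df[pick(), ]$flag <- FALSE
calls_nested <- counter

# data.frame-style form: the index appears only once, inside `[<-` on the column.
counter <- 0L
df2 <- data.frame(id = 1:5, flag = TRUE)
df2$flag[pick()] <- FALSE
calls_column <- counter
```

With a deterministic index both forms produce the same table, so the extra evaluation goes unnoticed. With sample() in the index, the two evaluations in the nested form can select different rows, which would match the silent mismatch reported in this issue.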
We also have this long-standing comment from 2016:
# TO DO: warning("Please use DT[i,j:=value] syntax instead of DT[i,j]<-value, for efficiency. See ?':='")
I needed to change the value in cells of rows selected at random, but doing so gives incorrect results. There is no error; the result is simply wrong, which made this particularly hard to track down when dealing with thousands of rows.
As you can see above, the ID for 1010 has been changed to 1005!
Selecting the rows to change outside the
[]
gives correct results. The verbose output from the first data.table assignment is given below.
My sessionInfo() is below.
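As a sketch of the workaround mentioned above, the random selection can be computed once, outside [], and then reused with :=. The table here is made-up data: the names ID, Event, col and nToChange come from the thread, while ids, the seed and all values are assumptions for illustration.

```r
library(data.table)

set.seed(42)  # only to make the sketch repeatable

# Made-up example data; the original table is not shown in the thread.
dt <- data.table(
  ID    = 1001:1010,
  Event = rep(c(TRUE, FALSE), times = 5),
  col   = TRUE
)
nToChange <- 2

# Draw the random IDs exactly once and store them.
ids <- sample(dt[Event == TRUE]$ID, nToChange)

# Reuse the stored IDs; := updates the matching rows by reference.
dt[ID %in% ids, col := FALSE]
```

Because ids is a plain vector, the rows that were sampled and the rows that get updated are the same no matter how many times the index expression is evaluated.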