Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0
3.62k stars 986 forks source link

Warning for set / := when numToDo > nrow(x)? #3557

Open MichaelChirico opened 5 years ago

MichaelChirico commented 5 years ago

Follow-up to #2837 inspired by #1885 / #3460

We decided running duplicated every time we run set/:= to help protect against duplicated assignment is overkill & an efficiency killer.

However, when length(i) > nrow(x), there's no need to run duplicated -- there's guaranteed to be at least one duplicate in i (Pigeonhole):

DT = data.table(a = 1L)
DT[c(1, 1), a := 2:3]

We could warn in this case?

jangorecki commented 5 years ago

unless DT[c(1, NA), a := 2:3]

franknarf1 commented 5 years ago

Re @jangorecki 's example, you can use numToDo instead of length(i) then? That count is already recorded with NAs removed as seen in

> DT[c(1, NA), a := 2:3, verbose=TRUE]
Detected that j uses these columns: a 
Assigning to 1 row subset of 1 rows
RHS_list_of_columns == false

I guess this would just be adding a warning near the message above https://github.com/Rdatatable/data.table/blob/23853544ec27c15744e0a02cfb26691826cf6c60/src/assign.c#L344

Related https://github.com/Rdatatable/data.table/issues/2022

MichaelChirico commented 5 years ago

Thanks Frank/Jan, yes length(i) is not the condition we should use but it seems numToDo would be the corrected adjustment.

Main Q is whether we thing this is warning-worthy behavior? Or perhaps just something to append to/reiterate in the verbose message?

jangorecki commented 4 years ago

@MichaelChirico lets start with verbose message, this at least won't make any breaking changes

jangorecki commented 4 years ago

isn't it duplicate of #2022 ?