Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0
3.62k stars 986 forks source link

shift give.names could apply auto-name logic to list() input #3905

Open MichaelChirico opened 5 years ago

MichaelChirico commented 5 years ago
set.seed(38493)
DT = data.table(latitude = runif(10, -90, 90), longitude = runif(10, -180, 180))
DT[ , shift(.(latitude, longitude), c(-1L, 1L), give.names = TRUE)]
    V1_lead_1   V1_lag_1  V2_lead_1   V2_lag_1
 1: -38.96353         NA -147.14499         NA
 2:  29.04300   2.298902  -11.81201   44.61912
 3:  85.19639 -38.963529  178.86941 -147.14499
 4: -18.45281  29.042995 -159.37338  -11.81201
 5: -24.63941  85.196393  -74.79962  178.86941
 6:  39.93444 -18.452807  -63.89377 -159.37338
 7:  89.09799 -24.639414   10.13899  -74.79962
 8: -39.62935  39.934439  111.02200  -63.89377
 9:  45.81521  89.097989   69.86217   10.13899
10:        NA -39.629354         NA  111.02200

I kind of expected give.names to apply the "usual" (in data.table context at least) auto-names to the .(latitude, longitude) input.

jangorecki commented 4 years ago

to reduce number of NSE operations I would rather prefer to accept character vector to give.names, otherwise new names will look ugly for a on-the-fly computed elements

MichaelChirico commented 4 years ago

to reduce number of NSE operations

We should have some internal utilities for this, then there's no duplication/bifurcation of logic at least

jangorecki commented 2 years ago

I think we could handle that in a more generic way. I recall some other places which could benefit from it as well. And one new coming in PR adding give.names to rolling functions. The more generic way I have in mind is to pre-process j calls of form .(x, y) into calls .(x=x, y=y). Then input we supply to shift (or any other fun) is a named list, rather unnamed, and give.names will handle that. So no change to shift is needed. NSE (to substitute names of variables) is moved to [ where is a good place for it already.

jangorecki commented 2 years ago

Proof of concept seems to work, but we also need to handle list() not just .(), and this will likely be a breaking change when user defined functions relied on V1, etc names.

DT[ , shift(.(latitude, longitude), c(-1L, 1L), give.names = TRUE)]
#    latitude_lead_1 latitude_lag_1 longitude_lead_1 longitude_lag_1
#              <num>          <num>            <num>           <num>
# 1:       -38.96353             NA       -147.14499              NA
#...

diff

diff --git a/R/data.table.R b/R/data.table.R
index 473cf6e7..269a1555 100644
--- a/R/data.table.R
+++ b/R/data.table.R
@@ -94,7 +94,11 @@ replace_dot_alias = function(e) {
     # . alias also used within bquote, #1912
     if (e[[1L]] == 'bquote') return(e)
     if (e[[1L]] == ".") e[[1L]] = quote(list)
-    for (i in seq_along(e)[-1L]) if (!is.null(e[[i]])) e[[i]] = replace_dot_alias(e[[i]])
+    for (i in seq_along(e)[-1L]) if (!is.null(e[[i]])) {
+      if (is.name(e[[i]]) && is.null(names(e[i]))) {
+        names(e)[i] = as.character(e[[i]])
+      } else e[[i]] = replace_dot_alias(e[[i]])
+    }
   }
   e
 }