Closed shrektan closed 4 years ago
DT[(FLAG)] works though; allowing DT[FLAG] opens the Pandora's box for ambiguity -- is FLAG in DT or is it in the parent frame? what if FLAG is in both DT & parent.frame?
I use DT[(FLAG)]
in this case.
I think the data.table behavior/semantics should not be determined dynamically by whether the column exists in the data.table or it would be hard for script reader to guess the intention of the code because the reader needs to know if this column exists. If the semantics is quite static, it would be easier to manage and work with.
Sorry, I don't get your guys' point. Even within ()
, the variable is determined dynamically. Isn't it?
That's said, the variable inside ()
doesn't guarantee it's a column inside of DT.
library(data.table)
tbl <- data.table(VALUE = c(1, 0))
flag <- c(FALSE, TRUE)
tbl[(flag)]
#> VALUE
#> 1: 0
tbl[, flag := !flag]
tbl[(flag)]
#> VALUE flag
#> 1: 1 TRUE
Created on 2020-02-23 by the reprex package (v0.3.0)
Sorry for not making my point clear.
My point is that we already use a lot of NSE in data table syntax by customizing the interpretation of expressions in [.data.table
which is dynamic but the semantics or the meaning of the code we are trying to define is not necessarily (and in my point of view, should not) dynamically determined (not depending on if a column exists or something we only know at runtime).
Suppose we are only looking at the following code:
tbl[flag]
If the meaning or the semantics of the code depends on if flag
is a column in tbl
, then one has to know if tbl
has flag
at runtime at this point to know what the author is trying to do.
By contrast, if tbl[flag]
is always evaluated in the same way without depending on whether flag
is a column of tbl
, then the reader of this code knows what the author is trying to do without having to know any runtime information, if the reader knows the evaluation rule. At this level, @shrektan is trying to say that the evaluation rule here might not be natural enough.
Currently, the evaluation rule of tbl[flag]
is simple: if flag
is a symbol then dt
assumes it is an external selector and always tries to find it in the calling scope. Although the evaluation of flag
follows dynamic scoping, but the semantics or how it is evaluated is predetermined and is not determined at runtime, not depending on if a column exists.
This makes the code much easier to maintain and read in complex and collaborative data project.
Agree with @renkun-ken -- in principle, we could examine whether flag
is a logical
column of tbl
during evaluation of [
, and then assume the user is trying to subset with this column;
The current behavior allows (with just two extra characters) the user to signal the intent that flag
is a column unambiguously.
The former comes at the cost of (1) increasing the complexity of our codebase (2) introducing some assumptions which are hard to get around [i.e., suppose flag
shouldn't be read as being a column of tbl
, how can the user assert this so that we don't treat it like that?]
About whether (
forces flag
to be a column of tbl
-- it doesn't , but it clearly establishes the scoping rules --
(
, scoping goes: inside tbl
, then parent.frame()
, then upwards as normal(
, scoping goes: parent.frame()
, then upwards as normalThanks, I see your points. Since I always use UPPERCASE as the column and smallercase as the variable name to avoid the ambiguity, I ignored one possibility that is the i
in DT[i]
could be more things like a character vector or another data.table.
By always evaluating the i
above in the external frame indeed is better to understand in such cases (as well as the maintainability).
So I'm closing this issue.
@renkun-ken @MichaelChirico Thanks for both of your answers.
@MichaelChirico [
already checks to see if flag is a logical. That's why there's a helpful error telling users that they can use (FLAG) | dt$FLAG | FLAG == TRUE
. But it is surprising behavior to a user- why does dt[FLAG & FLAG]
work?
We could easily accommodate the request with adding an if else statement to the symbol
evaluation branch. I compiled the below changes - only test 1773.01 is affected. edit: added else statement at the end to make it work for real on tbl[FLAG]
.
##########see line 366 in data.table.R
# isub is a single symbol name such as B in DT[B]
i = try(eval(isub, parent.frame(), parent.frame()), silent=TRUE)
if (inherits(i,"try-error")) {
# must be "not found" since isub is a mere symbol
col = try(eval(isub, x), silent=TRUE) # is it a column name?
#########wrap with new if statement#################
if(typeof(col) != "logical") { #####new line 1
msg = if (inherits(col,"try-error")) " and it is not a column name either."
else paste0(" but it is a column of type ", typeof(col),". If you wish to select rows where that column contains TRUE",
", or perhaps that column contains row numbers of itself to select, try DT[(col)], DT[DT$col], or DT[col==TRUE] is particularly clear and is optimized.")
stop(as.character(isub), " is not found in calling scope", msg,
" When the first argument inside DT[...] is a single symbol (e.g. DT[var]), data.table looks for var in calling scope.")
} else { #####new
i = col ####new
} ####new
Regarding scoping, the description is inconsistent with j
. edit - had these reversed initially.
(
, scoping is limited to inside tbl
(
, scoping is limited to only the parent.frame()
. library(data.table)
tbl <- data.table(FLAG = c(TRUE, FALSE), VALUE = c(1, 0))
col = "surprise"
tbl[, col := 1L][]
#> FLAG VALUE col
#> <lgcl> <num> <int>
#> 1: TRUE 1 1
#> 2: FALSE 0 1
tbl[, (col) := 1L][]
#> FLAG VALUE col surprise
#> <lgcl> <num> <int> <int>
#> 1: TRUE 1 1 1
#> 2: FALSE 0 1 1
tbl <- data.table(FLAG = c(TRUE, FALSE), VALUE = c(1, 0))
tbl[, (new_col) := 1L]
#> Error in eval(lhs, parent.frame(), parent.frame()): object 'new_col' not found
If DT contains a logical vector (call it the
FLAG
column), we have to useDT[FLAG==TRUE]
whileDT[FLAG]
will raise errors. It's kind of confusing for the newbies.Any reason that we can't support this?
Created on 2020-02-23 by the reprex package (v0.3.0)
Session info
``` r devtools::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 3.6.2 (2019-12-12) #> os macOS Catalina 10.15.3 #> system x86_64, darwin15.6.0 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz Asia/Shanghai #> date 2020-02-23 #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date lib source #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0) #> backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.0) #> callr 3.4.1 2020-01-24 [1] CRAN (R 3.6.0) #> cli 2.0.1 2020-01-08 [1] CRAN (R 3.6.0) #> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0) #> data.table * 1.12.8 2019-12-09 [1] CRAN (R 3.6.0) #> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0) #> devtools 2.2.1 2019-09-24 [1] CRAN (R 3.6.0) #> digest 0.6.23 2019-11-23 [1] CRAN (R 3.6.0) #> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.0) #> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0) #> fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.0) #> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0) #> glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0) #> highr 0.8 2019-03-20 [1] CRAN (R 3.6.0) #> htmltools 0.4.0.9002 2020-01-27 [1] Github (rstudio/htmltools@e07546c) #> knitr 1.27.2 2020-01-31 [1] local #> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0) #> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0) #> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.0) #> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0) #> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.0) #> processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.0) #> ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0) #> R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.0) #> Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.0) #> remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0) #> rlang 0.4.4 2020-01-28 [1] CRAN (R 3.6.2) #> rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.0) #> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0) #> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0) #> stringi 1.4.5 2020-01-11 [1] CRAN (R 3.6.0) #> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0) #> testthat 2.3.1 2019-12-01 [1] CRAN (R 3.6.0) #> usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.0) #> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0) #> xfun 0.12 2020-01-13 [1] CRAN (R 3.6.0) #> yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.2) #> #> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library ```