duncantl / CodeDepends

Analysis of R code for reproducible research and code view
http://www.omegahat.net/CodeDepends
89 stars 16 forks source link

Inconsistent detection of the magrittr dot (.) as an NSE variable #26

Open wlandau opened 6 years ago

wlandau commented 6 years ago

From https://github.com/ropensci/drake/issues/320:

x <- CodeDepends::getInputs(quote(read_csv("data.csv") %>% mutate(.data = ., 
  foo = bar + baz)))
#> Warning: replacing previous import 'graph::addNode' by 'XML::addNode' when
#> loading 'CodeDepends'
#> Warning: replacing previous import 'graph::plot' by 'graphics::plot' when
#> loading 'CodeDepends'
x@inputs
#> character(0)
x@nsevalVars
#> [1] "."   "bar" "baz"

x <- CodeDepends::getInputs(quote(raw_data %>% select(., starts_with("xyz"))))
x@inputs
#> [1] "raw_data"
x@nsevalVars
#> [1] "."

x <- CodeDepends::getInputs(quote(raw_data %>% filter(.)))
x@inputs
#> [1] "."        "raw_data"
x@nsevalVars
#> character(0)

x <- CodeDepends::getInputs(quote(raw_data %>% filter(complete.cases(.))))
x@inputs
#> [1] "."        "raw_data"
x@nsevalVars
#> character(0)
gmbecker commented 6 years ago

@wlandau

So filter has different semantics whether you are hitting the Bioconductor or dplyr version. The way that CodeDepends currently works (which is not quite perfectly correct but would cover the case here, I suspect) is that if dplyr is loaded previously in the script CodeDepends is aware of, it handles arguments to filter the way you would want for dplyr::filter. If not, it treats the filter call normally.

i.e.,

> expr = readScript(txt = "library(dplyr); raw_data %>% filter(.)")
> getInputs(expr)[[2]]@inputs
[1] "raw_data"
> getInputs(expr)[[2]]@nsevalVars
[1] "."

You're probably right that more generally across everything we might want . to always be marked as nse (or ignored entirely, to be honest). I can look at doing that but that will take a bit more doing.

How is the code getting into CodeDepends? Is drake tracking which libraries are loaded somehow, or does it have a full script to operate on rather than only individual expressions (even beyond this, just in general CodeDepends is going to behave better and be more powerful/correct when operating on the whole script).

wlandau commented 6 years ago

Contrary to the established traditions of Make-like tools, drake focuses on the user's R session, not script files. It analyzes the commands in the workflow plan data frame and the "imported" functions loaded in your R session (or a custom environment you provide). If CodeDepends behaves differently depending on which packages are loaded, targets could be invalidated in unpredictable ways, which does not bode well for drake.

gmbecker commented 6 years ago

It has nothing to do with what is actually loaded in the session. Though for the record the correct analysis ofa filter call actually does technically sependnon what version of filter it will actually hit when run, ie what packages are loaded

We do not look at that, but use a heuristic that depends on having the full script available to guess at which version of filter it should assume is being hit. This is not a limitation of codedepends but od statically analyzing a dynamic language. Its not immediately clear how we could do mich better rhat we are doing now without incorporating runtime analysis.

I'll look at this more closely tomorrow and will respond further then as warrented.

On Mar 13, 2018 6:59 PM, "Will Landau" notifications@github.com wrote:

Contrary to the established traditions of Make-like tools, drake focuses on the user's R session, not script files. It analyzes the commands in the workflow plan data frame https://ropensci.github.io/drake/articles/example-basic.html#the-workflow-plan-data-frame and the "imported" functions loaded in your R session (or a custom environment you provide). If CodeDepends behaves differently depending on which packages are loaded, targets could be invalidated in unpredictable ways, which does not bode well for drake.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/duncantl/CodeDepends/issues/26#issuecomment-372879138, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3dseFvll9Vr6Phq6Jmwvmczogz009wks5teHlggaJpZM4SpXbE .