WinVector / replyr

Patches for using dplyr with Databases and Big Data
https://winvector.github.io/replyr/
Other
67 stars 12 forks source link

Non-Visible Binding #1

Closed jaredlander closed 7 years ago

jaredlander commented 7 years ago

Not an issue but a question. Does this suffer from the non-visible binding warning from CRAN?

JohnMount commented 7 years ago

Yes it does unfortunately suffer from the non-visible binding warnings both in its own construction and during application. I have seen this when running CRAN check.

In one's own code you can try the "assign NULL sleaze" to work around. Beyond using the "with method" (also in the same article), suppressForeignCheck, or globalVariables I don't see a safe (limited scope) way to suppress the warnings better than the "assign NULL sleaze."

For others. I am writing a long response because Jared Lander has brought up a very important, but somewhat subtle issue that I do not have a completely satisfactory response to. I want to emphasize why the question is in fact important and my response "no we don't know how to completely fix it" is not to diminish from the point.

The issue Jared Lander brings up is use of formula-style or lazy-eval style notation such as encouraged in dplyr and ggplot2 causes a warning of the style "no visible binding for global variable ..." during code checks. This is because in forms like ggplot(data=data.frame(x=1:3)) + geom_point(aes(x=x,y=x)) the right-hand sides of the aes portion (the x) look like unbound variable name references. Unbound variable name reference would be an error, and catching such an error at compile time is important because they could be on a branch of code that didn't get covered during runtime testing. However in the presence of non-standard evaluation these unbound references are captured inside the called code and converted to strings inside ggplot2 which then uses them to refer to column names in our data frame. This is not unique to dplyr and ggplot2 it also shows up with formula (it should also happen to with, but with may have hard-coded special-case check mitigations, please see here).

This means to submit a package that uses such code to CRAN you must, at least, add notes identifying these warning as spurious. This does not look good (does one really know there isn't a real unbound variable mixed in with the spurious warnings) and makes more work for the CRAN reviewers as they have to read your note and decide if they believe you actually checked the warnings.

For example in our package vtreat (available on CRAN) we are very strict about not depending on non-standard evaluation tricks and are able to build and check that package with no NOTEs or WARNINGs from the automated check process. Yes the check process is not perfect, but it is convenient to be able to produce code that passes the check process (in addition to hopefully being correct).

As a side-note non-standard evaluation is useful, but from what I have seen it is a “only funny when I do it" technique. I have seen authors that use non-standard evaluation in their own packages explicitly say not to use methods from other packages for in turn using non-standard evaluation.