End-to-end-provenance / RDataTracker

An R library to collect provenance from R scripts.
http://end-to-end-provenance.github.io/
GNU General Public License v3.0

Misleading DDG when a chunk omits details #679

Closed blernermhc closed 1 year ago

blernermhc commented 1 year ago

This issue refers to the ProvSean_Parse branch.

The resulting DDG can be misleading when a variable is passed from one chunk to a later one and an intermediate chunk modifies that variable while provenance collection is turned off. Here is an example:

````
```{r, details = TRUE}
x <- 2+2
y <- x+2
x
y

z <- 1
```

```{r pressure, echo=FALSE, details = FALSE}
library(ggplot2)
p <- ggplot(pressure, aes(x=temperature, y=pressure))
p
z <- z + 1
```

```{r, details = TRUE}
data(iris)
x <- iris[,1]
y <- iris[,2]
summary(lm(y~x))
n <- z + z
```
````

Here the first and third chunks collect provenance but the second does not. Chunk 1 sets z, chunk 2 modifies z, and chunk 3 uses z. Here is the DDG we end up with (shown with the chunks collapsed to highlight the problem):

image

If you click on the node for z, it shows the value 1, since that is its value when the variable is set in chunk 1. If you expand chunk 3 and click on the node for n, it shows the value 4, which is also correct. The problem is that the DDG presents z with the value 1 as the input to the statement that sets n, whereas z was actually 2 at that point, because chunk 2 incremented it without being recorded.
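The mismatch can be reproduced in plain R without any provenance tooling. This is only a minimal sketch: `recorded_z` is a hypothetical stand-in for the value stored in the DDG node for z, not anything RDataTracker defines.

```r
# Chunk 1 (provenance on): z is set, and its node records the value at this point.
z <- 1
recorded_z <- z   # hypothetical stand-in for the value shown on the z node

# Chunk 2 (details = FALSE): z changes, but nothing is recorded.
z <- z + 1

# Chunk 3 (provenance on): n is computed from the current value of z.
n <- z + z

print(n)                        # [1] 4
print(recorded_z + recorded_z)  # [1] 2
```

Recomputing n from the recorded input gives 2, not the 4 that the n node shows, which is why the edge from z to n is misleading.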

Possible solutions:

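# f modifies the global variable z as a side effect
f <- function () {
  z <<- z + 1
}

z <- 1           # z is set to 1 here
f()              # z becomes 2 inside f
n <- z + z       # uses the updated z
print(n)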

image
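One conceivable way to notice this kind of hidden update is to snapshot the variables before and after the untracked code runs and compare the two. The sketch below only illustrates that idea. `snapshot_vars` and `changed_vars` are hypothetical helpers, not RDataTracker functions, and nothing here is claimed to be how the fix was actually implemented. A dedicated environment stands in for the global environment so the example does not depend on whatever else is defined in the session.

```r
# Hypothetical helper: copy every variable in env into a named list.
snapshot_vars <- function(env) {
  mget(ls(env), envir = env)
}

# Hypothetical helper: names whose values differ between two snapshots,
# followed by names that exist only in the later snapshot.
changed_vars <- function(before, after) {
  common <- intersect(names(before), names(after))
  same <- vapply(common,
                 function(v) identical(before[[v]], after[[v]]),
                 logical(1))
  c(common[!same], setdiff(names(after), names(before)))
}

# A separate environment plays the role of the script's global environment.
env <- new.env()
evalq({
  z <- 1
  f <- function() {
    z <<- z + 1   # side effect on the enclosing environment
  }
}, env)

before <- snapshot_vars(env)
evalq(f(), env)                 # the untracked step
after <- snapshot_vars(env)

modified <- changed_vars(before, after)
print(modified)                 # [1] "z"

n <- evalq(z + z, env)
print(n)                        # [1] 4
```

A comparison like this reports that z changed during the untracked step, so a tool could create a fresh data node for z before it is used to compute n. Copying every variable is expensive for large objects, so this is a sketch of the idea rather than a practical design.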

blernermhc commented 1 year ago

Fixed in 6fa936841269e3fd54062c1e9e4650c9c2e8dd41