The resulting ddg can be misleading when a variable is passed from one chunk to another and an intermediate chunk turns off provenance but also modifies that variable. Here is an example:
```{r, details = TRUE}
x <- 2+2
y <- x+2
x
y
z <- 1
```
```{r pressure, echo=FALSE, details = FALSE}
library(ggplot2)
p <- ggplot(pressure, aes(x=temperature, y=pressure))
p
z <- z + 1
```
```{r, details = TRUE}
data(iris)
x <- iris[,1]
y <- iris[,2]
summary(lm(y~x))
n <- z + z
```
Here the first and third chunks collect provenance but the second does not. Chunk 1 sets z, chunk 2 modifies z, and chunk 3 uses z. Here is the ddg we end up with (shown with the chunks collapsed to highlight the problem).
If you click on the node for z, it shows the value 1, since that is its value when the variable is set in chunk 1. If you expand chunk 3 and click on the node for n, it shows the value 4, which is also correct. The problem is that the ddg presents z = 1 as the input to the statement that sets n, even though chunk 2 had already changed z to 2. With z = 1, z + z would be 2, not 4.
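The mismatch can be reproduced outside R Markdown with a plain script that replays only the assignments to z. This is a minimal sketch, not output from rdt; `recorded_z` is a hypothetical name standing in for the value the ddg stores on the z node.

```r
# Chunk 1 (provenance on): z is set, and this is the value the ddg records.
z <- 1
recorded_z <- z  # hypothetical stand-in for the value stored on the z node

# Chunk 2 (provenance off): z changes, but nothing is recorded.
z <- z + 1

# Chunk 3 (provenance on): n is computed from the current value of z.
n <- z + z

print(recorded_z)  # value shown on the z node
print(z)           # value chunk 3 actually read
print(n)           # value shown on the n node
```

The recorded value and the value read by chunk 3 differ, which is exactly the edge the ddg gets wrong.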
Possible solutions:
Add some sort of "details omitted" node similar to what we had in rdt. Perhaps have 3-z -> details omitted -> 4-z. In some ways, "details omitted" is what chunk 2 is doing, but we don't know how it changes z. In general, there could be multiple chunks between 1 and 3, and the node would stand for any of them.
Look at how we handle functions that modify globals. In that case, we show that the function changes the variables without showing how, like this:
```r
f <- function() {
  z <<- z + 1
}
z <- 1
f()
n <- z + z
print(n)
```
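Either solution needs a way to tell which variables changed while provenance was off. One possible approach, sketched below, is to snapshot the variables when a chunk with `details = TRUE` ends and compare them when the next such chunk begins. The helpers `snapshot_vars` and `changed_vars` are hypothetical and are not part of rdt or the ProvSean_Parse branch; the sketch also assumes values can be compared with `identical`.

```r
# Hypothetical helpers; not part of rdt or the ProvSean_Parse branch.

# Capture the current values of all variables in an environment.
snapshot_vars <- function(env) {
  mget(ls(env), envir = env)
}

# Return the names of variables that were in the snapshot and now differ.
changed_vars <- function(before, env) {
  after <- snapshot_vars(env)
  common <- intersect(names(before), names(after))
  Filter(function(v) !identical(before[[v]], after[[v]]), common)
}

env <- new.env()

# Chunk 1 (provenance on): set x and z, then snapshot at the end of the chunk.
assign("x", 4, envir = env)
assign("z", 1, envir = env)
before <- snapshot_vars(env)

# Chunk 2 (provenance off): z is modified without being recorded.
evalq(z <- z + 1, env)

# Start of chunk 3 (provenance on): find variables changed in between.
stale <- changed_vars(before, env)
print(stale)
```

Each name in `stale` marks a variable for which the ddg would need a "details omitted" node, or an edge like the one used for functions that modify globals, between the old data node and a new one holding the current value.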
This issue refers to the ProvSean_Parse branch.