cameronbracken / pgfSweave

Quality graphics and speedy compilation with Sweave
http://r-forge.r-project.org/projects/pgfsweave/

keep.source=T counteracts the expected behaviour of cache=T #40

Open hans-ekbrand opened 12 years ago

hans-ekbrand commented 12 years ago

I am not sure whether there is a change in pgfSweave, or a bug in my setup, or if this is how things have worked all along, but I recently found out that I had to manually set eval=F in order to get the expected benefits from caching of chunks that only make calculations.

Below is a minimal reproducible example. I would expect it to run significantly faster the second time, since the calculations of the first chunk should be skipped and the results loaded from the cache. However, the first chunk is run again unless I manually set eval=F. Doing so gives me the expected behaviour, sort of: objects saved in the global environment are accessible, but they are not really read from the cache. The caching mechanism seems to be inactivated altogether. EDIT: the subject line of this issue should perhaps read something like "caching mechanism inactivated".

The user should not have to manually fiddle with the eval option of each and every chunk in order to get the benefits of caching.

(The chunk start markers do not show up correctly, and I can't seem to fix that; I guess it has something to do with how GitHub deals with markup text. The second chunk should have fig=T.)

\documentclass{article}
\usepackage{tikz}
\usepackage[utf8]{inputenc}
\usepackage[nogin]{Sweave}
\SweaveOpts{echo=F,fig=F,eval=T,results=hide,cache=T,width=4,height=4,external=T}
\begin{document}

<<>>=
x <- 10000
set.seed(42)
a <- rnorm(x)
b <- factor(LETTERS[sample(1:7, x, replace = TRUE)])
c <- factor(LETTERS[sample(1:4, x, replace = TRUE)])
my.fit <- glm(c ~ b + a, family = "binomial")
my.results <- confint(my.fit)
@

<<>>=
barplot(rowSums(my.results[2:7,]))
@

\end{document}

Info on my system:

R version 2.14.1 (2011-12-22)
filehash: Simple key-value database (2.2 2011-07-21)
A Set of Tools for Administering SHared Repositories (0.3-4 2011-07-15)
tikzDevice: A Device for R Graphics Output in PGF/TikZ Format (v0.6.1)
pgfSweave: Using PGF Version 2.10
pgfSweave: Version 1.2.1 - 2011-04-03

Kind regards,

Hans Ekbrand

yihui commented 12 years ago

You can use three backticks to protect your code; see "Syntax highlighting" in http://github.github.com/github-flavored-markdown/

I took your example as a test case for my knitr package (sorry Cameron! I know this is shameless ads...), and it worked as expected.

library(knitr)
knit('pgf.Rnw')

where the file pgf.Rnw is as below:

\documentclass{article}
% \SweaveOpts{cache=T,dev='tikz',fig.width=4,fig.height=4}
\begin{document}

<<>>=
x <- 10000
set.seed(42)
a <- rnorm(x)
b <- factor(LETTERS[sample(1:7, x, replace = TRUE)])
c <- factor(LETTERS[sample(1:4, x, replace = TRUE)])
my.fit <- glm(c ~ b + a, family = "binomial")
my.results <- confint(my.fit)
@

<<>>=
barplot(rowSums(my.results[2:7,]))
@

\end{document}

hans-ekbrand commented 12 years ago

Just to show the differences between cacheSweave and pgfSweave:

\documentclass{article}
\usepackage{tikz}
\usepackage[nogin]{Sweave}
\begin{document}
<<foo,cache=T>>=
x <- 10000
set.seed(42)
a <- rnorm(x)
b <- factor(LETTERS[sample(1:7, x, replace = TRUE)])
c <- factor(LETTERS[sample(1:4, x, replace = TRUE)])
my.fit <- glm(c ~ b + a, family = "binomial")
my.results <- confint(my.fit)
@
<<bar,fig=T>>=
barplot(rowSums(my.results[2:7,]))
@
<<fool,cache=T>>=
x <- 10000
set.seed(42)
a <- rnorm(x)
b <- factor(LETTERS[sample(1:7, x, replace = TRUE)])
c <- factor(LETTERS[sample(1:4, x, replace = TRUE)])
my.fit <- glm(c ~ b + a, family = "binomial")
trash <- system("sleep 10")
@

<<bat,fig=T>>=
barplot(coef(my.fit)[2:7])
@
\end{document}

When I run this in R using

library(cacheSweave)
system.time(Sweave("bar.Rnw", driver = cacheSweaveDriver, compile.tex = F, pdf = T))
system.time(Sweave("bar.Rnw", driver = cacheSweaveDriver, compile.tex = F, pdf = T))
detach(package:cacheSweave)
library(pgfSweave)
system.time(pgfSweave("bar.Rnw", compile.tex = F, pdf = T))
system.time(pgfSweave("bar.Rnw", compile.tex = F, pdf = T))
detach(package:pgfSweave)

cacheSweave does the right thing.
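To make the first-run versus second-run comparison easier to read, the repeated system.time() calls above can be wrapped in a small helper. This is only a sketch in base R: time_twice is a hypothetical name, not part of cacheSweave or pgfSweave, and it simply runs whatever function it is given twice and reports both elapsed times.

```r
# Hypothetical helper (not part of any package): run the same job twice
# and report the elapsed time of each run, so a working cache shows up
# as a much smaller second number.
time_twice <- function(run) {
  first <- system.time(run())[["elapsed"]]
  second <- system.time(run())[["elapsed"]]
  c(first = first, second = second)
}

# Usage with the calls from this thread (requires the packages to be loaded):
# time_twice(function() pgfSweave("bar.Rnw", compile.tex = F, pdf = T))
# time_twice(function() Sweave("bar.Rnw", driver = cacheSweaveDriver,
#                              compile.tex = F, pdf = T))
```

If caching works, the second value should be far smaller than the first, since the chunk with the sleep call is skipped.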

hans-ekbrand commented 12 years ago

I also see new cache folders being created on each run, which is unexpected since I have not changed the chunks.
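One way to confirm this is to compare the directory listing before and after a run. The sketch below uses only base R; new_entries is a hypothetical helper name, and it makes no assumption about where pgfSweave puts its cache beyond it being somewhere under the directory passed in.

```r
# Hypothetical helper: list files and folders that appear under `dir`
# while `run()` executes. An unchanged document with a working cache
# should add no new cache folders on the second run.
new_entries <- function(dir, run) {
  snapshot <- function() {
    list.files(dir, all.files = TRUE, recursive = TRUE, include.dirs = TRUE)
  }
  before <- snapshot()
  run()
  setdiff(snapshot(), before)
}

# Usage with the call from this thread (requires pgfSweave):
# new_entries(".", function() pgfSweave("bar.Rnw", compile.tex = F, pdf = T))
```

Running this twice in a row on an unmodified bar.Rnw and seeing new folders in the second result would match the behaviour described here.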

hans-ekbrand commented 12 years ago

After quite some debugging effort, I have found that the problem is related to the keep.source option. When it is set to F, caching works as expected; when it is set to T (or left unset, as the default is T), caching is essentially inactivated. The following example works well as long as keep.source=F.

\documentclass{article}
\usepackage{tikz}
\usepackage[nogin]{Sweave}
\SweaveOpts{cache=T,keep.source=F}
\begin{document}
<<foo>>=
x <- 10000
set.seed(42)
a <- rnorm(x)
b <- factor(LETTERS[sample(1:7, x, replace = TRUE)])
c <- factor(LETTERS[sample(1:4, x, replace = TRUE)])
my.fit <- glm(c ~ b + a, family = "binomial")
system("sleep 20")
@
<<bat,fig=T>>=
barplot(coef(my.fit)[2:7])
@
\end{document}
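One plausible reason keep.source could matter, offered only as a guess: when R keeps the source, parsed code carries source-reference attributes that are absent otherwise, so anything that fingerprints the parsed object rather than its text could see a difference between runs. The base R sketch below shows just that difference in attributes. Whether pgfSweave's cache key actually depends on these attributes is an assumption and is not established in this thread.

```r
# Illustration only: the same code parsed with and without source references.
code <- "x <- 10000"
with_src <- parse(text = code, keep.source = TRUE)
without_src <- parse(text = code, keep.source = FALSE)

# With keep.source = TRUE the parsed object carries a "srcref" attribute;
# with keep.source = FALSE it does not.
has_srcref <- c(
  with_src = !is.null(attr(with_src, "srcref")),
  without_src = !is.null(attr(without_src, "srcref"))
)
print(has_srcref)

# The code itself is the same either way once deparsed to text.
same_text <- identical(deparse(with_src[[1]]), deparse(without_src[[1]]))
print(same_text)
```

If the cache key were built from the deparsed text alone, the two cases would look identical, which is consistent with caching working when keep.source=F.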