KentonWhite / ProjectTemplate

A template utility for R projects that provides a skeletal project.
http://projecttemplate.net
GNU General Public License v3.0
622 stars, 159 forks

Clear explanation of where/how to use Notebooks (Rmd) with ProjectTemplate #293

Closed metanoid closed 2 years ago

metanoid commented 5 years ago

Expected Behavior

If I have an R Notebook (.Rmd) in the /reports subfolder or the /src subfolder, and in that document I call

library(ProjectTemplate)
load.project()

then this should load the project.

Current Behavior

Calling the exact same code from a .R file and a .Rmd file (in RStudio) has different behaviour. The former works, the latter does not. This is because the latter always runs code in the file's parent directory.

Steps to Reproduce Behavior

1. Create a new ProjectTemplate project.
2. Create a new "R Notebook" document and save it in the /reports subfolder.
3. In the first chunk, include:

library(ProjectTemplate)
load.project()

Version Information

R 3.5.3, ProjectTemplate 0.9.0, RStudio 1.1.463

Possible Solution

Currently, the load.project() function fails if it is not run from the correct folder. Would it be possible to add an argument to the load.project function where the user can specify the path to the project folder, defaulting to getwd()?

Alternatively, the package rprojroot has functionality to explore the file system by successively checking each parent folder to see if a condition is met - can load.project() use this to find the ProjectTemplate project folder, assuming it is a parent of the current working folder?

Maybe I've misunderstood.
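
The parent-folder search described above can be sketched in base R, without rprojroot. This is only an illustration: `find_project_root()` is a hypothetical helper, not part of ProjectTemplate, and it assumes that the presence of `config/global.dcf` is an acceptable marker for a project root.

```r
# Hypothetical helper (not a ProjectTemplate function): walk up from `start`
# until a directory containing config/global.dcf is found.
# Assumption: config/global.dcf is a good enough marker for a project root.
find_project_root <- function(start = getwd()) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (file.exists(file.path(dir, "config", "global.dcf"))) {
      return(dir)
    }
    parent <- dirname(dir)
    # dirname() of the filesystem root returns the root itself, so stop there.
    if (identical(parent, dir)) {
      stop("No ProjectTemplate project found at or above: ", start)
    }
    dir <- parent
  }
}
```

Called from a notebook in /reports, a helper like this would return the project folder one level up, which could then be passed to whatever mechanism sets the working directory.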

KentonWhite commented 5 years ago

That's a good idea @metanoid! I'm traveling for the next couple of weeks. If you are feeling ambitious, could you try making the changes to load.project() and submitting a PR please?

Traversing the directory tree is a cool idea: basically, if load.project() is run in a subdirectory of a project, it will traverse up the tree and use whatever project it finds. This could be useful in many situations, like just running scripts.

One challenge I see is dealing with many implicit assumptions. For example, ProjectTemplate assumes it is operating in the project's root directory. Many paths are hardcoded relative to this home directory (e.g. data instead of ProjectTemplateHome()/data). The "ideal" way would be to add a function ProjectTemplateHome() that returns the path to the home directory (. by default), but this may miss some of the many hardcoded paths.
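
As a rough illustration of the home-directory idea, the sketch below builds paths through a single accessor instead of hardcoding them. `ProjectTemplateHome()` is the name proposed in this comment and does not exist in ProjectTemplate; `set_project_home()` and `project_path()` are further hypothetical helpers added for the example.

```r
# Hypothetical sketch: none of these functions are exported by ProjectTemplate.
# The home directory is kept in a private environment and defaults to ".".
.pt_state <- new.env()
assign("home", ".", envir = .pt_state)

# Return the current project home directory.
ProjectTemplateHome <- function() {
  get("home", envir = .pt_state)
}

# Point the home directory somewhere else (e.g. a discovered project root).
set_project_home <- function(path) {
  assign("home", path, envir = .pt_state)
  invisible(path)
}

# Build a path relative to the project home instead of hardcoding "data".
project_path <- function(...) {
  file.path(ProjectTemplateHome(), ...)
}
```

With the default home, `project_path("data")` is equivalent to the hardcoded relative path; after `set_project_home()` the same call resolves inside the chosen project folder. The difficulty raised above remains: every hardcoded path would have to be routed through such a helper.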

The other solution would be internally using setwd() to ProjectTemplateHome and then executing everything. This means that all code in subdirectories would need to use paths relative to ProjectTemplateHome. This might make it difficult for people working in an rmd directory, I'm not sure. What are your thoughts?

As I said I'm traveling the next couple of weeks. It would be great if you could take a stab at a PR. If you can't, I understand and will look more deeply at implementing this towards the end of April / beginning of May.

KentonWhite commented 4 years ago

Hi @metanoid, I'm working on this now (sorry it has taken so long — just a low priority). If I understand from your original post, the reason a .R file works and a .Rmd file doesn't work is that RMarkdown runs in the parent directory? Can you explain this a bit more?

For example, if you are in the ProjectTemplate directory and you run source('reports/test.R'), it works because source works in the current working directory. But if you run render('reports/test.Rmd'), it fails because the code is actually being run in the parent directory?

metanoid commented 4 years ago

Hi @KentonWhite!

Yes, that's correct. For context, when I say "run an Rmd file" I mean this:

Doing so runs all the code in the Rmd file from the parent directory of that file.

So, if that file is in the <project_home_dir>/reports directory, and that file calls load.project(), the reason this fails is that load.project() expects to be called from one directory higher.
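
The difference in working directory can be reproduced in base R without knitting anything. By default knitr evaluates chunks with the working directory set to the folder containing the document; the sketch below imitates that with `source(chdir = TRUE)`, which is a stand-in for the notebook behaviour and not what RStudio actually calls. The folder and file names are made up for the demonstration.

```r
# Build a throwaway project layout with a script in reports/.
proj <- file.path(tempdir(), "wd_demo")
dir.create(file.path(proj, "reports"), recursive = TRUE, showWarnings = FALSE)
writeLines("seen_wd <- getwd()", file.path(proj, "reports", "test.R"))

old <- setwd(proj)
env <- new.env()

# Plain source(): the script is evaluated in the current working directory,
# i.e. the project root, which is what load.project() expects.
source("reports/test.R", local = env)
wd_plain <- basename(env$seen_wd)

# source(chdir = TRUE): the working directory is temporarily switched to the
# script's own folder, similar to how a knitted .Rmd sees reports/.
source("reports/test.R", local = env, chdir = TRUE)
wd_chdir <- basename(env$seen_wd)

setwd(old)
print(c(plain = wd_plain, chdir = wd_chdir))
```

In the second call the script sees reports/ as its working directory, so anything that looks for config/ or data/ relative to `getwd()` would fail in the same way load.project() does from a notebook.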

I'm not super comfortable with using setwd as a solution for this, because doing so would be invisible to the user, and so would lead to headaches in the (unfortunate) case where the user is doing something that requires folder navigation.

Possible solution:

I did intend to try put a PR together for this, but I didn't get around to it.

chrissem commented 4 years ago

Hi @metanoid.

As a workaround you can set the knitr root.dir within a setup cell. If your .Rmd file is in <project_home_dir>/reports this works fine:

```{r setup, include=FALSE, echo=FALSE}
require("knitr")
opts_knit$set(root.dir = "..")
```

metanoid commented 4 years ago

Oh, this is clever, I like that! Thanks!

I think this is a good solution.

davidski commented 4 years ago

Could this issue remain open? I use notebooks quite a bit and would like to cleanly use them in subdirectories with projecttemplate. Manually coding the working directory into each notebook is brittle and moves away from reproducibility. A solution that integrates rprojroot or similar pseudo-chroot functionality would be terrific!

metanoid commented 4 years ago

How about this:

```{r setup, include=FALSE, echo=FALSE}
require("knitr")
require(rprojroot)
proj_root = find_rstudio_root_file()
opts_knit$set(root.dir = proj_root)
```

That way I think you get the best of both worlds: @chrissem's solution of using knitr's built-in tools for directory handling, plus ProjectTemplate doesn't have to take an additional dependency, plus the reproducibility benefits of rprojroot. It means you'll have to put this boilerplate at the top of your notebooks, though.

davidski commented 4 years ago

That requires a lot of fiddling to existing and future notebooks for something that could be addressed by having ProjectTemplate loosen its assumptions about the current working directory, allowing notebooks as first class citizens and arguably being (IMHO) a more rugged design. It may not be the direction this project (no pun intended 😉 ) wishes to go, but having notebooks "just work" is a key roadblock for adopting ProjectTemplate in the orgs I work with.

Really do appreciate the dialog and suggestions!

KentonWhite commented 4 years ago

Thanks everyone. I really would like ProjectTemplate to support notebooks. At this point I think the way forward is temporarily changing the working directory for ProjectTemplate to load the data into memory. Once the data is loaded, change back to the original working directory.
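
A minimal sketch of the "change directory, load, change back" approach is shown below. It is not the code on any ProjectTemplate branch: `with_project_dir()` is a hypothetical wrapper, and it assumes the project root is already known.

```r
# Hypothetical wrapper (not part of ProjectTemplate): evaluate `code` with the
# working directory set to `root`, then restore the original directory.
with_project_dir <- function(root, code) {
  old <- setwd(root)
  # on.exit() restores the directory even if `code` throws an error.
  on.exit(setwd(old), add = TRUE)
  # `code` is a lazy promise, so it is only evaluated here, after setwd().
  force(code)
}

# Intended use from a notebook in <project_home_dir>/reports (not run here):
# with_project_dir("..", ProjectTemplate::load.project())
```

Because the directory is restored on exit, code later in the notebook still sees its original working directory, which addresses the concern about an invisible setwd() raised earlier in the thread. Paths used after loading would still be relative to the notebook's folder.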

I don't use Notebooks much myself. What I will do is create a branch, which I'm thinking of calling Notebook, and publish the branch information here. If you are using Notebooks, may I ask that you try the branch and let me know about any issues you find? After I've had enough feedback on the branch I will merge it into ProjectTemplate and make an official release.

I probably have time late next week to get a working version posted. Between now and then please give me thoughts / comments in this thread.