STAT545-UBC / Discussion

Using rprojroot #407

Open samhinshaw opened 7 years ago

samhinshaw commented 7 years ago

This came up while talking to a student in class today, and I just wanted to post a snippet on how to use relative paths within RStudio to make your code more accessible to collaborators. Rather than using getwd() and setwd(), the R package rprojroot lets you build paths relative to your project's root directory. For most of us, using GitHub and RStudio, that root is the folder containing our .Rproj file.

  1. Set our root directory. This is the most important piece of the puzzle, because it finds the absolute path of the project root on whatever computer the code is running on, and every other path gets built relative to it. So this code would work for me on Linux, or for a collaborator on Mac or Windows.

    library(rprojroot) # install.packages("rprojroot")
    rootDir <- rprojroot::find_rstudio_root_file()
    rootDir
    #> [1] "/home/shinshaw/stat545/scratchpad"

  2. Set our subdirectories to access later on. Here I like to use file.path(), which is essentially a filesystem-friendly wrapper for paste().

    inputDir <- file.path(rootDir, "input")
    outputDir <- file.path(rootDir, "output")
    figuresFolder <- file.path(rootDir, "output", "figures")
    figuresFolder
    #> [1] "/home/shinshaw/stat545/scratchpad/output/figures"

  3. Perform some basic operations using the directories we've defined.

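    MyData <- read.csv(file.path(inputDir, "data.csv"))
    # write.csv() takes the data first and the path second;
    # MyData stands in for whatever summary table you want to save
    write.csv(MyData, file.path(outputDir, "summary.csv"))
    # ggsave() comes from ggplot2 and saves the last plot displayed
    ggsave(file.path(figuresFolder, "summary.png"))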
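
One practical wrinkle with step 3: write.csv() cannot create missing folders, so the output directories have to exist first. Here is a minimal base-R sketch of steps 2 and 3 under some assumptions: a temporary directory stands in for the project root (in a real project it would come from find_rstudio_root_file()), and the built-in mtcars dataset stands in for the summary table.

```r
# Stand-in for the project root (assumption for this sketch); in a real
# project this would be rprojroot::find_rstudio_root_file().
rootDir <- file.path(tempdir(), "scratchpad")

outputDir     <- file.path(rootDir, "output")
figuresFolder <- file.path(rootDir, "output", "figures")

# file.path() joins its arguments with "/" on every platform.
stopifnot(identical(file.path("output", "figures"), "output/figures"))

# Create the nested folders once. recursive = TRUE also creates the parents;
# showWarnings = FALSE keeps reruns quiet when the folders already exist.
dir.create(figuresFolder, recursive = TRUE, showWarnings = FALSE)

# Write a small table into the output folder (mtcars is a built-in dataset).
summaryPath <- file.path(outputDir, "summary.csv")
write.csv(head(mtcars), summaryPath)
file.exists(summaryPath)
```

Calling dir.create() at the top of a script is cheap, and it means a collaborator with a fresh clone does not have to create the folders by hand.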

This is a very cursory glance at the functionality rprojroot offers, but honestly, rprojroot::find_rstudio_root_file() is the only function I use from the package, and may be all you need as well! Check out the manual for more functionality and feel free to ask any questions below.

jennybc commented 7 years ago

If you are willing to use something off of GitHub, vs CRAN, I highly recommend the even simpler wrapper package, here, which hides a bit of the rprojroot machinery.

Here is the top bit of a script where I use it, commented:

# devtools::install_github("krlmlr/here")
library(here)
# load other packages ...

hw_dirs <- list.files(here("hw-marking"), pattern = "^hw\\d{2}_.*")

The here() function will help you build paths relative to "here", defined as the RStudio project's top-level directory or the top-level directory of the current Git repo. In our case, those generally coincide, which is how I recommend working in general. You can call here() from anywhere in the directory tree and the path will get built at runtime relative to the top-level project directory.
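
To make the idea of "here" concrete, the lookup can be sketched in base R as a walk up the directory tree. This is only an illustration of the concept, not how here or rprojroot is actually implemented; find_root_sketch() and the demo-project folder are hypothetical names made up for this example.

```r
# Hypothetical helper, not part of here or rprojroot: walk up from `start`
# until a directory contains an .Rproj file or a .git folder.
find_root_sketch <- function(start = getwd()) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    has_rproj <- length(list.files(dir, pattern = "\\.Rproj$")) > 0
    has_git   <- dir.exists(file.path(dir, ".git"))
    if (has_rproj || has_git) return(dir)
    parent <- dirname(dir)
    # dirname() of the filesystem root is the root itself, so stop there.
    if (identical(parent, dir)) stop("no project root found above ", start)
    dir <- parent
  }
}

# Demo in a throwaway project: root/ has an .Rproj file, root/analysis/ does not.
root <- file.path(tempdir(), "demo-project")
dir.create(file.path(root, "analysis"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, "demo-project.Rproj"))

# Starting from the subdirectory still resolves to the top-level directory.
found <- find_root_sketch(file.path(root, "analysis"))
file.path(found, "hw-marking")
```

Because the lookup starts from wherever the code happens to run, the same call gives the same root whether the working directory is the project root or a subdirectory, which is the property that makes here() convenient with Rmd files.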

This is a great way to use subdirectories, use Rmd, and use RStudio for development without pulling your hair out and without manually fiddling with the working directory. Your code is also more portable to other people's computers. We use this approach over in the private repo the TAs and I use for course stuff.

ralfuh commented 7 years ago

I have a related question. I'd been unwittingly (ab)using the knitr/rmarkdown assumption that the working directory is the directory of the script/file being executed, and using paths like "./subdirectory/some_file_i_need". Now I'm seeing that scripts don't make this same assumption from within RStudio, although they do when using R from the shell or when running them with Rscript.

I'm wondering if there's a simple way to get this functionality to work consistently across all of these different possible use cases that doesn't involve extra libraries or a dependence on RStudio? This is something that I find to be really useful for relative paths with small projects that have a flat-ish file structure.

I found this solution,

script_dir <- dirname(sys.frame(1)$ofile)

but it only works when the file is sourced.
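
For reference, the sourced case can be combined with the Rscript case in one best-effort helper. This is a sketch under assumptions, not a complete answer: script_dir_sketch() is a hypothetical name, it relies on Rscript passing the script path as a --file= argument, and it still has nothing to go on when code is run line by line in an interactive session.

```r
# Hypothetical helper: best-effort guess at the directory of the running script.
script_dir_sketch <- function(args = commandArgs(trailingOnly = FALSE)) {
  # Case 1: `Rscript path/to/script.R` passes the script as --file=<path>.
  file_arg <- grep("^--file=", args, value = TRUE)
  if (length(file_arg) > 0) {
    return(dirname(sub("^--file=", "", file_arg[1])))
  }
  # Case 2: the file is being source()d; source() keeps the path in `ofile`.
  ofile <- tryCatch(sys.frame(1)$ofile, error = function(e) NULL)
  if (is.character(ofile)) return(dirname(ofile))
  # Case 3: nothing to go on (e.g. interactive use), so fall back to getwd().
  getwd()
}

# Simulated Rscript invocation, passing the arguments explicitly.
script_dir_sketch(c("R", "--no-echo", "--file=analysis/report.R"))
```

The explicit `args` parameter exists only so the Rscript branch can be tried out without actually launching Rscript.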

jennybc commented 7 years ago

> I'm wondering if there's a simple way to get this functionality to work consistently across all of these different possible use cases that doesn't involve extra libraries or a dependence on RStudio?

Short answer: no.

And dramatic measures re: RStudio also won't solve this, i.e. always using it or never using it. The fundamental working directory tension comes from knitr. So the only way to remove this problem from your life completely is to never use knitr or never use subdirectories.

I really have thought about and wrestled with this a lot. And my considered advice is what I said above: use here or the fancier package powering it, rprojroot.

sjackman commented 7 years ago

My workflow is to assume that the current working directory of every script is the root directory of the project, regardless of the script's location. To be a defensive programmer, use

stopifnot(dir.exists(".git"))

jennybc commented 7 years ago

But @sjackman that will not save you when knitr takes over and forces the working directory during render to be the directory where the .R or .Rmd lives. That is, your stopifnot() will catch the problem but not address it. Or am I misunderstanding you?

samhinshaw commented 7 years ago

For that problem, might I recommend @daattali's package ezknitr? 😁

daattali commented 7 years ago

@ralfuh When I first ran into this issue, I also tried to do fancy things to figure out where I am, using $ofile and also some other more obscure tricks. That's what eventually led me to make ezknitr. For the specific problem you're describing, do take a look at it (thanks Sam).

sjackman commented 7 years ago

I am 💯 in favour of using ezknitr, which solves exactly this problem. For interest and reference only, I believe the following low-tech solution works when you know that your .Rmd file is one level deep in a subdirectory of the project root.

```{r setup}
# root.dir takes effect from the next chunk onward
knitr::opts_knit$set(root.dir = "..")
```

```{r}
getwd()
stopifnot(dir.exists(".git"))
```

ralfuh commented 7 years ago

Ok thanks for the input/suggestions everyone, this was all very helpful!