benmarwick / rrtools

rrtools: Tools for Writing Reproducible Research in R
Other
671 stars 85 forks source link

Dockerfile needs to be aware of where we choose to put the /analysis dir #38

Closed benmarwick closed 7 years ago

benmarwick commented 7 years ago

If we choose /vignettes for the analysis location, we need the dockerfile to be aware of that choice. Currently it's hard-coded to top-level /analysis, so the container wont find vignettes/analysis, and wont build, and travis will fail, cf https://github.com/mrecos/signboardr

mrecos commented 7 years ago

Though about this a little. Three obvious routes:

  1. location parameter in use_dockerfile
  2. use_dockerfile scans directory and chooses a template based on the structure it finds
  3. use_dockerfile scans directory and modifies final line in default template to match paper directory.
  4. set a variable in pkg object to say what file structure was chosen.

Option 1 allows no room for user error and is not a very cohesive approach. Option 2 is a little overkill, but is rather straightforward. Option 3 may not be an option because whisker.render is doing the work. Option 4 may be the easiest, but I just thought of it, so no testing yet...

An approach to finding which template to chose for option 2 could look something like:

dir_list   <- list.dirs()
paper_dir  <- dir_list[grep(pattern = "/paper", dir_list)]
location   <- regmatches(paper_dir, regexpr("analysis|vignettes|inst", paper_dir))

returns either "analysis", "vignettes", or "inst". However, it could use some defensive checks in case the user has a directory structure that differs from these options. Just some thoughts...

benmarwick commented 7 years ago

Thanks Matt, that's a good survey of options, @nevrome, what are your thoughts on this?

nevrome commented 7 years ago

@mrecos Thanks for the analysis - sounds pretty solid!

I didn't look at use_dockerfile() for a long time, but I think a combination between method 1 and 2 is the way to go. use_dockerfile() is a powerful wizard, but more often than not you'll be forced to check the travis.yml manually anyway (the README already mentions debugging here). Method 1 allows a manual setup for experienced users and can be combined with a good default via 2.

I don't understand how method 4 is supposed to work. What is a package object? Are you talking about a custom config file to store variables? Or maybe some magic that's possible with the settings package? I'm genuinely interested in this approach!

benmarwick commented 7 years ago

Thanks @nevrome, I've pushed an update that follows @mrecos' option 3, since the whisker formatting works fine in this context.

I've also exposed an argument so the user can set the path to the Rmd they want to render in the Docker container, if case they don't use one of the three options we provide, or their Rmd have better names than paper.Rmd (quite likely). I can't think of a way we can reliably automate that (at least until the mindreadr pkg is released ;)

Let me know what you both think.

mrecos commented 7 years ago

@benmarwick Looks good! Just tested it out and it worked fine for the existing options of analysis, inst, or top_level (default). One thing that I was unsure of (without looking at docs... RTFM) was the format of the file path ascribed to rmd_to_knit. Aside from making the user RTM, there could be a little defensive code. This example for https://github.com/benmarwick/rrtools/blob/645e5f2279b7e286679a61b60f6047b2fae71eca/R/hello.R#L279 could check for an attempt to preempt the string with home directory notation or back-slash. Just a suggestion.

} else {
rmd_path <- gsub("^.|^/|^./|^~/","",rmd_to_knit)
}

@nevrome thanks for considering those options! As stated, Option 4 was just on a whim. My thought was that in use_analysis() you could add a slot (ex. $location) to the pkg object of class package as generated by as.package(). https://github.com/benmarwick/rrtools/blob/645e5f2279b7e286679a61b60f6047b2fae71eca/R/hello.R#L197 So when pkg is assigned in use_dockerfile() you would already know what location option is with location <- pkg$location; no need to search the directory for paper.Rmd. I have not tested it, but I don't think the package class will object to a new $location slot being added. This could be called if rmd_to_knit == "path_to_rmd", otherwise the custom path is used. Could that work?

benmarwick commented 7 years ago

Thanks Matt, I've added that nice bit of defensiveness in d5e0c4c3c407e3e539115741744c88219e29842e

I'm not sure that we can take a location value in pkg from use_analysis() to use it in use_dockerfile(). The object pkg only exists within the function, and is generated independently in each function, so currently we have no way to pass values from use_analysis() touse_dockerfile(), if I understand you correctly. I'm not too worried about that, since we can't be sure exactly which Rmd the user wants to render in the docker container, in the case that they have several scattered across the file structure. So I think it's ok to let the user choose what Rmd to knit in use_dockerfile via rmd_to_knit =

nevrome commented 7 years ago

Thanks for the explanation concerning the pkg variable solution. But I guess Ben is right -- that's not going to work with the current setup.

I like the solution Ben implemented now. Just two more thoughts:

  1. Doesn't the same problem also arise not just in the Dockerfile, but also in the travis-setup without Docker? There's also a hard coded link to analysis/paper.Rmd.

https://github.com/benmarwick/rrtools/blob/d5e0c4c3c407e3e539115741744c88219e29842e/inst/templates/travis.yml-no-docker#L21

  1. Ben, you mention the possibility that a user has several documents to be rendered within his project. There's probably no way to prepare a solution that's working for every case, but maybe rmd_to_knit = could be configured to deal with a character vector of file paths. I don't know how this would fit to the template system.
mrecos commented 7 years ago

@nevrome & @benmarwick thanks for the information on the creation of pkg. I had not fully investigated it, but thought there may be some underlying persistent data structure for overall package characteristics. Good points in the comment above.

benmarwick commented 7 years ago

@nevrome thanks for the reminder about the travis.yml-no-docker, I think I've dealt with that now in https://github.com/benmarwick/rrtools/commit/a4b934edb86a1b9558e15e61be0088c8a9d35d90