bcgov / bcgov-data-science-resources

Collection of resources like tools, tutorials and tips for doing data science in bcgov

Strategy for distributing vignettes #3

Open ateucher opened 6 years ago

ateucher commented 6 years ago

In general when hosting packages on GitHub, we have a problem with how to best distribute vignettes. If we can come up with a good working solution here, I propose we add a bit to the wiki to describe it.

For this discussion I'm assuming vignettes are created as .Rmd files in the vignettes/ directory, with a YAML header that looks like:

---
title: "Vignette Title"
author: "Vignette Author"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Vignette Title}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

(The easiest way to create one is with usethis::use_vignette("vignette-title"))

The issue is that remotes::install_github() doesn't build vignettes by default, so when a user installs a package from GitHub, they can't run browseVignettes("pkgname") to access them. This is different to packages hosted on CRAN, where vignettes are pre-built and included when you run install.packages("cranpkgname").
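To see the problem for a given install, one option is to ask R whether the installed package ships any built vignettes at all. This is a minimal sketch; has_vignettes() is a hypothetical helper name, not something from bcgovr or any other package.

```r
# Hypothetical helper: TRUE if the installed package `pkg` ships any
# built vignettes, FALSE otherwise. Errors if `pkg` is not installed.
has_vignettes <- function(pkg) {
  # getVignetteInfo() returns a character matrix with one row per vignette.
  info <- tools::getVignetteInfo(package = pkg)
  nrow(info) > 0
}

# Usage, after installing a package from GitHub ("pkgname" is a placeholder):
# has_vignettes("pkgname")
```

If this returns FALSE after a plain install_github() call, browseVignettes("pkgname") will have nothing to show.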

Possible solutions:

  1. In the installation section of the package README, make the instructions:

    install_github("bcgov/pkgname", build_opts = c("--no-resave-data", "--no-manual"))

    This works because the default value of the build_opts argument is c("--no-resave-data", "--no-manual", "--no-build-vignettes"), so the above removes the "--no-build-vignettes" flag, and the vignettes are built.

    From the developer perspective, this is easiest, as we don't have to do anything extra. From a user perspective however, I think it's less optimal - once a user becomes used to using install_github(), just typing install_github("bcgov/pkgname") becomes reflexive and they probably won't even look at the installation instructions in the README.

    The other downside of this is that if the vignettes are resource-intensive to build, or require local files/infrastructure that aren't committed to the repo, installation will either be really slow or won't work at all.

  2. Build the vignettes and place the built artifacts (.Rmd, .R, .html) in inst/doc/. Then the built vignettes are included in the package and install_github("bcgov/pkgname") will just work for the user. This is how it is in bcgroundwater.

    The downside of this is that putting those files in inst/doc/ is a manual process that a developer will have to remember to do each time they update the package and/or the vignettes. In addition, devtools::install() will actually delete these files, because committing built vignettes is no longer considered good practice, so we might be continuously fighting ourselves. We could write some functions (in bcgovr?) to help automate this...

  3. Start to use the bcgov drat repository to host packages. Then the developer runs devtools::build() and deploys the built tar.gz to the drat repository (we could again add functions to automate this... I'm thinking bcgovr::deploy_drat()). Users then run install.packages("pkgname", repos = "https://bcgov.github.io/drat/") (or better yet, have "https://bcgov.github.io/drat/" in their repos list in their .Rprofile) and it all works.

    The obvious downside is more overhead for developers. An additional upside (unrelated to vignettes) is that we have more control over package releases (i.e., separation of development in GitHub and versioned releases in drat). Also, users will be informed of package updates as they currently are for CRAN packages.
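From the user's side, options 1 and 3 could look roughly like the sketch below. The helper names install_with_vignettes() and add_bcgov_drat() are hypothetical and only for illustration, and "bcgov/pkgname" is a placeholder. The calls that need network access are left commented out.

```r
# Option 1: start from the default build options quoted above and drop the
# flag that suppresses vignette building.
default_build_opts <- c("--no-resave-data", "--no-manual", "--no-build-vignettes")
vignette_build_opts <- setdiff(default_build_opts, "--no-build-vignettes")

# Hypothetical wrapper so users don't have to remember the flags.
install_with_vignettes <- function(repo, ...) {
  remotes::install_github(repo, build_opts = vignette_build_opts, ...)
}

# install_with_vignettes("bcgov/pkgname")
# browseVignettes("pkgname")

# Option 3: append the bcgov drat repository to the user's repos list.
# Hypothetical helper; the result could be set in .Rprofile via options(repos = ...).
add_bcgov_drat <- function(repos = getOption("repos")) {
  c(repos, bcgov = "https://bcgov.github.io/drat/")
}

# options(repos = add_bcgov_drat())
# install.packages("pkgname")
```

Recent versions of remotes also accept build_vignettes = TRUE in install_github(), which may be easier to document in a README than the full build_opts vector.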

Whew! Did I miss anything? @stephhazlitt @boshek @jongoetz


stephhazlitt commented 6 years ago

Is pkgdown another possible solution?

ateucher commented 6 years ago

Oh yes, it probably is! It doesn't make the vignettes available locally (i.e., with browseVignettes()) but solves the problem in a different way...

stephhazlitt commented 6 years ago

I wonder how many users use browseVignettes()? I know I learned about that function many times before I internalized it, and then it took me a lot more time to use it reflexively. For me, the ideal outcome is rendered HTML that users can read; how people find the URL is maybe secondary (e.g. browseVignettes(), a link in the package README, or the pkgdown-generated package "website")?

But to view a rendered html vignette it needs to be hosted in the package on GitHub, so maybe back to the friction with devtools::install()?

ateucher commented 6 years ago

They're probably more likely to discover them that way, especially if we put links to the pkgdown site in the README, DESCRIPTION, and package-level help.
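A rough sketch of what linking to a pkgdown site could involve is below. It assumes the site is served from GitHub Pages under the usual https://<org>.github.io/<pkg>/ pattern, which this thread does not confirm. pkgdown_url() and description_url_field() are hypothetical helpers, and "pkgname" is a placeholder.

```r
# Assumption: the pkgdown site lives on GitHub Pages for the repo.
pkgdown_url <- function(pkg, org = "bcgov") {
  sprintf("https://%s.github.io/%s/", org, pkg)
}

# Text for the URL field of the package DESCRIPTION file, listing the
# pkgdown site first and the GitHub repo second.
description_url_field <- function(pkg, org = "bcgov") {
  sprintf("URL: %s, https://github.com/%s/%s", pkgdown_url(pkg, org), org, pkg)
}

# One-time setup and site build, run from the package root. Left commented
# out because both calls write files to disk:
# usethis::use_pkgdown()   # adds a _pkgdown.yml config file
# pkgdown::build_site()    # renders the README, help pages, and vignettes as HTML
```

The same URL could then be reused in the README and in the package-level help page.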

ateucher commented 6 years ago

It's not necessarily an either-or situation, but I do like the idea of pkgdown

stephhazlitt commented 6 years ago

Me too.

FWIW, I think I like option 3 of the initial set—although I realize that increases the developer overhead quite a bit, and I wonder if it also increases variation of R package delivery across bcgov teams (e.g. some in drat, some not, some with stable version on master, some not....).

stephhazlitt commented 6 years ago

Some nice, local pkgdown examples:

ateucher commented 6 years ago

Agreed. I think drat is the "right way" to manage releases... however, you're right, it would increase variation in package delivery... and would there be the desire/expectation for that repository to become the one for all bcgov-created packages? If so, what would management of that look like? I think we could create tooling to make the developer overhead not too bad.
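As a starting point for that tooling, a deploy helper might look something like the sketch below. It is only an illustration: bcgovr::deploy_drat() is a proposal in this thread, not an existing function, and the drat_repo default assumes a local clone of the bcgov drat repository sitting next to the package directory.

```r
# Hypothetical sketch of the proposed deploy helper.
# pkg:       path to the package source directory
# drat_repo: path to a local clone of the drat repository (assumed location)
deploy_drat <- function(pkg = ".", drat_repo = "../drat") {
  # Build a source tarball (vignettes are built by default);
  # devtools::build() returns the path to the built file.
  tarball <- devtools::build(pkg)

  # Copy the tarball into the drat repository and update its package index.
  drat::insertPackage(tarball, repodir = drat_repo)

  invisible(tarball)
}

# Usage, from the package root:
# deploy_drat()
```

The developer would still need to commit and push the drat repository afterwards for the release to become visible to users.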

boshek commented 6 years ago

+1 for pkgdown with some obvious hurdles to clear first