LieberInstitute / TREG

Tools for finding Total RNA Expression Genes in single nucleus RNA-seq data
http://research.libd.org/TREG/

Create the vignette #7

Closed lcolladotor closed 2 years ago

lcolladotor commented 2 years ago

This is going to be the longest one (well, maybe #6).

Create vignette skeleton

You can do that with biocthis, particularly with the call at https://github.com/LieberInstitute/TREG/blob/main/dev/03_core_files.R#L74, which is documented at https://lcolladotor.github.io/biocthis/reference/use_bioc_vignette.html.
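For reference, a minimal sketch of that call, assuming biocthis is installed and the working directory is the package root. The file name matches the vignette mentioned later in this thread; the title is a hypothetical placeholder, with a leading number in case a second vignette is added later.

```r
## Create the vignette skeleton from the biocthis template.
## Assumes the working directory is the root of the TREG package.
biocthis::use_bioc_vignette(
    ## Name of the .Rmd file created under vignettes/
    name = "how_to_find_tregs",
    ## Hypothetical title; the leading "1." keeps vignettes ordered
    title = "1. How to find TREGs"
)
```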

I believe that we only need one vignette for TREG, though in the future, if we add functions for say HALO, we might want to add a second one. If that's the case, you might want the vignette title to start with a number, so the vignettes appear in order like at http://bioconductor.org/packages/release/bioc/html/GenomicRanges.html.

Update list of citations

Once you have a template, update the list of packages we cite. Here I typically cite every single package mentioned in the DESCRIPTION file. That involves updating the equivalent of https://github.com/lcolladotor/biocthis/blob/master/vignettes/biocthis.Rmd#L41-L58 and https://github.com/lcolladotor/biocthis/blob/master/vignettes/biocthis.Rmd#L321-L335. Note the difference between BiocStyle::Biocpkg() (for Bioconductor packages) and BiocStyle::CRANpkg() (for CRAN packages).
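A rough sketch of what that citation list tends to look like, assuming BiocStyle and knitr are installed. The packages shown here are only examples and should be swapped for whatever ends up in the DESCRIPTION file.

```r
## Citation list in alphabetical order (R itself first).
## The package names are illustrative, not the final TREG list.
bib <- c(
    R = citation(),
    BiocStyle = citation("BiocStyle")[1],
    knitr = citation("knitr")[1]
)

## Biocpkg() builds a link to a Bioconductor package landing page,
## CRANpkg() builds a link to a CRAN package page.
bioc_link <- BiocStyle::Biocpkg("BiocFileCache")
cran_link <- BiocStyle::CRANpkg("ggplot2")
```

In the vignette text these helpers are normally used inline, and each entry is cited by name, for example with RefManageR::Citep(bib[["knitr"]]).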

Manually cite any papers

See https://github.com/LieberInstitute/recount3/blob/master/vignettes/recount3-quickstart.Rmd#L55-L61 and https://github.com/LieberInstitute/qsvaR/blob/main/vignettes/Intro_qsvaR.Rmd#L42-L53 for examples. You'll need to get the BibTeX information so you can copy-paste it into your RefManageR::BibEntry() call. You can cheat with knitcitations::bib_metadata(), though it's best to avoid depending on knitcitations given that it depends on bibtex and that package was removed from CRAN. Or something like that: I don't remember all the details of the story. @joshstolz just did this for qsvaR, so he can help you too; we used the output of knitcitations::bib_metadata('doi.org/10.1016/j.neuron.2019.05.013').
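As a sketch of the manual route, assuming RefManageR is installed. Every field value below is a made-up placeholder, not a real reference; replace them with the BibTeX fields of the paper you want to cite.

```r
## Manually built entry for a paper.
## All values are placeholders and must be replaced.
paper <- RefManageR::BibEntry(
    bibtype = "Article",
    key = "examplePaper",
    author = "Firstname Lastname",
    title = "Placeholder title of the paper",
    journal = "Placeholder Journal",
    year = 2022,
    doi = "10.xxxx/placeholder"
)

## Combine it with package citations in a single list
bib <- c(
    R = citation(),
    examplePaper = paper
)
```

The entry can then be cited like any package, for example with RefManageR::Citep(bib[["examplePaper"]]).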

Locate data to use

Ideally, we need to make sure that the package doesn't need lots of memory for building the vignette. GitHub Actions allows us to use up to 7 GB of RAM https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources. You can compute how much memory an object uses with lobstr::obj_size(). I typically do lobstr::obj_size() / 1024^3 to get the amount in GB instead of bytes.
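For example, a small sketch assuming lobstr is installed. The numeric vector is just a stand-in for whatever object the vignette loads.

```r
## Stand-in object: one million doubles
x <- numeric(1e6)

## Size in bytes, as reported by lobstr
size_bytes <- lobstr::obj_size(x)

## Convert to GB to compare against the GitHub Actions RAM limit
size_gb <- as.numeric(size_bytes) / 1024^3
size_gb
```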

If there's data available through ExperimentHub then that's the ideal situation from Bioconductor's point of view. Though well, I think that we should be ok using the AWS-hosted data from https://github.com/LieberInstitute/10xPilot_snRNAseq-human#processed-data.

Download data

If that's the case (AWS-hosted data), then you should use BiocFileCache to download it. See https://github.com/LieberInstitute/spatialLIBD/blob/master/vignettes/TenX_data_download.Rmd#L127-L139 for example (we used paste0() for formatting the long URL). Basically it involves using BiocFileCache() and bfcrpath(). Here's a recent example https://github.com/LieberInstitute/qsvaR/blob/main/vignettes/Intro_qsvaR.Rmd#L122-L124 by @joshstolz.
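A minimal sketch of that pattern, assuming BiocFileCache is installed. The function name and the URL are hypothetical placeholders; the real URL comes from the processed-data section of the repository linked above.

```r
## Hypothetical helper: download a file once and reuse the cached copy
download_cached <- function(url) {
    ## Open (or create) the default cache directory
    bfc <- BiocFileCache::BiocFileCache()
    ## Returns the local path, downloading only if not cached yet
    BiocFileCache::bfcrpath(bfc, url)
}

## Placeholder URL, split with paste0() to keep lines short
data_url <- paste0(
    "https://example.org/",
    "path/to/",
    "data_file.Rdata"
)

## Not run here, since the URL above is not real:
## local_path <- download_cached(data_url)
## load(local_path, verbose = TRUE)
```

Since the file is cached, rebuilding the vignette locally doesn't download it again.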

Writing guides

I recommend including code comments in every code chunk, since some users might just jump to the code and ignore the surrounding text. Conversely, document/describe every code chunk in the text too. It'll feel a bit redundant, but well, you can think of it as having two chances to explain a concept to your users. In particular with the code chunks, I like to describe any non-default argument value we use. Or well, if you go beyond that, explain what each argument is. I also tend to name the arguments in function calls instead of relying on argument positions (though I don't always do this).

Kind of like a paper, you'll want to highlight limitations at the end but also ways in which this package could help spur the development of new tools or answer other questions. Plus well, a bit of a summary of what you did.

Avoid saying something is obvious/easy :P I guess along those lines, link to any background that might be useful to others. The template vignette includes a section for that.

Suggested packages

In a vignette, feel free to use other CRAN/Bioconductor packages, like for example ggplot2 if you are including a plot. Any such packages should be under the Suggests section in the DESCRIPTION file. You can add them with usethis::use_package("pkgName", type = "Suggests"). Remember to cite them too. To help me with this, I add packages in alphabetical order to my list of citations.

Citing the package vs the paper

You'll likely have 2 citations at #1 and if that's the case, you'll want to use something like https://github.com/LieberInstitute/spatialLIBD/blob/master/vignettes/TenX_data_download.Rmd#L62-L63 so you can differentiate when you are citing the R package and when you are citing the paper describing the package.

Developing the vignette

I recommend that you make sure that GitHub Actions works with the template version before you move on to adding other parts. A vignette is long, and well, you'll likely need to fix errors in it along the way. Since every git push triggers a new GitHub Actions run, I recommend pushing after every commit instead of after a group of commits, so you'll be able to pinpoint more easily what broke (if anything breaks).

lahuuki commented 2 years ago

Getting this error when vignette builds

Quitting from lines 18-31 (how_to_find_tregs.Rmd) 
Error: processing vignette 'how_to_find_tregs.Rmd' failed with diagnostics:
there is no package called 'RefManageR'

lcolladotor commented 2 years ago

Do you have it under Suggests in the DESCRIPTION file? If not, use usethis::use_package("RefManageR", type = "Suggests").

lcolladotor commented 2 years ago

Never mind, I see you added them. But now you are missing another package: BiocFileCache https://github.com/LieberInstitute/TREG/runs/5234108152?check_suite_focus=true#step:17:42.

lahuuki commented 2 years ago

Yeah, found a whole string of packages I needed to add to the DESCRIPTION file.
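One way to catch these before pushing is a quick scan of the vignette for package names. This is only a rough sketch in base R: the helper name is hypothetical, it only detects pkg::fun() and library(pkg) usage, and it will also report base packages such as utils or stats if they are called with ::.

```r
## Hypothetical helper: packages used in the vignette text that are
## not listed in the given DESCRIPTION fields
find_missing_pkgs <- function(vignette_lines, description_fields) {
    ## Packages used as pkg::fun()
    ns_hits <- regmatches(
        vignette_lines,
        gregexpr("[A-Za-z][A-Za-z0-9.]*(?=::)", vignette_lines, perl = TRUE)
    )
    ## Packages attached with library(pkg) or library("pkg")
    lib_hits <- regmatches(
        vignette_lines,
        gregexpr(
            "(?<=library\\()[\"']?[A-Za-z][A-Za-z0-9.]*",
            vignette_lines,
            perl = TRUE
        )
    )
    used <- unique(gsub("[\"']", "", unlist(c(ns_hits, lib_hits))))

    ## Split the fields on commas and drop version constraints
    declared <- unlist(strsplit(description_fields, ","))
    declared <- trimws(gsub("\\s*\\([^)]*\\)", "", declared))

    setdiff(used, declared)
}

## Toy input standing in for readLines() on the vignette and for
## read.dcf() on the DESCRIPTION file
vignette_lines <- c(
    'library("BiocFileCache")',
    "bib <- RefManageR::BibEntry()",
    "ggplot2::ggplot()"
)
declared <- "ggplot2 (>= 3.0.0),\n    knitr"
missing_pkgs <- find_missing_pkgs(vignette_lines, declared)
missing_pkgs
```

Each package it reports can then be added with usethis::use_package("pkgName", type = "Suggests").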