LieberInstitute / qsvaR

Quality Surrogate Variable Analysis for Degradation Correction
http://research.libd.org/qsvaR/

Package data #13

Closed joshstolz closed 2 years ago

joshstolz commented 2 years ago

I will need help making the data in data/ available to the package so that the examples work.

lcolladotor commented 2 years ago

First, create a script using usethis::use_data_raw(). In it, import any (large) file you want, then subset it to make it as small as possible: small enough that your package stays under the 5 MB limit, but large enough that your examples still make sense. For example, you might make an rse object that has data for just 100 genes across, say, 10 samples. You can check the object size using lobstr::obj_size(); for the size in MB, use lobstr::obj_size() / 1024^2. Once you are happy with how small the data is, save it using usethis::use_data() inside the R script you generated with usethis::use_data_raw().
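As a rough sketch of what such a script might look like (not the actual qsvaR script): the object name `example_counts`, the simulated input, and the file name in the comment are all hypothetical, and base R's `utils::object.size()` stands in for `lobstr::obj_size()` so the sketch runs without extra packages.

```r
## Sketch of data-raw/example_counts.R (hypothetical name), the kind of script
## that usethis::use_data_raw("example_counts") creates as a starting point.

set.seed(20211201)

## Stand-in for importing a large file: a simulated 5000 gene x 50 sample
## count matrix. In a real script you would read your own data here.
full_counts <- matrix(
    rpois(5000 * 50, lambda = 10),
    nrow = 5000,
    dimnames = list(
        paste0("gene", seq_len(5000)),
        paste0("sample", seq_len(50))
    )
)

## Subset to 100 genes across 10 samples so the package stays small.
example_counts <- full_counts[seq_len(100), seq_len(10)]

## Check the in-memory size in MB. With lobstr installed, the equivalent is
## lobstr::obj_size(example_counts) / 1024^2.
size_mb <- as.numeric(utils::object.size(example_counts)) / 1024^2
stopifnot(size_mb < 5)

## Save into data/ only when run from the package root with usethis installed.
if (file.exists("DESCRIPTION") && requireNamespace("usethis", quietly = TRUE)) {
    usethis::use_data(example_counts, overwrite = TRUE)
}
```

Keeping the script under data-raw/ means the subsetting steps stay reproducible even though only the small saved object ships with the package.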

Here are two such examples:

In that last example, the final object was still quite big, which is why I ended up sharing the data through ExperimentHub http://bioconductor.org/packages/release/bioc/html/ExperimentHub.html. That's the solution if you need a large example file for your code to work. That route involves going through http://bioconductor.org/packages/release/bioc/vignettes/HubPub/inst/doc/CreateAHubPackage.html, and you want to avoid it as much as possible. Another solution is to use some already publicly available data from AnnotationHub/ExperimentHub or other packages like http://bioconductor.org/packages/release/data/experiment/html/scRNAseq.html, https://github.com/lmweber/STexampleData, and https://github.com/helenalc/TENxVisiumData.

https://usethis.r-lib.org/reference/use_data.html

@lahuuki: this also applies to DeconvoBuddies and other packages.

lcolladotor commented 2 years ago

See https://github.com/LieberInstitute/DeDHed/commit/20c39c0b4935a1e4d024c624ea317b9c814427e0#r61118091 for some notes about the R script generated by use_data_raw().

lcolladotor commented 2 years ago

We can see at https://github.com/LieberInstitute/DeDHed/runs/4369810545?check_suite_focus=true#step:22:123 that we didn't document the actual data. To do so, we need a file like https://github.com/lcolladotor/derfinder/blob/master/R/genomeFstats-data.R that has the roxygen2 syntax for documenting the dataset we included.

See also https://r-pkgs.org/data.html#documenting-data
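For illustration, a dataset documentation file usually follows the pattern below, which is the one described in the r-pkgs chapter. This is a minimal sketch: the dataset name `example_counts`, the file name, and the text of every field are hypothetical placeholders, not the documentation that qsvaR actually uses.

```r
## Sketch of R/example_counts-data.R (hypothetical file and dataset name).

#' Example count matrix
#'
#' A small subset of a larger dataset, kept only so that the package
#' examples can run. Replace this description with what your data contains.
#'
#' @format A matrix with 100 rows (genes) and 10 columns (samples).
#' @source Generated by the script under `data-raw/` (placeholder).
#' @keywords datasets
"example_counts"
```

The quoted name on the last line is what roxygen2 documents, so it has to match the name of the object saved in data/. After adding the file, re-run `devtools::document()` to regenerate the man page.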

joshstolz commented 2 years ago

Have Leo check https://github.com/LieberInstitute/qsvaR/blob/30d00a08c77759b65f24b9ef1f85f265610ac440/R/covComb_tx_deg-data.R#L14 to see if this resolves the issue.

joshstolz commented 2 years ago

Resolved the line referenced in the comment above.