Open NPJuncal opened 9 months ago
Hi @Pakillo and @Julenasti,
I want to start documenting the example dataset, but I have never done it before.
Reading the R Packages (2e) book, it says that I have to create a new script in the R folder. Do I open a new script just like that, or is there a roxygen tab I have to click to create an R script linked to the dataset? How does R know that the script refers to that dataset?
Could either of you make an example with one of the datasets? Just create the script, no need to write the documentation.
Hi Nerea, I have never documented a dataset, but I have been reading and it seems simple and very similar to documenting a function. I assume you have read this (https://r-pkgs.org/data.html); I ask just to check that we are consulting the same information. I think we need section 7.1, Exported data. Also consider "7.1.1 Preserve the origin story of package data", which is always nice to have for future improvements. To document it, I would use the template they use, or adapt one of their full scripts: https://github.com/tidyverse/tidyr/blob/main/R/data.R
#' World Health Organization TB data
#'
#' A subset of data from the World Health Organization Global Tuberculosis
#' Report ...
#'
#' @format ## `who`
#' A data frame with 7,240 rows and 60 columns:
#' \describe{
#' \item{country}{Country name}
#' \item{iso2, iso3}{2 & 3 letter ISO country codes}
#' \item{year}{Year}
#' ...
#' }
#' @source <https://www.who.int/teams/global-tuberculosis-programme/data>
"who"
To answer your question: they say that you document the name of the dataset and save that file in R/.
There are two roxygen tags that are especially important for documenting datasets:
@format gives an overview of the dataset. For data frames, you should include a definition list that describes each variable. It’s usually a good idea to describe variables’ units here.
@source provides details of where you got the data, often a URL.
Never @export a data set.
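As a minimal sketch of how those two tags fit together, here is a skeleton for one dataset. The file name, the dataset name my_dataset, the column names and every description are hypothetical placeholders, not the package's real documentation. The quoted string on the last line is what tells roxygen which dataset the block documents.

```r
# R/datasets.R (hypothetical file name; any .R file inside R/ works)

#' Placeholder title for the dataset
#'
#' Placeholder description: what the data contains and how it was collected.
#'
#' @format A data frame with the following columns (names are hypothetical):
#' \describe{
#'   \item{id}{Placeholder: identifier of each observation}
#'   \item{value}{Placeholder: what was measured, including its units}
#' }
#' @source Placeholder: URL or DOI of the original data
"my_dataset"
```

Note that there is no @export tag, in line with the advice quoted above.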
Hi!
Yes, just put an .R script in the R folder (ie. together with the functions) documenting each dataset using Roxygen. Here you have an example, with a similar structure to that shown by Julen above.
You could have one R script within the R folder documenting all datasets, or one script per dataset, as you prefer (if in doubt I think I'd go with the former option, i.e. one script documenting all datasets, which could be called datasets.R or similar).
Then save each dataset (as .rda?) with the corresponding name within the data folder (you could use usethis::use_data()).
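A rough sketch of that saving step, in case it helps. The data frame below is a toy stand-in with made-up columns and values, and the save goes to a temporary folder with base R so the snippet runs outside a package. Inside the package itself, the usethis call shown in the comment would be the usual route.

```r
# Toy stand-in for the real example data; column names and values are made up.
bluecarbon_data <- data.frame(
  core = c("core_01", "core_02"),
  compression = c(5, 12)
)

# Inside the package you would run, from the package root:
#   usethis::use_data(bluecarbon_data, overwrite = TRUE)
# which writes data/bluecarbon_data.rda. The base R lines below do a
# comparable save into a temporary folder so the sketch runs anywhere.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
rda_path <- file.path(data_dir, "bluecarbon_data.rda")
save(bluecarbon_data, file = rda_path)

# Loading the file restores an object called bluecarbon_data; that name is
# what the quoted string at the end of the roxygen block has to match.
restored <- new.env()
load(rda_path, envir = restored)
ls(restored)
```

The point of the last three lines is only to show that the object name travels with the .rda file, which is how the documentation and the data end up linked.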
I agree with Julen (and Hadley) that it'd be good to show clearly the origin of the datasets and any modifications you may have applied. If you are using the datasets exactly as they're available somewhere (with a DOI), you could just use the @source roxygen tag as in the above example. If you are modifying the data somehow, then I'd save an R script that downloads and modifies the data within the data-raw folder.
P.S. Apologies I've been a bit out of touch lately with too much stuff going on. I'll try to get a couple of days soon to focus on this pkg
I have documented the datasets. Someone should check them.
Hi Nerea, great job! Here are some comments:
Add a title and @description to explain what the dataset contains. Here is an example: https://github.com/tidyverse/tidyr/blob/c6c126a61f67a10b5ab9ce6bb1d9dbbb7a380bbd/R/data.R#L3
Knowing very little about the topic, I'd appreciate a little more detail about what blue carbon data you are talking about. Seagrass, salt marsh and mangrove?
What does compression mean? https://github.com/EcologyR/BlueCarbon/blob/31c26323b3bffb64051be3622d2e58665911e61a/R/exampledata.R#L14
maxd: https://github.com/EcologyR/BlueCarbon/blob/31c26323b3bffb64051be3622d2e58665911e61a/R/exampledata.R#L15C18-L15C63
What does age mean: sampling year or years since sampling? https://github.com/EcologyR/BlueCarbon/blob/31c26323b3bffb64051be3622d2e58665911e61a/R/exampledata.R#L20
The title and description suggestion also applies to the second example.
external_distance: I don't understand this one very well, but that's probably due to my lack of knowledge. https://github.com/EcologyR/BlueCarbon/blob/31c26323b3bffb64051be3622d2e58665911e61a/R/exampledata.R#L40
There are at least two example datasets:
bluecarbon_data: df with example cores
core_comp: df with example field measurements to estimate compaction
Tasks:
- homogenize both df so that they have the same core ids and core_comp can be used to estimate the compression of the cores in bluecarbon_data
- document the example datasets
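One possible way to check whether the core ids of the two data frames already agree is sketched below. The id column name core and all values are assumptions made for illustration; the real datasets may name and structure this differently.

```r
# Toy stand-ins for the two example datasets; the column name `core` and all
# values are made up for illustration.
bluecarbon_data <- data.frame(
  core = c("core_01", "core_01", "core_02", "core_03")
)
core_comp <- data.frame(
  core = c("core_01", "core_02", "core_04")
)

# IDs present in one data frame but missing from the other.
missing_in_comp <- setdiff(unique(bluecarbon_data$core), core_comp$core)
missing_in_data <- setdiff(core_comp$core, unique(bluecarbon_data$core))

missing_in_comp
missing_in_data

# The ids are homogenized once both vectors are empty.
ids_match <- length(missing_in_comp) == 0 && length(missing_in_data) == 0
```

A check like this could also live in the data-raw script, so that any mismatch is caught before the datasets are saved.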