EcologyR / BlueCarbon

Estimation of organic carbon stocks and sequestration rates from soil/sediment cores from blue carbon ecosystems
https://ecologyr.github.io/BlueCarbon/

example datasets #42

Open NPJuncal opened 9 months ago

NPJuncal commented 9 months ago

There are at least two example datasets:

- bluecarbon_data: df with example cores
- core_comp: df with field measurements (example data to estimate compaction)

Tasks:

- homogenize both dfs so that they have the same core ids and core_comp can be used to estimate the compression of the cores in bluecarbon_data
- document the example datasets
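A minimal sketch of the first task, assuming both data frames identify cores in a column called `core` (a hypothetical name; the real columns may differ). The rows below are made-up toy values, not the package data:

```r
# Toy stand-ins for the two example datasets. Column names and values are
# assumptions for illustration only.
bluecarbon_data <- data.frame(
  core = c("core_01", "core_01", "core_02", "core_03"),
  depth_cm = c(5, 10, 5, 5)          # hypothetical sample depth
)
core_comp <- data.frame(
  core = c("core_01", "core_02", "core_04"),
  compaction_field = c(10, 12, 8)    # hypothetical field measurement
)

# Core ids present in one data frame but missing from the other
only_in_cores <- setdiff(unique(bluecarbon_data$core), core_comp$core)
only_in_comp  <- setdiff(core_comp$core, unique(bluecarbon_data$core))

# Core ids shared by both data frames
shared <- intersect(bluecarbon_data$core, core_comp$core)

# Left join: every sample keeps its row; cores without field
# measurements get NA in the compaction column
merged <- merge(bluecarbon_data, core_comp, by = "core", all.x = TRUE)
```

Once `only_in_cores` and `only_in_comp` are both empty, the two datasets share the same core ids and the join leaves no missing field measurements.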

NPJuncal commented 8 months ago

Hi @Pakillo and @Julenasti,

I want to start documenting the example datasets, but I have never done this before.

The R Packages (2e) book says that I have to create a new script in the R folder. Do I just open a new script, or is there a Roxygen tab I have to click to create an R script linked to the dataset? How does R know that the script refers to that dataset?

Could either of you make an example with one of the datasets? Just create the script; no need to write the documentation.

Julenasti commented 8 months ago

Hi Nerea, I have never documented a dataset either, but from what I have read it is simple and very similar to documenting a function. I assume you have read this chapter (https://r-pkgs.org/data.html); I just want to check that we are consulting the same information. I think we need section 7.1, "Exported data". Also consider section 7.1.1, "Preserve the origin story of package data", which is always useful for future improvements. To document the datasets, I would use the template from the book (copied below), or adapt one of their full scripts: https://github.com/tidyverse/tidyr/blob/main/R/data.R

#' World Health Organization TB data
#'
#' A subset of data from the World Health Organization Global Tuberculosis
#' Report ...
#'
#' @format ## `who`
#' A data frame with 7,240 rows and 60 columns:
#' \describe{
#'   \item{country}{Country name}
#'   \item{iso2, iso3}{2 & 3 letter ISO country codes}
#'   \item{year}{Year}
#'   ...
#' }
#' @source <https://www.who.int/teams/global-tuberculosis-programme/data>
"who"

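To answer your question: the book says that, instead of documenting the data directly, you document the name of the dataset (the quoted string at the end of the roxygen block) and save that script in R/.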

There are two roxygen tags that are especially important for documenting datasets: @format, which gives an overview of the dataset, and @source, which says where the data came from.

Never @export a data set.
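Applied to this package, the template could look roughly like the sketch below. The title, description and the `core` column are placeholders (assumptions, not the real contents of the dataset). The quoted name on the last line is what links the block to the dataset: roxygen2 attaches the documentation to the object of that name stored in the data folder.

```r
# R/data.R (a sketch; all text below is placeholder documentation)

#' Example soil/sediment cores
#'
#' Placeholder description: what the cores are and how the package uses them.
#'
#' @format A data frame with one row per sample:
#' \describe{
#'   \item{core}{Core id (hypothetical column name)}
#' }
#' @source Placeholder: where the data come from (URL or DOI).
"bluecarbon_data"
```

The real `@format` block would list every column of the dataset, ideally with units.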

Pakillo commented 8 months ago

Hi!

Yes, just put an .R script in the R folder (i.e. together with the functions) documenting each dataset using Roxygen. Here you have an example, with a similar structure to the one shown by Julen above.

You could have one R script in the R folder documenting all datasets, or one script per dataset, as you prefer. If in doubt, I'd go with the former option, i.e. one script documenting all datasets, which could be called datasets.R or similar.

Then save each dataset (as rda?) with the corresponding name in the data folder (you could use usethis::use_data).

I agree with Julen (and Hadley) that it'd be good to show clearly the origin of the datasets and any modifications you may have applied. If you are using the datasets exactly as they're available somewhere (with a DOI), you could just use the @source Roxygen tag as in the above example. If you are modifying the data somehow, then I'd save an R script that downloads and modifies the data in the data-raw folder.
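A minimal sketch of such a data-raw script, assuming a hypothetical file name and a made-up raw table. The real script would download or read the original data instead of building it inline:

```r
# data-raw/bluecarbon_data.R (hypothetical file name)
# One-off setup, run once from the package root:
#   usethis::use_data_raw("bluecarbon_data")

# Stand-in for reading the original file; values are made up.
raw <- data.frame(
  Core = c("A", "A", "B"),
  Depth = c(5, 10, 5)
)

# Record every modification in code so the dataset can be rebuilt later.
bluecarbon_data <- raw
names(bluecarbon_data) <- tolower(names(bluecarbon_data))

# Save to data/bluecarbon_data.rda. Only attempted inside a package
# source directory with usethis installed.
if (file.exists("DESCRIPTION") && requireNamespace("usethis", quietly = TRUE)) {
  usethis::use_data(bluecarbon_data, overwrite = TRUE)
}
```

Keeping the script under data-raw means the steps from the original source to the shipped dataset stay reproducible.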

P.S. Apologies I've been a bit out of touch lately with too much stuff going on. I'll try to get a couple of days soon to focus on this pkg

NPJuncal commented 8 months ago

I have documented the datasets. Could someone check them?

Julenasti commented 8 months ago

Hi Nerea, great job! Here are some comments:

The title and description suggestion also applies to the second example.