CRAN Task View: WebTechnologies
https://CRAN.R-project.org/view=WebTechnologies

new "Direct Data Download and Ingestion" section #501

Open wibeasley opened 1 year ago

wibeasley commented 1 year ago
pachadotdev commented 1 year ago

@wibeasley hi! do you prefer yaml?

wibeasley commented 1 year ago

Yes, I typically prefer yaml if (a) the data has a nested or non-rectangular structure and (b) the file is entered or edited by a human. I tend to use json for machine-generated datasets.

But there are some tabular/rectangular files that I have started expressing as yaml because they're easier to read & adjust. A small downside is that the ingesting code needs a little more work to verify that the yaml converts cleanly to a data.frame.
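As a rough illustration of that extra verification work, here is a minimal sketch. The inline yaml text, the field names, and the `expected` vector are all made up for the example (they are not taken from any real file), and it assumes the yaml package is installed. It checks that every record has the same scalar fields before binding them into a data.frame.

```r
library(yaml)

# Hypothetical inline yaml standing in for a small tabular file.
yaml_text <- "
- name: alpha
  type: integer
  required: true
- name: beta
  type: character
  required: false
"

# A sequence of mappings is parsed into a list of named lists.
records <- yaml::yaml.load(yaml_text)

# Assumed column names for this example.
expected <- c("name", "type", "required")

# Every record must have exactly the expected fields, in the same order.
fields_ok <- vapply(
  records,
  function(r) identical(names(r), expected),
  logical(1)
)
stopifnot(all(fields_ok))

# Every value must be a length-one scalar (no nesting, no nulls).
scalar_ok <- vapply(
  records,
  function(r) all(lengths(r) == 1L),
  logical(1)
)
stopifnot(all(scalar_ok))

# Only after the checks pass, bind the records into one data.frame.
ds <- do.call(
  rbind,
  lapply(records, function(r) as.data.frame(r, stringsAsFactors = FALSE))
)
```

A csv reader does most of this shape checking implicitly, which is the trade-off: the yaml file is easier to edit by hand, but the rectangular structure has to be asserted in code.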

Here's an example of a tabular structure that I felt was a better fit for yaml than csv: https://github.com/OuhscBbmc/REDCapR/blob/main/inst/misc/validation-transformation.yml

I don't do it much, but the yaml package can load a file from an https url:

yaml::yaml.load_file(
  "https://raw.githubusercontent.com/OuhscBbmc/REDCapR/main/inst/misc/validation-transformation.yml"
)

Since we already have bullets for csv, xml, html, & json, I thought yaml could be included for completeness. But as always, I'm happy following your lead. Tell me if you think tangents like this are more distracting than helpful.

Are there scenarios where you do/don't format a data file as yaml?