Open mbannert opened 4 years ago
We could do `dataset_create(data, meta)`, with the code from `dataset_read()`:
z <- list(
  meta = dots_to_underscore(empty_list_to_null(meta)),
  data = data,
  set_id = gsub(".", "_", set_id, fixed = TRUE)
)
names(z$meta) <- gsub("utc_updated", "updated_utc", names(z$meta), fixed = TRUE)
class(z) <- "swissdata"
if (test) ans <- dataset_validate(z)
And then perhaps another function, `meta()`, where each element gets its own argument? This would create the list that is supplied to `dataset_create` as the `meta` argument.
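A minimal sketch of how the two functions could fit together, using base R only. The argument names of `meta()`, the extra `set_id` argument of `dataset_create()`, and the choice to drop unsupplied elements are assumptions for illustration, not the package's actual API; validation is left out because `dataset_validate()` lives in the package.

```r
# Hypothetical constructor: one argument per meta element.
meta <- function(title, source_name, source_url = NULL,
                 dim_order = NULL, hierarchy = NULL,
                 labels = NULL, details = NULL,
                 updated_utc = Sys.time()) {
  stopifnot(is.list(title), is.list(source_name))
  z <- list(
    title = title,
    source_name = source_name,
    source_url = source_url,
    dim_order = dim_order,
    hierarchy = hierarchy,
    labels = labels,
    details = details,
    updated_utc = updated_utc
  )
  # Drop elements that were not supplied
  z[!vapply(z, is.null, logical(1))]
}

# Hypothetical wrapper mirroring the snippet above (set_id is an assumed argument).
dataset_create <- function(data, meta, set_id) {
  z <- list(
    meta = meta,
    data = data,
    set_id = gsub(".", "_", set_id, fixed = TRUE)
  )
  class(z) <- "swissdata"
  z
}

m <- meta(
  title = list(en = "SIX Debit and Credit Card Use"),
  source_name = list(en = "SIX"),
  dim_order = "variable"
)
x <- dataset_create(
  data = data.frame(variable = "bezug_bargeld",
                    date = as.Date("2020-04-01"),
                    value = 1),
  meta = m,
  set_id = "ch.six"
)
```

Giving each element its own argument means typos in element names fail early as "unused argument" errors instead of silently ending up in the list.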
I think that's definitely going in the right direction. I like the idea of creating swissdata objects that way, and validating them right away is perfect; it would also make a perfect test. That and a good vignette might already do the job.
I've been thinking about a potential `meta()` function and I am not sure whether that's rather a skeleton approach like in the original `swissdata` package or a `...` type of function.
Consider swissdatifying one of the new daily datasets that have been around and popular lately.
library(data.table)  # provides fread()
six <- fread("https://raw.githubusercontent.com/KOF-ch/economic-monitoring/master/data/ch.six.csv")
metadata_six <- list(
  "title" = list(en = "SIX Debit and Credit Card Use"),
  "source.name" = list(en = "SIX"),
  "source.url" = "https://github.com/statistikZH/covid19monitoring_economy_SIX",
  dim.order = c("variable"),
  # one entry per id in the variable dimension (flat hierarchy)
  hierarchy = list(
    variable = list(
      "debiteinsatz_ausland" = NA,
      "bezug_bargeld" = NA,
      "stat_einkauf" = NA
    )
  ),
  labels = list(
    dim.names = list(
      variable = list(
        en = "variable"
      )
    ),
    debiteinsatz_ausland = list(
      en = "Volume Debitcard use abroad",
      de = "Finanzvolumen Debitkarteneinsatz im Ausland"
    ),
    bezug_bargeld = list(
      en = "Volume Cash Withdrawal Switzerland",
      de = "Finanzvolumen Bargeldbezug Debitkarten in der Schweiz"
    ),
    stat_einkauf = list(
      en = "Volume debit card use in retail (w/o online)",
      de = "Finanzvolumen Debitkarteneinsatz stationärer Einkauf in der Schweiz (kein Online-Handel)"
    )
  ),
  details = list(
    en = "The data from SIX Payment Services cover cashless transactions and cash withdrawals in Switzerland and abroad made with debit cards issued by Swiss banks of the following brands: Debit Mastercard, Maestro CH, V PAY or Visa Debit."
  ),
  utc.updated = Sys.time()
)
Created on 2020-04-11 by the reprex package (v0.3.0)
How could we make a function out of this? Maybe we make it a multi-step process:
- `sd_data` object to a `meta()` function which returns a list with the standard elements like title and other must-haves + stuff derived from the data columns, maybe have a dim order parameter.
- `sd_meta` object + `sd_data` object together in a `swissdata` class.

Besides, I like the idea to also think about I/O here. How about a `swissdata` class to .zip file function / option?
(I am using `data` and `meta` where you are using the prefixed version)
Yes, I like your second step: `data` defines the minimal structure for `meta` and fills it with placeholders, or ids instead of labels. It then needs to be filled in. E.g., `meta <- meta_minimal(data)`.
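A rough sketch of what such a `meta_minimal()` could look like in base R. Everything here is an assumption for illustration: the `value_cols` argument, treating every column other than `date` and `value` as a dimension, the underscore element names, and nesting the id labels by dimension.

```r
# Hypothetical helper: derive a placeholder meta list from the data.
# Assumes every column other than `date` and `value` is a dimension.
meta_minimal <- function(data, value_cols = c("date", "value")) {
  dims <- setdiff(names(data), value_cols)
  # unique ids per dimension column
  ids <- lapply(data[dims], function(x) sort(unique(as.character(x))))

  list(
    title = list(en = NA_character_),        # placeholder, to be filled in
    source_name = list(en = NA_character_),  # placeholder, to be filled in
    source_url = NA_character_,
    dim_order = dims,
    # flat hierarchy: every id sits at the top level
    hierarchy = lapply(ids, function(id) {
      setNames(as.list(rep(NA, length(id))), id)
    }),
    labels = c(
      list(dim_names = lapply(setNames(dims, dims), function(d) list(en = d))),
      # ids double as placeholder labels
      lapply(ids, function(id) {
        lapply(setNames(id, id), function(i) list(en = i))
      })
    ),
    updated_utc = Sys.time()
  )
}

data <- data.frame(
  variable = c("bezug_bargeld", "stat_einkauf"),
  date = as.Date("2020-04-01"),
  value = c(1, 2)
)
meta <- meta_minimal(data)
str(meta$labels)
```

Because the ids come straight from the data, the hierarchy and labels cannot drift out of sync with the columns; only the human-readable parts remain to be edited.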
How to edit `meta` is a separate question. Either in R, by manipulating the list. Or by editing YAML or JSON. Or by a supercharged version of `dput()` for lists that prints `meta` like your R code above.
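For reference, plain `dput()` already gets part of the way: it writes a list as R code that `dget()` can read back, just without the tidy indentation a supercharged version would add. A small sketch of that round trip, with a made-up `meta` list and a temporary file:

```r
# Made-up meta list, for illustration only
meta <- list(
  title = list(en = "SIX Debit and Credit Card Use"),
  dim_order = "variable"
)

path <- tempfile(fileext = ".R")
dput(meta, file = path)  # write the list as R code

# ... edit the file by hand here ...

meta_edited <- dget(path)  # read the (possibly edited) code back as a list
cat(readLines(path), sep = "\n")
```

The file holds an ordinary `list(...)` call, so it can be edited with the usual indentation and highlighting support for R code.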
Which one may depend on the user's needs, so it is OK to leave that open.
Practical swissdata experience has shown that defining meta information in strings, no matter whether it is .json or .yaml, is not very intuitive. For an R person, the natural way to define a hierarchical structure is a list, also because indentation and code highlighting work so well for R code, as opposed to JSON, in RStudio.