cynkra / sdtools

tools to work with swissdata sets

function to create meta information from scratch #22

Open mbannert opened 4 years ago

mbannert commented 4 years ago

Practical swissdata experience has shown that defining meta information in strings, whether .json or .yaml, is not very intuitive. For an R person, the natural way to define a hierarchical structure is a list, not least because indentation and code highlighting work so much better for R lists than for JSON in RStudio.

christophsax commented 4 years ago

We could do `dataset_create(data, meta)`, reusing the code from `dataset_read()`:


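  dataset_create <- function(data, meta, set_id, test = TRUE) {
    z <- list(
      # helpers from dataset_read(); as their names suggest, empty lists
      # become NULL and dots in element names become underscores
      meta = dots_to_underscore(empty_list_to_null(meta)),
      data = data,
      set_id = gsub(".", "_", set_id, fixed = TRUE)
    )
    # after the dot conversion, utc.updated reads utc_updated; rename it
    names(z$meta) <- gsub("utc_updated", "updated_utc", names(z$meta), fixed = TRUE)

    class(z) <- "swissdata"

    if (test) dataset_validate(z)
    z
  }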

And then perhaps another function, meta(), where each element gets its own argument? This would create the list that is supplied to dataset_create as the meta argument.
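A minimal sketch of such a `meta()` function, assuming the top-level element names used in `metadata_six` further down. The argument list and the rule of dropping arguments left at `NULL` are illustrative choices, not existing sdtools code:

```r
# Hypothetical constructor: meta() is only a proposal in this thread.
# Each top-level meta element gets its own argument; arguments left at
# NULL are dropped, so the result only contains what was supplied.
meta <- function(title,
                 source.name = NULL,
                 source.url = NULL,
                 dim.order = NULL,
                 hierarchy = NULL,
                 labels = NULL,
                 details = NULL,
                 utc.updated = Sys.time()) {
  z <- list(
    title = title,
    source.name = source.name,
    source.url = source.url,
    dim.order = dim.order,
    hierarchy = hierarchy,
    labels = labels,
    details = details,
    utc.updated = utc.updated
  )
  # keep only the elements that are not NULL
  Filter(Negate(is.null), z)
}
```

Because the argument names match the element names, the result could be handed directly to `dataset_create()` as its `meta` argument.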

mbannert commented 4 years ago

I think that's definitely heading in the right direction. I like the idea of creating swissdata objects that way, and validating them is perfect; it would also make a perfect test. That and a good vignette might already do the job.

I've been thinking about a potential meta() function, and I am not sure whether it should follow a skeleton approach like the original swissdata package or be a `...` type of function.
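For comparison, a `...` type of function could accept any named element and only check a minimal required set. `meta_dots()` and the choice of `title` and `source.name` as required elements are hypothetical:

```r
# Hypothetical `...`-type constructor: any element can be passed by name,
# only a minimal set of required elements is checked.
meta_dots <- function(..., required = c("title", "source.name")) {
  z <- list(...)
  # every element must be named, otherwise it cannot become a meta key
  if (length(z) > 0 && (is.null(names(z)) || any(names(z) == ""))) {
    stop("all meta elements must be named", call. = FALSE)
  }
  missing_elements <- setdiff(required, names(z))
  if (length(missing_elements) > 0) {
    stop("missing required meta elements: ",
         paste(missing_elements, collapse = ", "), call. = FALSE)
  }
  z
}
```

The trade-off: the `...` version is flexible but cannot document or autocomplete the possible elements, whereas one argument per element (or a skeleton) makes the expected structure visible.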

Consider swissdatifying one of the new daily datasets that have been around and popular lately.


library(data.table)
six <- fread("https://raw.githubusercontent.com/KOF-ch/economic-monitoring/master/data/ch.six.csv")

metadata_six <- list(
  "title" = list(en = "SIX Debit and Credit Card Use"),
  "source.name"= list(en = "SIX"),
  "source.url" = "https://github.com/statistikZH/covid19monitoring_economy_SIX",
  dim.order = c("variable"),
  hierarchy = list(
    variable = list(
      "stat_einkauf" = NA,
      "bezug_bargeld" = NA,
      "debiteinsatz_ausland" = NA
    )
  ),
  labels = list(
    dim.names = list(
      variable = list(
        en = "variable"
      )
    ),
    debiteinsatz_ausland = list(
      en = "Volume debit card use abroad",
      de = "Finanzvolumen Debitkarteneinsatz im Ausland"
    ),
    bezug_bargeld = list(
      en = "Volume Cash Withdrawal Switzerland",
      de = "Finanzvolumen Bargeldbezug Debitkarten in der Schweiz"
    ),
    stat_einkauf = list(
      en = "Volume debit card use in retail (w/o online)",
      de = "Finanzvolumen Debitkarteneinsatz stationärer Einkauf in der Schweiz (kein Online-Handel)"
    )
  ),
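  details = list(
    en = "The data from SIX Payment Services cover cashless transactions and cash withdrawals in Switzerland and abroad made with debit cards issued by Swiss banks of the following brands: Debit Mastercard, Maestro CH, V PAY or Visa Debit.",
    de = "Die Daten von SIX Payment Services umfassen bargeldlose Transaktionen und Bargeldbezüge im In- und Ausland, für welche von Schweizer Banken ausgehändigte Debitkarten der folgenden Marken verwendet wurden: Debit Mastercard, Maestro CH, V PAY oder Visa Debit."
  ),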
  utc.updated = Sys.time()
)

Created on 2020-04-11 by the reprex package (v0.3.0)
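Validating a list like this could include a simple consistency check: the ids listed under each hierarchy dimension should be unique, and each should have an entry in `labels`. `check_meta_ids()` is a hypothetical helper; it only inspects the top level of each dimension and does not descend into nested child ids:

```r
# Hypothetical consistency check for a meta list shaped like metadata_six.
# Returns a character vector of problems; an empty vector means no issues.
check_meta_ids <- function(meta) {
  problems <- character(0)
  for (d in names(meta$hierarchy)) {
    ids <- names(meta$hierarchy[[d]])
    # ids that appear more than once under the same dimension
    dups <- unique(ids[duplicated(ids)])
    if (length(dups) > 0) {
      problems <- c(problems, sprintf("duplicated %s ids: %s", d, paste(dups, collapse = ", ")))
    }
    # ids with no matching entry in labels
    unlabelled <- setdiff(ids, names(meta$labels))
    if (length(unlabelled) > 0) {
      problems <- c(problems, sprintf("%s ids without labels: %s", d, paste(unlabelled, collapse = ", ")))
    }
  }
  problems
}
```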

How could we make a function out of this? Maybe we make it a multi-step process:

  1. create a long-format dataset using a process like the one you suggested above.
  2. pass the new sd_data object to a meta() function that returns a list with the standard elements, such as title and other must-haves, plus elements derived from the data columns, possibly with a dim order parameter.
  3. modify the list; the existing functions are likely good enough already.
  4. combine the sd_meta and sd_data objects into an object of class swissdata.
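These steps could be sketched in base R roughly as follows. The helper `to_long()`, the made-up numbers and the set id `ch_six` are assumptions for illustration; step 3 uses base `utils::modifyList()`, and the final object uses the same `meta`, `data` and `set_id` slots that the `dataset_read()` code above builds:

```r
# Step 1: wide to long (hypothetical helper, base R only).
# Every column except `id` becomes a level of `variable`.
to_long <- function(wide, id = "date") {
  vars <- setdiff(names(wide), id)
  data.frame(
    date = rep(wide[[id]], times = length(vars)),
    variable = rep(vars, each = nrow(wide)),
    value = unlist(wide[vars], use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

# made-up numbers, for illustration only
wide <- data.frame(
  date = as.Date(c("2020-04-01", "2020-04-02")),
  stat_einkauf = c(10, 12),
  bezug_bargeld = c(5, 6)
)
long <- to_long(wide)

# Step 2: a skeleton meta list (a real meta() would derive more from `long`)
meta <- list(title = list(en = NA_character_), dim.order = "variable")

# Step 3: modify the list; modifyList() merges nested elements recursively
meta <- utils::modifyList(meta, list(title = list(en = "SIX Debit and Credit Card Use")))

# Step 4: combine data and meta into one object of class "swissdata"
sd <- structure(list(meta = meta, data = long, set_id = "ch_six"), class = "swissdata")
```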

Besides, I like the idea of also thinking about I/O here. How about a function or option that writes a swissdata object to a .zip file?
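One possible shape for such a writer, assuming a swissdata object is stored as one CSV file for the data and one JSON file for the meta information inside a single archive. `sd_write_zip()` and this file layout are hypothetical; the sketch uses the jsonlite package, and `utils::zip()` calls an external zip program, so one has to be installed:

```r
# Hypothetical writer: sd_write_zip() is not an existing sdtools function.
# Assumes a swissdata object is a list with meta, data and set_id, and stores
# data as <set_id>.csv and meta as <set_id>.json inside one .zip archive.
sd_write_zip <- function(x, zipfile = paste0(x$set_id, ".zip")) {
  stopifnot(inherits(x, "swissdata"))

  # stage both files in a temporary directory that is removed on exit
  staging <- tempfile("swissdata_")
  dir.create(staging)
  on.exit(unlink(staging, recursive = TRUE), add = TRUE)

  csv_file <- file.path(staging, paste0(x$set_id, ".csv"))
  json_file <- file.path(staging, paste0(x$set_id, ".json"))
  utils::write.csv(x$data, csv_file, row.names = FALSE)
  jsonlite::write_json(x$meta, json_file, auto_unbox = TRUE, pretty = TRUE)

  # "-j" stores the files without their temporary directory path;
  # utils::zip() returns the exit status of the external zip program
  status <- utils::zip(zipfile, c(csv_file, json_file), flags = "-j9X")
  if (status != 0) stop("zip failed with status ", status, call. = FALSE)
  invisible(zipfile)
}
```

Reading would be the reverse: `utils::unzip()` to a temporary directory, then read the CSV and the JSON back into a swissdata object.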

christophsax commented 4 years ago

(I am using data and meta where you are using the prefixed versions.)

Yes, I like your second step: data defines the minimal structure for meta and fills it with placeholders, or with ids instead of labels. It then needs to be filled in, e.g., meta <- meta_minimal(data).
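A sketch of what `meta_minimal()` could do, assuming `data` is in long format with `date` and `value` columns and every other column is a dimension. Ids serve as placeholder labels and the hierarchy is kept flat; the output mirrors the layout of `metadata_six` above, but none of this is existing sdtools code:

```r
# Hypothetical sketch of meta_minimal(); not an existing sdtools function.
# Assumes a long-format data.frame with `date` and `value` columns; every
# other column is treated as a dimension.
meta_minimal <- function(data, lang = "en") {
  dims <- setdiff(names(data), c("date", "value"))
  ids <- lapply(data[dims], function(col) unique(as.character(col)))

  # a one-element list keyed by language, e.g. list(en = "stat_einkauf")
  placeholder <- function(x) stats::setNames(list(x), lang)

  # flat hierarchy: every id maps to NA, no parent/child structure yet
  hierarchy <- lapply(ids, function(x) stats::setNames(as.list(rep(NA, length(x))), x))

  # ids double as labels until someone fills in real ones
  dim_labels <- stats::setNames(lapply(dims, placeholder), dims)
  id_labels <- do.call(c, unname(lapply(ids, function(x) stats::setNames(lapply(x, placeholder), x))))

  list(
    title = placeholder(NA_character_),
    source.name = placeholder(NA_character_),
    dim.order = dims,
    hierarchy = hierarchy,
    labels = c(list(dim.names = dim_labels), id_labels)
  )
}
```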

How to edit meta is a separate question: in R, by manipulating the list; by editing YAML or JSON; or with a supercharged version of dput() for lists that prints meta like your R code above.
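A bare-bones version of such a printer could recurse through the list and emit indented `list(...)` code, deparsing leaves with base `deparse()`. `dput_list()` is a hypothetical name; it treats data frames like any other list and drops attributes of the lists themselves other than names:

```r
# Hypothetical pretty-printer for nested lists (not part of base R or sdtools).
# Returns R code as a single string; lists are printed one element per line,
# indented by two spaces per level, leaves are deparsed with deparse().
dput_list <- function(x, indent = 0) {
  if (!is.list(x)) {
    return(paste(deparse(x), collapse = " "))
  }
  if (length(x) == 0) {
    return("list()")
  }
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))
  pad <- strrep("  ", indent + 1)
  items <- vapply(seq_along(x), function(i) {
    nm <- nms[i]
    # syntactic names are printed bare, others are quoted
    key <- if (!nzchar(nm)) "" else if (make.names(nm) == nm) paste0(nm, " = ") else paste0(deparse(nm), " = ")
    paste0(pad, key, dput_list(x[[i]], indent + 1))
  }, character(1))
  paste0("list(\n", paste(items, collapse = ",\n"), "\n", strrep("  ", indent), ")")
}
```

For example, `cat(dput_list(metadata_six), "\n")` would print the list in roughly the layout of the code above, and the output can be pasted back into a script and edited there.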

Which one is best may depend on the user's needs, so it is fine to leave that open.