franknarf1 / r-tutorial

This book covers the essentials of using R
Creative Commons Zero v1.0 Universal

more file ops stuff #29

Open franknarf1 opened 6 years ago

franknarf1 commented 6 years ago

E.g., to read a tar.gz file:

library(data.table) # fread
library(magrittr)   # %>%

fread_targz = function(fp, zp = "C:/Program Files/7-Zip/7z"){
  # only tested on Windows
  # fp should be the path to mycsv.tar.gz
  # zp should be the path to 7z.exe
  # thanks to Joachim Sauer: https://superuser.com/a/1283392/

  qzp = shQuote(zp)
  qfp = shQuote(fp)

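  # the first 7z decompresses the .gz to stdout (-so); the second reads that tar
  # stream from stdin (-si -ttar) and writes the file inside it to stdout for fread
  patt = "%s x -so %%s | %1$s x -si -so -ttar"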

  thecall = patt %>% sprintf(qzp) %>% sprintf(qfp)
  cat("The call:", thecall, sep="\n")

  # pass the shell command explicitly via cmd= rather than relying on fread guessing
  fread(cmd = thecall)
}

fread_targz("C:/Users/ferickson/Downloads/mycsv.tar.gz")

This would fit just before "Tables > Input and output > Reading and writing other formats".
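The two-pass sprintf in the function is easy to misread, so here is a small sketch that only builds the command string and runs nothing. The archive path C:/data/mycsv.tar.gz is a made-up example, and type = "cmd" is passed to shQuote so the quoting looks the same on any platform (on Windows that is already the default).

```r
# Illustrative inputs only; neither path needs to exist for this demo.
qzp = shQuote("C:/Program Files/7-Zip/7z", type = "cmd")
qfp = shQuote("C:/data/mycsv.tar.gz", type = "cmd")  # hypothetical path

patt = "%s x -so %%s | %1$s x -si -so -ttar"

# Pass 1: %s and %1$s both take the 7z path; %%s collapses to a literal %s.
step1 = sprintf(patt, qzp)

# Pass 2: the surviving %s takes the archive path.
step2 = sprintf(step1, qfp)

cat(step1, step2, sep = "\n")
```

The first printed line still contains a bare %s where the archive path will go; the second is the full pipeline that fread_targz hands to fread.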

franknarf1 commented 6 years ago

Since it took me ages to get this to work (Windows-only, requires 7-Zip), here's the multi-file version:

fread_targzs = function(fp, zp = "C:/Program Files/7-Zip/7z.exe", unzip_dir = dirname(fp), silent = FALSE){
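  # only tested on Windows
  # fp should be the path to mycsvs.tar.gz
  # zp should be the path to 7z.exe
  # unzip_dir should hold only the CSVs extracted from the tar.gz; the default,
  # dirname(fp), also contains the archive itself, which trips the CSV-only check below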

  # thanks to Joachim Sauer: https://superuser.com/a/1283392/
  # see original single-csv function on https://github.com/franknarf1/r-tutorial/issues/29

  qzp = zp %>% normalizePath(mustWork = FALSE) %>% shQuote
  qfp = fp %>% normalizePath %>% shQuote
  quz = unzip_dir %>% normalizePath %>% shQuote

  # unzip

  patt = "%s x -y -so %%s | %1$s x -y -si -ttar -o%%s"
  thecall = patt %>% sprintf(qzp) %>% sprintf(qfp, quz)
  if (!silent) cat("The targz unzip call:", thecall, sep="\n")

  shell(sprintf("\"%s\"", thecall))

  # list files, read separately
  # not looking recursively, since csvs should be only one level deep
  # still need to discuss conventions like this with data team
  fns = list.files(unzip_dir) %>% setNames(., .)

  if (!all(tools::file_ext(fns) == "csv")) stop("fp should contain only CSVs")

  # iterate over the named vector so the result keeps file names (file.path drops names)
  lapply(fns, function(fn) fread(file.path(unzip_dir, fn)))
}
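
For anyone without 7-Zip, or not on Windows, a rough alternative is base R's untar(), which handles gzip-compressed tarballs itself. This is a different technique from the 7z pipeline above, not a drop-in replacement: it always extracts to disk first. The name read_targz_csvs and its arguments are made up for this sketch, and it assumes the archive holds CSVs (anything else is ignored rather than raising an error).

```r
# Sketch: extract a .tar.gz with base R's untar() and read every CSV inside.
# read_targz_csvs, exdir and reader are hypothetical names for illustration.
read_targz_csvs = function(fp, exdir = tempfile("targz_"), reader = data.table::fread){
  # extract into a fresh directory so only the archive's contents are listed
  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
  utils::untar(fp, exdir = exdir)

  # paths are relative to exdir; naming the vector makes lapply return a named list
  fns = list.files(exdir, pattern = "\\.csv$", recursive = TRUE)
  names(fns) = fns

  lapply(fns, function(fn) reader(file.path(exdir, fn)))
}
```

Usage would look like tables = read_targz_csvs("mycsvs.tar.gz"). If the CSVs all share the same columns, data.table::rbindlist(tables, idcol = "file") should stack them with the file name as an extra column.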