StatisticsGreenland / pxmake

pxmake: Read and write px files in R.
https://statisticsgreenland.github.io/pxmake/

parquet as supplementary file-format in px_save #309

Open larpSTATGL opened 4 weeks ago

larpSTATGL commented 4 weeks ago

As a supplement to: px_save(x, data_path = "hi.rds", "dta12871.xlsx")

Can we have parquet as well? px_save(x, data_path = "hi.parquet", "dta12871.xlsx")

johan-ejstrud commented 3 weeks ago

Should be possible, yes. However, it adds a dependency on the 'arrow' package.

Another possibility is to write a small function that converts the .rds file to .parquet, something like:

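# Convert an .rds data file to .parquet and delete the original.
# Requires the 'arrow' package.
change_rds_to_parquet <- function(path) {
  # Replace only the trailing ".rds" extension (escaped and anchored regex)
  parquet_path <- sub("\\.rds$", ".parquet", path)
  arrow::write_parquet(readRDS(path), parquet_path)

  unlink(path)
}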

rds_path <- "hi.rds"

px_save(x, data_path = rds_path, "dta12871.xlsx")
change_rds_to_parquet(rds_path)

(I haven't tested the code)

larpSTATGL commented 3 weeks ago

One good point for parquet files is that the arrow package allows us to read and write them. There is also an Excel add-in to read and write parquet, and our SAS environment (Altair Analytics Workbench) has an engine to read and write it. I think the dependency could ease combining data from production platforms. rds is locked to R. Is a dependency on arrow a problem?
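
As a small illustration of the read/write point, a data frame written with arrow can be read back in R, and the same file can be opened by other parquet readers. This is a minimal sketch that assumes the 'arrow' package is installed; the example data is made up.

```r
# Minimal sketch: write a data frame to parquet and read it back.
# Assumes the 'arrow' package is installed; the data is illustrative only.
df <- data.frame(year = c(2022L, 2023L), value = c(1.5, 2.5))

if (requireNamespace("arrow", quietly = TRUE)) {
  path <- file.path(tempdir(), "hi.parquet")
  arrow::write_parquet(df, path)

  # read_parquet() returns a tibble by default; convert for comparison
  df_back <- as.data.frame(arrow::read_parquet(path))
  print(all.equal(df, df_back))
}
```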

johan-ejstrud commented 2 weeks ago

Is a dependency on arrow a problem?

It's not a problem. However, every added dependency makes the package heavier to install and increases the risk that it breaks in the future, because a package it depends on changes behaviour, introduces a bug, etc.

Therefore I just want to make sure that every dependency added is a conscious choice.

As long as we add well-established packages like 'arrow', we will probably be fine. 😁
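
One way to limit the install cost while still offering parquet would be to treat 'arrow' as an optional dependency that is only checked when a .parquet path is requested. The sketch below is an assumption about how that could look, not how px_save is implemented; `save_data_table()` is a hypothetical helper name.

```r
# Hypothetical helper (not part of pxmake): pick the writer from the file
# extension and require 'arrow' only when a .parquet path is requested.
save_data_table <- function(df, data_path) {
  ext <- tolower(tools::file_ext(data_path))

  if (ext == "rds") {
    saveRDS(df, data_path)
  } else if (ext == "parquet") {
    # Assumption: 'arrow' would be listed under Suggests, not Imports
    if (!requireNamespace("arrow", quietly = TRUE)) {
      stop("Saving to .parquet requires the 'arrow' package.", call. = FALSE)
    }
    arrow::write_parquet(df, data_path)
  } else {
    stop("Unsupported data file extension: '", ext, "'", call. = FALSE)
  }

  invisible(data_path)
}
```

With this approach, users who only save .rds files would never need to install 'arrow'.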