KentonWhite / ProjectTemplate

A template utility for R projects that provides a skeletal project.
http://projecttemplate.net
GNU General Public License v3.0

IDEA: Use fst package for loading Cache will boost startup performance #191

Fpadt closed this issue 6 years ago

Fpadt commented 7 years ago

Dear All,

Does anyone know how I can get the fst package and its functionality built into ProjectTemplate?

I recently discovered the fst package and would like to use it for loading the ProjectTemplate cache, caching each standard file as fst based on a config setting.

So if the same file is in the cache with an .fst extension and that file is just as recent or newer, it should be loaded. If it is older, the system should load the normal file and immediately save it as .fst.

This does the trick:

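```r
load_dt <- function(pDATA_TABLE, pPATH = PATH_DATA) {
  # PATH_DATA is a global defined elsewhere in the poster's project.
  # The two paths below were not defined in the snippet as posted;
  # the <table name>.fst / <table name>.RData naming is an assumption.
  file_fst   <- file.path(pPATH, paste0(pDATA_TABLE, ".fst"))
  file_RData <- file.path(pPATH, paste0(pDATA_TABLE, ".RData"))

  # If an fst version exists and is at least as recent, load it;
  # otherwise load the normal .RData file and refresh the fst copy.
  # file.mtime() returns POSIXct, so the times compare directly.
  if (file.exists(file_fst) &&
      file.mtime(file_fst) >= file.mtime(file_RData)) {
    assign(pDATA_TABLE,
           fst::read_fst(path = file_fst, as.data.table = TRUE),
           envir = .GlobalEnv)
  } else {
    load(file_RData, envir = .GlobalEnv)
    fst::write_fst(get(pDATA_TABLE, envir = .GlobalEnv), file_fst)
  }

  invisible(NULL)
}
```
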
connectedblue commented 7 years ago

Hi @Fpadt

Have you synced with the latest version on GitHub rather than CRAN? There are a number of changes to caching that try to handle things in a smarter way (for example, only re-caching if something has changed, to save time).

It doesn't use fst, just the normal RDS format. There could be some portability and speed benefits from migrating to fst.
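
To illustrate the "only re-cache if something has changed" idea, here is a minimal sketch in base R. It is not ProjectTemplate's actual cache code: the function name `cache_if_changed`, the `.hash` sidecar file, and the use of an MD5 of the serialized object are all assumptions made for the example.

```r
# Hypothetical helper (not part of ProjectTemplate): write an object to an
# RDS cache only when its content hash differs from the stored hash.
cache_if_changed <- function(name, value, cache_dir) {
  rds_path  <- file.path(cache_dir, paste0(name, ".rds"))
  hash_path <- file.path(cache_dir, paste0(name, ".hash"))

  # Hash the uncompressed serialization via a temporary file (base R only)
  tmp <- tempfile()
  on.exit(unlink(tmp))
  saveRDS(value, tmp, compress = FALSE)
  new_hash <- unname(tools::md5sum(tmp))

  old_hash <- if (file.exists(hash_path)) readLines(hash_path, n = 1) else ""
  if (file.exists(rds_path) && identical(new_hash, old_hash)) {
    return(invisible(FALSE))  # unchanged: skip the write
  }

  saveRDS(value, rds_path)
  writeLines(new_hash, hash_path)
  invisible(TRUE)             # cache was written or refreshed
}
```

Calling it twice with the same object writes the cache once and skips the second write; the returned logical says which happened.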

Fpadt commented 7 years ago

Thanks, I will sync; maybe I have an old version. I built something myself using fst and am quite happy with it.

Hugovdberg commented 6 years ago

We should merge this issue with #225; feather doesn't work for all types of data. If we can use fst for all data types with similar performance to feather, then fst should be preferred.
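
One way to picture the data-type question is a cache writer that picks the format per object. The sketch below assumes fst only stores data frames whose columns are all atomic (no list columns), and falls back to RDS for everything else or when fst is not installed. The names `cache_write` and `cache_read` are hypothetical and are not ProjectTemplate functions.

```r
# Hypothetical helpers: use fst where it is assumed to apply, RDS otherwise.
cache_write <- function(value, path_stem) {
  # Assumption: fst handles only data frames without list columns
  fst_ok <- is.data.frame(value) &&
    !any(vapply(value, is.list, logical(1))) &&
    requireNamespace("fst", quietly = TRUE)

  path <- paste0(path_stem, if (fst_ok) ".fst" else ".rds")
  if (fst_ok) fst::write_fst(value, path) else saveRDS(value, path)
  invisible(path)  # return the path that was actually written
}

cache_read <- function(path) {
  # Dispatch on the extension chosen by cache_write()
  if (grepl("\\.fst$", path)) fst::read_fst(path) else readRDS(path)
}
```

With this shape, a plain list or a fitted model still round-trips through RDS, while ordinary data frames get the faster format when it is available.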

KentonWhite commented 6 years ago

OK, closing this issue; it is referenced in #225.