read_file_in() fails to read .rda files in data/

maurolepore commented 3 years ago

Since we are loading some of the data files used in the webtool via devtools::load_all(), it seems like that data can ONLY be loaded via the load() function and will produce errors when using the readRDS() or read_rds() functions. Which is odd, given the files are .rda files. We should figure out a robust solution to this, taking into account the situation when we need to use new data (see script 1).

We should also take a quick look at the issue CJ described with loading a recently saved file from memory in script 3, which seems to only work every other time.

-- @jacobvjk

maurolepore commented 3 years ago

# .rda files in data/ were created with use_data() from data-raw/data.R:
writeLines(readLines("data-raw/data.R"))
#> library(usethis)
#> 
#> # Moving the .rda files directly to data/ causes an error about magic numbers
#> # The error does not throw if they are saved via use_data()
#> currencies <- readRDS("data-raw/currencies.rda")
#> use_data(currencies, overwrite = TRUE)
#> 
#> index_regions <- readRDS("data-raw/index_regions.rda")
#> use_data(index_regions, overwrite = TRUE)
#> 
#> bench_regions <- readRDS("data-raw/bench_regions.rda")
#> use_data(bench_regions, overwrite = TRUE)

fs::dir_ls("data", regexp = "[.]rda")
#> data/bench_regions.rda data/currencies.rda    data/index_regions.rda

# This is the recommended approach (https://r-pkgs.org/data.html) and works
# well with load() and devtools::load_all()
load("data/bench_regions.rda")

devtools::load_all()
#> Loading PACTA.analysis

# However, the existing code uses readRDS(), and it fails:
try(readRDS("data/bench_regions.rda"))
#> Error in readRDS("data/bench_regions.rda") : unknown input format

^{Created on 2020-10-06 by the reprex package (v0.3.0)}

maurolepore commented 3 years ago

The only place where I see code using data/*.rda is here:

That is, in a call to read_in_file(), which currently is this:

read_in_file <- function(file_name){
  # This function checks the file type and reads it in.
  # This should allow switching between fst, rda and csv to happen more simply

  extension_type = substring(file_name,nchar(file_name)-4+1)
  if (!extension_type %in% c(".rda", ".csv", ".fst")){stop("Can't read in file. File extension type not applicable.")}

  df <- NA

  if(file.exists(file_name)){

    if (extension_type == ".rda"){df <- readr::read_rds(file_name)}
    if (extension_type == ".fst"){df <- fst::read_fst(file_name)}
    if (extension_type == ".csv"){df <- readr::read_csv(file_name)}
  }else{
    warning(paste0(file_name, " does not exist"))
  }

  return(df)
}

This seems needlessly complicated: it deals with multiple file types but it is only used to read .rda files:

To avoid bugs, I suggest we either (a) simplify read_in_file() to implement only reading .rda files or (b) delete read_in_file() and instead load each file with load() or a convenient wrapper.

@jacobvjk, any comments about my suggestion?

jacobvjk commented 3 years ago

@maurolepore in principle I don't have a strong opinion on this. If we are sure that using load_all() and load() is a robust way to handle this, then I am happy with that. I am a bit unsure about using this at this point as I don't seem to understand all the implications of using devtools::load_all(), but I am sure we can sort that out. I would lile to discuss however what that means for other data formats, as I understood Clare and CJ found that it can cause issues with that..

cjyetman commented 3 years ago

I think I figure out what the root of this problem is and made a new issue #124 for it.

maurolepore commented 3 years ago

I think this issue and #124 are duplicates. I'll close this and follow there.

RMI-PACTA / PACTA_analysis

read_file_in() fails to read .rda files in data/ #119