Closed maurolepore closed 3 years ago
# .rda files in data/ were created with use_data() from data-raw/data.R:
writeLines(readLines("data-raw/data.R"))
#> library(usethis)
#>
#> # Moving the .rda files directly to data/ causes an error about magic numbers
#> # The error does not throw if they are saved via use_data()
#> currencies <- readRDS("data-raw/currencies.rda")
#> use_data(currencies, overwrite = TRUE)
#>
#> index_regions <- readRDS("data-raw/index_regions.rda")
#> use_data(index_regions, overwrite = TRUE)
#>
#> bench_regions <- readRDS("data-raw/bench_regions.rda")
#> use_data(bench_regions, overwrite = TRUE)
fs::dir_ls("data", regexp = "[.]rda")
#> data/bench_regions.rda data/currencies.rda data/index_regions.rda
# This is the recommended approach (https://r-pkgs.org/data.html) and works
# well with load() and devtools::load_all()
load("data/bench_regions.rda")
devtools::load_all()
#> Loading PACTA.analysis
# However, the existing code uses readRDS(), and it fails:
try(readRDS("data/bench_regions.rda"))
#> Error in readRDS("data/bench_regions.rda") : unknown input format
Created on 2020-10-06 by the reprex package (v0.3.0)
The only place where I see code using data/*.rda is here:
That is, in a call to read_in_file()
, which currently is this:
read_in_file <- function(file_name){
# This function checks the file type and reads it in.
# This should allow switching between fst, rda and csv to happen more simply
extension_type = substring(file_name,nchar(file_name)-4+1)
if (!extension_type %in% c(".rda", ".csv", ".fst")){stop("Can't read in file. File extension type not applicable.")}
df <- NA
if(file.exists(file_name)){
if (extension_type == ".rda"){df <- readr::read_rds(file_name)}
if (extension_type == ".fst"){df <- fst::read_fst(file_name)}
if (extension_type == ".csv"){df <- readr::read_csv(file_name)}
}else{
warning(paste0(file_name, " does not exist"))
}
return(df)
}
This seems needlessly complicated: it deals with multiple file types but it is only used to read .rda files:
To avoid bugs, I suggest we either (a) simplify read_in_file()
to implement only reading .rda files or (b) delete read_in_file()
and instead load each file with load()
or a convenient wrapper.
@jacobvjk, any comments about my suggestion?
@maurolepore in principle I don't have a strong opinion on this. If we are sure that using load_all() and load() is a robust way to handle this, then I am happy with that. I am a bit unsure about using this at this point as I don't seem to understand all the implications of using devtools::load_all(), but I am sure we can sort that out. I would lile to discuss however what that means for other data formats, as I understood Clare and CJ found that it can cause issues with that..
I think I figure out what the root of this problem is and made a new issue #124 for it.
I think this issue and #124 are duplicates. I'll close this and follow there.
-- @jacobvjk