DOI-USGS / national-flow-observations

This repository pulls national flow data from NWIS

Update `tmp` file formats from `RDS` to something faster #21

Closed padilla410 closed 2 years ago

padilla410 commented 2 years ago

This PR closes #16. I converted the temporary files that the data pull partitions write to 10_nwis_pull/tmp from RDS to qs, as suggested in the original issue.
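
As a rough illustration of what the format swap means for a single partition file, the sketch below writes the same object with base R's `saveRDS()` and with `qs::qsave()`, then reads both back. It assumes the `qs` package is installed; the data frame, its columns, and the file names are made up for the example and are not the pipeline's actual objects.

```r
# Hypothetical partition result; the real pipeline's columns may differ
partition_dat <- data.frame(
  site_no = c("01234567", "01234567"),
  date    = as.Date(c("2020-01-01", "2020-01-02")),
  flow    = c(10.2, 11.5),
  stringsAsFactors = FALSE
)

tmp_dir  <- tempdir()
rds_file <- file.path(tmp_dir, "partition_example.rds")
qs_file  <- file.path(tmp_dir, "partition_example.qs")

# Before: base R serialization
saveRDS(partition_dat, rds_file)
from_rds <- readRDS(rds_file)

# After: qs serialization (skipped if qs is not installed)
if (requireNamespace("qs", quietly = TRUE)) {
  qs::qsave(partition_dat, qs_file)
  from_qs <- qs::qread(qs_file)
  # both formats should round-trip to an equivalent object
  stopifnot(isTRUE(all.equal(from_rds, from_qs)))
}
```

Only the read and write calls change; the object that comes back is the same either way.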

I tagged @wdwatkins for the review for two reasons:

David, please let me know if I should tag someone else for review!

I have also included test code to verify that the process runs as expected:

library(scipiper)
options(scipiper.dry_put = TRUE)
source('10_nwis_pull/src/nwis_combine_functions.R')

# start to build the intermediate tmp files if you don't have them locally by rebuilding the '10_nwis_pull/tmp/nwis_dv_data.rds.ind' target
# manually break after a few dv files have been generated in `10_nwis_pull/tmp`
scmake('10_nwis_pull/tmp/nwis_dv_data.rds.ind') 

# create a list of files to combine
test_combine_files <- list.files('10_nwis_pull/tmp', pattern = '\\.qs$', full.names = TRUE)

# combine and save locally
combine_nwis_data('10_nwis_pull/tmp/nwis_dv_data_tmp.rds.ind', test_combine_files)

# inspect results
check_results <- readRDS('10_nwis_pull/tmp/nwis_dv_data_tmp.rds')

wdwatkins commented 2 years ago

This looks good to me. Are there any target names that should be changed, or are all those temporary files kept internally? https://github.com/USGS-R/national-flow-observations/search?p=1&q=RDS

padilla410 commented 2 years ago

@wdwatkins, no, I don't think so. The qs files are internal and stored locally in 10_nwis_pull/tmp. They get munged into an RDS file here, so the target names do not change in 10_nwis_pull.yml.
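
To make that concrete, here is a rough sketch of the combine step: read each internal qs file, stack the rows, and write one RDS file, so only the final output format is visible to the pipeline. The function name `combine_partitions` and its `read_fun` argument are hypothetical; this is not the repository's `combine_nwis_data()`, which also deals with the `.ind` indicator file. The sketch assumes every partition file holds a data frame with the same columns and, for the default reader, that `qs` is installed.

```r
# Hypothetical helper, not the repository's combine_nwis_data()
combine_partitions <- function(partition_files, out_rds, read_fun = qs::qread) {
  # read each temporary partition file back into memory
  partitions <- lapply(partition_files, read_fun)
  # stack the partitions; assumes identical column names and types
  combined <- do.call(rbind, partitions)
  # the final output stays RDS, so downstream target names are unaffected
  saveRDS(combined, out_rds)
  invisible(out_rds)
}
```

Because the temporary format is confined to the reader, changing it later would only touch `read_fun` and the code that writes the partition files.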