DOI-USGS / nawqa_wqp

Scripts/workflow for Water Quality Portal pulls for NAWQA trends and networks analyses.

redo failed gd_puts #30

Closed aappling-usgs closed 5 years ago

aappling-usgs commented 6 years ago

In some cases gd_put was called and appeared to succeed (no errors) but never actually put the file on Drive. I used the following script to repair the damage. (So far we're only working with conductivity, DO, and pH in the data pull.)

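# packages used below: dplyr (%>%, filter, pull), googledrive (as_id, drive_ls),
# scipiper (as_data_file, as_ind_file, scmake)
library(dplyr)
library(googledrive)
library(scipiper)

# edit the constituent as needed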
constituent <- 'pH'
task_file <- sprintf('tasks_1_wqp_%s.yml', constituent)

# compare files on Drive to those promised by the task file
id <- scipiper:::gd_locate_file('1_wqpdata/out/data') %>%
  filter(name=='data') %>% pull(id) %>% as_id()
drive_files <- drive_ls(id) %>% pull(name) %>% grep(constituent, ., value=TRUE)
all_targets <- scipiper::list_all_targets(task_file)
ind_targets <- grep('.*/out/.*\\.ind$', all_targets, value=TRUE)
data_targets <- as_data_file(ind_targets)

# quick check on integrity of Drive - no unexpected files, right?
surprise_files <- setdiff(drive_files, basename(data_targets))
length(surprise_files) # shouldn't be any

# identify needs for repushing
need_files <- setdiff(basename(data_targets), drive_files)
length(need_files) # cond: 27 down to 12. DO: 37 down to 10. pH: 38 down to 20
need_files
need_paths <- as_ind_file(file.path(dirname(data_targets[1]), need_files))

# some files won't be buildable because we couldn't pull the data from WQP
tmp_paths <- gsub('/out/', '/tmp/', need_paths)
wqp_problems <- need_paths[!file.exists(tmp_paths)]
wqp_problems # e.g. conductivity: OK 001-003, TX 001-003 (12 files total)

# repush
pushable_paths <- need_paths[file.exists(tmp_paths)]
length(pushable_paths) # cond: 15. DO: 27. pH: 18.
scmake(pushable_paths, task_file, force=TRUE)
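After the repush, the same comparison can be run again to confirm that only the WQP-problem files are still missing. Here is a minimal sketch of that check as a small helper. `check_drive_sync` is a hypothetical name, not part of scipiper or googledrive, and the file names are made up for illustration.

```r
# Hypothetical helper (not part of scipiper): compare the file names promised
# by a task file against the names actually listed on Drive.
check_drive_sync <- function(expected, found) {
  list(
    missing = setdiff(expected, found),  # promised but absent from Drive
    surprise = setdiff(found, expected)  # on Drive but not promised
  )
}

# Example with made-up file names (assumption: real names come from the task file)
expected <- c('site_a_pH.dat', 'site_b_pH.dat', 'site_c_pH.dat')
found <- c('site_a_pH.dat', 'site_c_pH.dat')
status <- check_drive_sync(expected, found)
status$missing           # files that still need to be pushed
length(status$surprise)  # number of unexpected files on Drive
```

In the script above, `expected` would correspond to `basename(data_targets)` and `found` to a freshly recomputed `drive_files`. Once the repush has worked, `status$missing` should contain only the files behind `wqp_problems`.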