DOI-USGS / ds-pipelines-targets-example-wqp

An example targets pipeline for pulling data from the Water Quality Portal (WQP)

Change to how we name `download_grp` #72

Closed lindsayplatt closed 2 years ago

lindsayplatt commented 2 years ago

I propose a slight modification to how we name the download groups (`download_grp`) in `add_download_grp()`. As it stands, the names sort as strings, so the `tar_groups()` do not follow the order we expect the download groups to be in. See the following code and notice how a `task_num` of 1000 is treated as if it came before a `task_num` of 20, and how zero-padding `task_num` to a common width fixes the ordering:

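library(dplyr) # provides tibble(), mutate(), arrange(), and %>%

x <- tibble(site = c(1,2,3),
            grid_id = '1014') %>% 
  mutate(task_num = c(1,20,1000),
         # Current naming: task_num is appended without padding
         download_grp_orig = paste0(grid_id, '_', task_num),
         # Proposed naming: zero-pad task_num to the width of the largest value
         download_grp_fix = sprintf(paste0("%s_%0", nchar(max(task_num)), "d"), 
                                    grid_id, task_num)) 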

arrange(x, download_grp_orig)
# A tibble: 3 x 5
   site grid_id task_num download_grp_orig download_grp_fix
  <dbl> <chr>      <dbl> <chr>             <chr>           
1     1 1014           1 1014_1            1014_0001       
2     3 1014        1000 1014_1000         1014_1000       
3     2 1014          20 1014_20           1014_0020   

arrange(x, download_grp_fix)
# A tibble: 3 x 5
   site grid_id task_num download_grp_orig download_grp_fix
  <dbl> <chr>      <dbl> <chr>             <chr>           
1     1 1014           1 1014_1            1014_0001       
2     2 1014          20 1014_20           1014_0020       
3     3 1014        1000 1014_1000         1014_1000   
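
The padding step could also be pulled out into a small helper. The following is only a sketch in base R: `pad_download_grp()` is a hypothetical name, not a function in the pipeline, and it assumes `task_num` holds positive whole numbers. It converts to integer before counting digits so that a value such as 100000 is not measured in its scientific-notation form.

```r
# Hypothetical helper (not part of the pipeline): build download group names
# with task numbers zero-padded to a common width, so that sorting the names
# as strings matches sorting by task number.
pad_download_grp <- function(grid_id, task_num) {
  task_int <- as.integer(task_num)
  # Number of digits in the largest task number, e.g. 1000 -> 4
  width <- nchar(as.character(max(task_int)))
  # formatC() left-pads with zeros, e.g. 20 -> "0020"
  paste0(grid_id, "_", formatC(task_int, width = width, flag = "0"))
}

task_num <- c(1, 20, 1000)
grp_orig <- paste0("1014", "_", task_num)
grp_fix <- pad_download_grp("1014", task_num)

# method = "radix" sorts in the C locale, independent of the session locale
sort(grp_orig, method = "radix")
sort(grp_fix, method = "radix")
```

Sorting `grp_orig` places `1014_1000` before `1014_20`, while sorting `grp_fix` keeps the groups in `task_num` order.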

lindsayplatt commented 2 years ago

I made this change in our internal pipeline (the merge request is only visible internally): https://code.usgs.gov/wma/proxies/habs/national-chl-download/-/merge_requests/5/diffs?commit_id=25421db294f78c9081632f25304b7af469092a14