DOI-USGS / national-flow-observations

This repository pulls national flow data from NWIS

change to 10_nwis_pull so partitions don't cause a rebuild #10

Closed lindsayplatt closed 4 years ago

lindsayplatt commented 4 years ago

We had set up nwis_dv_partition and nwis_uv_partition so that the partitions are named using the current day's date: within partition_inventory, we used Sys.time() to figure out the partition ids. The major problem is that the partitions are objects, not files, in the yml, so every person needs to build them when they run the pipeline. Even if nothing else has changed, a person building the pipeline on a different day than the original person gets different partition ids, which forces a rebuild of the WHOLE PIPELINE. This is obviously not ideal. So, this change is one way to avoid rebuilds now and in the future: nothing rebuilds unless the inventory is forced to rebuild or the new pull_id_dv or pull_id_uv targets are changed.
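To make the failure mode concrete, here is a minimal sketch in R. It is not the repository's code: `make_partition_ids()` and `id_from_clock()` are hypothetical helpers that only stand in for the naming step inside partition_inventory, and both the `%y%m%d` date format and the `_001`-style suffix are assumptions. The `"200602"` value is borrowed from the build log in this description purely as an example.

```r
# Hypothetical stand-in for the naming step inside partition_inventory();
# the real function is not reproduced here, and the suffix format is assumed.
make_partition_ids <- function(pull_id, n_partitions) {
  sprintf("%s_%03d", pull_id, seq_len(n_partitions))
}

# Date-derived id: the value depends on the day the target is built.
# The "%y%m%d" format is an assumption.
id_from_clock <- function(now = Sys.time()) {
  format(now, "%y%m%d", tz = "UTC")
}

# Two people building the same pipeline on different days
day1 <- as.POSIXct("2020-06-02 12:00:00", tz = "UTC")
day2 <- as.POSIXct("2020-06-12 12:00:00", tz = "UTC")
ids_day1 <- make_partition_ids(id_from_clock(day1), 3)
ids_day2 <- make_partition_ids(id_from_clock(day2), 3)
clock_ids_match <- identical(ids_day1, ids_day2)   # ids differ across days

# Pinned id: the value is a constant, so every build produces the same ids.
pull_id_dv <- "200602"  # example value only
ids_run1 <- make_partition_ids(pull_id_dv, 3)
ids_run2 <- make_partition_ids(pull_id_dv, 3)
pinned_ids_match <- identical(ids_run1, ids_run2)  # ids are stable
```

With the clock-derived id, the partition object differs from one day to the next, so everything downstream of it looks out of date. With the pinned id, the partition object is the same on every build and downstream targets can be skipped.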

Why pull_id_dv and pull_id_uv, you ask? The first time this pipeline was run all the way through, the dv and uv partition targets were built on different days and thus had different partition ids because of Sys.time() (explained above). So, we implemented this solution with short-term convenience in mind: it prevents others from rebuilding this big pull right now due to partition ID naming differences. In the future (before the next big pull), we can consider simplifying this.

After making this change, I tested that the pipeline rebuilds the partitions but then bypasses all the other targets, which were recently built.

library(scipiper)
scmake("nwis_dv_pull_plan")

Starting build at 2020-06-12 14:59:38
<  MAKE > nwis_dv_pull_plan
[ BUILD ] pull_id                                                 |  pull_id <- c("200602")
[  READ ]                                                         |  # loading packages
[    OK ] nwis_pull_size
[    OK ] nwis_pull_parameters
[    OK ] 10_nwis_pull/inout/nwis_dv_inventory.rds.ind
[    OK ] 10_nwis_pull/inout/nwis_dv_inventory.rds
[    OK ] nwis_dv_inventory
[ BUILD ] nwis_dv_partition                                       |  nwis_dv_partition <- partition_inventory(inventory = nwis_dv_in...
[    OK ] nwis_dv_pull_plan
Finished build at 2020-06-12 14:59:50