The way we had set up `nwis_dv_partition` and `nwis_uv_partition`, the partitions were named using the current day's date: within `partition_inventory`, we used `Sys.time()` to generate the partition ids. The major problem with this is that the partitions are objects, not files, in the yml, so everyone has to build them when they run the pipeline. Even with no other changes, someone building the pipeline on a different day than the original builder gets different partition ids, and that forces a rebuild of the WHOLE PIPELINE. This is obviously not an ideal situation. So, here is one way to avoid these rebuilds now and in the future: the partitions will only change if the inventory is forced to rebuild or if the new `pull_id_dv` or `pull_id_uv` targets are changed.
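To make the `Sys.time()` problem concrete, here is a minimal R sketch. `make_partition_ids()` is a hypothetical helper, not the actual `partition_inventory()` code, and it assumes ids are built from a yymmdd date string (which the `"200602"` value in the build log suggests) plus a zero-padded partition number.

```r
# Hypothetical helper for illustration only; not the pipeline's
# partition_inventory() implementation.
make_partition_ids <- function(n_partitions,
                               pull_id = format(Sys.time(), "%y%m%d")) {
  # Assumed id scheme: "<pull_id>_<zero-padded partition number>"
  sprintf("%s_%03d", pull_id, seq_len(n_partitions))
}

# Date-based default: the ids differ on every new day, so any target that
# depends on them looks out of date to scipiper/remake
ids_today <- make_partition_ids(3)

# Fixed pull id: the same ids no matter which day the pipeline is built
ids_fixed <- make_partition_ids(3, pull_id = "200602")
print(ids_fixed)
#> [1] "200602_001" "200602_002" "200602_003"
```

Because the date-based default is re-evaluated on every build, the ids (and everything downstream of them) change from day to day; passing a fixed `pull_id` keeps them stable until someone deliberately edits that value.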
Why separate `pull_id_dv` and `pull_id_uv` targets, you ask? Well, the first time this pipeline was run all the way through, the dv and uv partition targets were built on different days and therefore got different partition ids because of `Sys.time()` (explained above). So we implemented this solution with short-term convenience in mind: to keep others from rebuilding this big pull right now just because of partition ID naming differences. In the future (before the next big pull), we can consider simplifying this.
After making this change, I tested that the pipeline would rebuild the partitions but then skip all of the other targets, which had been built recently:
```r
library(scipiper)
scmake("nwis_dv_pull_plan")
```

```
scmake("nwis_dv_pull_plan")
Starting build at 2020-06-12 14:59:38
< MAKE > nwis_dv_pull_plan
[ BUILD ] pull_id | pull_id <- c("200602")
[ READ ] | # loading packages
[ OK ] nwis_pull_size
[ OK ] nwis_pull_parameters
[ OK ] 10_nwis_pull/inout/nwis_dv_inventory.rds.ind
[ OK ] 10_nwis_pull/inout/nwis_dv_inventory.rds
[ OK ] nwis_dv_inventory
[ BUILD ] nwis_dv_partition | nwis_dv_partition <- partition_inventory(inventory = nwis_dv_in...
[ OK ] nwis_dv_pull_plan
Finished build at 2020-06-12 14:59:50
```