DOI-USGS / lake-temperature-model-prep

Pipeline #1

Simplify GCM pipeline patterns & don't branch over empty tiles #243

Closed lindsayplatt closed 2 years ago

lindsayplatt commented 2 years ago

Fixes #237. This reorganizes Hayley's GCM grid pipeline so that we branch only where necessary, which cuts down on the number of target files we create. With fewer branches, the pipeline now runs really quickly. I got past the issue with building empty tiles by splitting the query_cell_centroids_sf target (an sf object containing only the centroids of cells that have lakes) into a list by tile number.
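To illustrate why splitting by tile number avoids empty tiles, here is a minimal base R sketch. It uses a plain data frame in place of the sf object, and the column names (`cell_no`, `tile_no`), the tile numbers, and `all_tiles` are all made up for illustration; the actual pipeline code may look different.

```r
# Hypothetical cell centroids; cell_no and tile_no values are invented.
query_cells <- data.frame(
  cell_no = c(101, 102, 205, 310),
  tile_no = c(47, 47, 48, 52)
)

# Every tile in the (hypothetical) grid, including tiles with no lake cells.
all_tiles <- c(46, 47, 48, 49, 52)

# split() on a non-factor column creates one list element per value that is
# actually present, so tiles 46 and 49 never become list elements.
cells_by_tile <- split(query_cells, query_cells$tile_no)

# Tiles that will be branched over, and tiles that are skipped entirely.
tiles_with_cells <- names(cells_by_tile)
tiles_without_cells <- setdiff(all_tiles, as.integer(tiles_with_cells))
```

Branching over `cells_by_tile` rather than over every tile in the grid means a downstream target never receives a tile with zero cells.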

For now, I am only querying two of the GCMs. When you run tar_make(), you get this:

> tar_make()
v skip target lake_centroids_sf_rds
v skip target gcm_names
v skip target grid_params
v skip target grid_tiles_sf
v skip target grid_cells_sf
v skip target query_lake_centroids_sf
v skip target grid_cell_centroids_sf
v skip target query_cells_info_df
v skip target lake_cell_xwalk_df
v skip target cell_tile_xwalk_df
v skip target query_cells
v skip target query_cell_centroids_sf
v skip target query_tiles
v skip branch query_cells_centroids_list_by_tile_c47142b2
v skip branch query_cells_centroids_list_by_tile_860fd618
v skip branch query_cells_centroids_list_by_tile_132897b8
v skip pattern query_cells_centroids_list_by_tile
v skip branch gcm_data_raw_feather_10416654
v skip branch gcm_data_raw_feather_1818b7c2
v skip branch gcm_data_raw_feather_6db6876d
v skip branch gcm_data_raw_feather_725ba410
v skip branch gcm_data_raw_feather_d1117e15
v skip branch gcm_data_raw_feather_7f822d61
v skip pattern gcm_data_raw_feather
v skip branch query_map_png_13aa1dc5
v skip branch query_map_png_cb3179bf
v skip branch query_map_png_359750f9
v skip pattern query_map_png
v skip target gcm_files_out
v skip pipeline
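The branch counts in this log (three tiles, six raw data files) are what you would expect from crossing a per-tile list with two GCM names. A rough `_targets.R` sketch of that pattern follows. It is not the code from this PR: `query_cells_df`, `cells_by_tile`, the GCM names, the `download_gcm_data()` helper, and the CSV output (standing in for feather) are all placeholders.

```r
# _targets.R (illustrative sketch only, not the pipeline from this PR)
library(targets)

# Hypothetical stand-in for the real download step: writes one file per
# GCM/tile combination and returns its path, as format = "file" requires.
download_gcm_data <- function(gcm_name, cells) {
  out_file <- sprintf("gcm_%s_tile%s.csv", gcm_name, cells$tile_no[1])
  write.csv(cells, out_file, row.names = FALSE)
  out_file
}

list(
  tar_target(gcm_names, c("GCM_A", "GCM_B")),  # placeholder GCM names
  tar_target(
    query_cells_df,
    data.frame(cell_no = c(101, 102, 205, 310), tile_no = c(47, 47, 48, 52))
  ),
  # iteration = "list" lets downstream patterns branch over list elements.
  tar_target(
    cells_by_tile,
    split(query_cells_df, query_cells_df$tile_no),
    iteration = "list"
  ),
  # cross() creates one branch per tile/GCM pair: 3 tiles x 2 GCMs = 6 branches.
  tar_target(
    gcm_data_raw_feather,
    download_gcm_data(gcm_names, cells_by_tile),
    pattern = cross(cells_by_tile, gcm_names),
    format = "file"
  )
)
```

Keeping the upstream steps as ordinary (non-branched) targets and branching only at the download step is the design choice described above: fewer branches means fewer target files to track.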

As a test, I changed the number of lakes sampled in subset_lake_centroids() from 5 to 6, which added a cell to tile 48 (the second element of the query_cells_centroids_list_by_tile list). The pipeline appropriately skipped re-downloading data for the other two tiles and re-downloaded data only for the tile 48 GCMs!
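The selective rebuild follows from each branch depending only on its own list element. A small base R sketch of that idea is below; the cell numbers and the tile numbers other than 48 are invented, and `targets` does its own hashing internally rather than calling `identical()`.

```r
# Hypothetical per-tile cell lists before and after sampling one more lake.
before <- list(`47` = c(101, 102), `48` = c(205), `52` = c(310))
after  <- list(`47` = c(101, 102), `48` = c(205, 206), `52` = c(310))

# A tile's branches need rebuilding only if that tile's element changed.
changed <- !mapply(identical, before, after)
tiles_to_rebuild <- names(changed)[changed]
```

Only tile 48 differs between the two lists, so only the branches that consume tile 48 would be out of date.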

Tagging @hcorson-dosch but asking @jread-usgs for review, since I would like to merge and move on to munging on Monday.

lindsayplatt commented 2 years ago

@jread-usgs I believe I addressed all the comments, either by changing the code or by adding #TODO tags or comments for things to consider.