Closed hcorson-dosch-usgs closed 2 years ago
I did build all of the downstream targets again to ensure everything built, but I am just now building `out_skipnc_feather` to ensure it works with the change I made from `cell` to `cell_no`. It's a slow target to build so I didn't want to hold this PR. I'll comment once it's built re: whether or not it built successfully.
Hmm, well. I left it running overnight and it stalled, but it did build 6 of the branches first, so I think it would work. I did try restarting it, but it's hanging on that same branch... Trying again now with `error = 'continue'` to see if that branch actually errors and whether the other branches will build.
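For reference, a minimal sketch of setting `error = "continue"` for a whole `targets` pipeline. The target names and the simulated failure are hypothetical and not taken from this repo; `tar_option_set()`, `tar_target()`, and `map()` are the `targets` functions assumed here.

```r
# _targets.R (hypothetical minimal pipeline, for illustration only)
library(targets)

# Keep the pipeline going when a target or branch errors,
# rather than stopping at the first failure
tar_option_set(error = "continue")

list(
  # Hypothetical upstream target to branch over
  tar_target(branch_ids, 1:3),
  # Hypothetical slow target with one branch that fails on purpose
  tar_target(
    slow_result,
    if (branch_ids == 2) stop("simulated failure") else branch_ids * 10,
    pattern = map(branch_ids)
  )
)
```

With this option set, a branch that errors should be recorded as errored while the remaining branches still attempt to build. A branch that hangs rather than errors would not be helped by this setting.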
Also - I was just scrolling back through our chat, Lindsay, and noticed you'd suggested having a single xwalk with a `cell_no_spatial` column and `cell_no_data` column. In my head I'd interpreted that as two xwalks, so that's what I currently have in the pipeline, but I could combine them into a single crosswalk if you think that would be better.
FYI all branches of the `out_skip_nc` target did build
Alright - I have a solution in place for the missing data, which fixes #296
Per my discussion with Lindsay, we landed on an approach that did not change any of the data for cells with missing data, but instead adjusted the lake-cell-tile xwalk so that lakes were matched to only those query cells that returned data, on the basis of which query cell centroid was closest to each lake centroid.
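A minimal sketch of the nearest-centroid matching described above, in base R. All data values and the helper `nearest_cell_with_data()` are made up for illustration, and plain Euclidean distance on projected coordinates is an assumption; the actual pipeline may well use a spatial package instead.

```r
# Hypothetical query-cell centroids (projected coordinates, arbitrary units)
cells <- data.frame(
  cell_no = c(101, 102, 103),
  x = c(0, 10, 20),
  y = c(0, 0, 0),
  missing_data = c(FALSE, TRUE, FALSE)
)

# Hypothetical lake centroids, with the cell each lake spatially falls within
lakes <- data.frame(
  site_id = c("lake_a", "lake_b", "lake_c"),
  x = c(1, 9, 19),
  y = c(1, 1, -1),
  spatial_cell_no = c(101, 102, 103)
)

# Only cells that returned data are candidates for matching
cells_with_data <- cells[!cells$missing_data, ]

# Hypothetical helper: cell_no of the candidate centroid closest to one lake
nearest_cell_with_data <- function(lake_x, lake_y, candidates) {
  squared_dist <- (candidates$x - lake_x)^2 + (candidates$y - lake_y)^2
  candidates$cell_no[which.min(squared_dist)]
}

# Match every lake to its nearest cell with data
lakes$data_cell_no <- mapply(
  nearest_cell_with_data, lakes$x, lakes$y,
  MoreArgs = list(candidates = cells_with_data)
)

# Count the lakes whose matched cell differs from the cell they fall within
n_differ <- sum(lakes$spatial_cell_no != lakes$data_cell_no)
```

In this toy example only `lake_b` sits in a cell with missing data, so it is the only lake whose matched cell changes.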
To avoid having to read in the data twice (once to determine which cells are missing data, and again to munge the data for cells that are not missing), as I proposed in the 2nd bullet under considerations in #296, Lindsay suggested that I try returning a list of two items from `munge_notaro_to_glm()`:

- `file_out` - The name of the output file of munged data for the given gcm and tile
- `cell_info` - A tibble with a row for each cell in the given tile, with T/F values for whether or not that cell was missing data

That seemed like a great approach so I set up the `targets` workflow like so:

- Rather than building `glm_ready_gcm_data_feather` directly, have `munge_notaro_to_glm()` instead build the target `glm_ready_gcm_data_list`, which has a branch for each tile-gcm combo, and 2 elements for each branch: `file_out` and `cell_info`, as noted above.
- Build `glm_ready_gcm_data_feather` from the `file_out` element of each branch. _Note: this ended up requiring mapping over `glm_ready_gcm_data_list`, as each branch of that upstream target had the two distinct elements, and it was not possible to pull just one element from each branch in one go with `glm_ready_gcm_data_list$file_out`._
- Build `glm_ready_gcm_data_cell_info` from each of the `cell_info` elements of the branches of `glm_ready_gcm_data_list`. _Note: again this required mapping over the branches of `glm_ready_gcm_data_list`._
- Build `glm_ready_gcm_data_cell_status`, a pivoted version of `glm_ready_gcm_data_cell_info` with a single row per cell, and columns indicating whether or not data was missing for each gcm as well as a final column with the total count of gcms for which data was missing.
- Use `glm_ready_gcm_data_cell_status` to build the new revised lake-cell-tile xwalk `lake_cell_tile_xwalk_data_df` (the original is renamed to `lake_cell_tile_xwalk_spatial_df`), matching lakes to only those query cells that returned data, based on which query cell centroid is closest to each lake centroid. The revised xwalk differs from `lake_cell_tile_xwalk_spatial_df` only for those lakes that fell within cells that were missing data. I checked and, as expected, the two crosswalks differ for 123 lakes, which was the number of lakes that fell within cells with missing data.
- `lake_cell_tile_xwalk_data_df` now gets saved as `lake_cell_tile_xwalk_csv` to be used in `lake-temperature-process-models` to determine what meteo data to pull for each lake.

Mapping with the spatial xwalk - represents the number of lakes in each cell used in the GDP query:
Mapping with the revised xwalk - represents the number of lakes in each cell that will use data from that cell as meteo data for GLM modeling
And the map of missing cells with nearest cells to lakes within missing cells noted in green:
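The list-return, mapping, and pivot steps in the workflow above can be sketched in base R. This is only an analogy under stated assumptions: `munge_sketch()` is a hypothetical stand-in for `munge_notaro_to_glm()`, a plain list stands in for the branches of `glm_ready_gcm_data_list`, a data frame stands in for the tibble, and the gcm names, file names, and cell numbers are made up.

```r
# Hypothetical stand-in for munge_notaro_to_glm(): returns both the output
# file name and a per-cell record of whether data was missing
munge_sketch <- function(gcm, tile_no, cell_nos, missing_cells) {
  list(
    file_out = sprintf("%s_tile%s.feather", gcm, tile_no),
    cell_info = data.frame(
      gcm = gcm,
      tile_no = tile_no,
      cell_no = cell_nos,
      missing_data = cell_nos %in% missing_cells
    )
  )
}

# One "branch" per tile-gcm combo, each with two elements
branches <- list(
  munge_sketch("gcm_a", 1, c(101, 102, 103), missing_cells = 102),
  munge_sketch("gcm_b", 1, c(101, 102, 103), missing_cells = c(102, 103))
)

# branches$file_out is NULL here: the elements live inside each branch,
# so pulling them out requires mapping over the branches
file_outs <- vapply(branches, function(b) b$file_out, character(1))
cell_info <- do.call(rbind, lapply(branches, function(b) b$cell_info))

# Pivot to one row per cell, one T/F column per gcm, plus a count of gcms
# for which that cell was missing data
missing_by_gcm <- tapply(
  cell_info$missing_data,
  list(cell_info$cell_no, cell_info$gcm),
  any
)
cell_status <- data.frame(
  cell_no = as.numeric(rownames(missing_by_gcm)),
  missing_by_gcm,
  n_gcm_missing_data = rowSums(missing_by_gcm),
  row.names = NULL
)
```

Returning both elements from one function call means the data only needs to be read once per branch, at the cost of an extra mapping step to separate the file names from the cell info afterwards.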
While I was in the code I also cleaned up some of the function documentation and made the use of `cell_no` consistent throughout (some of the munging code used `cell` instead).