DOI-USGS / lake-temperature-model-prep


GCM munging - address NA cells #309

Closed hcorson-dosch-usgs closed 2 years ago

hcorson-dosch-usgs commented 2 years ago

Alright - I have a solution in place for the missing data, which fixes #296.

Per my discussion with Lindsay, we landed on an approach that leaves the data untouched for cells with missing data and instead adjusts the lake-cell-tile xwalk so that each lake is matched only to query cells that returned data, choosing the query cell whose centroid is closest to the lake centroid.
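For illustration only, the nearest-centroid reassignment could look roughly like the Python sketch below. This is not the pipeline's code (the pipeline is an R targets workflow); the function name, the dictionary layout, and the use of plain Euclidean distance on projected coordinates are all assumptions made for the example.

```python
import math


def nearest_data_cell(lake_centroids, cell_centroids, cells_with_data):
    """Map each lake to the closest query cell that returned data.

    All names here are hypothetical:
      lake_centroids  : {lake_id: (x, y)} in projected coordinates
      cell_centroids  : {cell_no: (x, y)} in the same coordinates
      cells_with_data : set of cell_no values whose query returned data
    """
    # Only cells that returned data are eligible matches.
    candidates = {c: xy for c, xy in cell_centroids.items() if c in cells_with_data}
    if not candidates:
        raise ValueError("no cells returned data")

    xwalk = {}
    for lake_id, (lx, ly) in lake_centroids.items():
        # Pick the eligible cell whose centroid is closest to the lake centroid.
        xwalk[lake_id] = min(
            candidates,
            key=lambda c: math.hypot(candidates[c][0] - lx, candidates[c][1] - ly),
        )
    return xwalk


lakes = {"lake_a": (0.2, 0.1), "lake_b": (1.9, 0.0)}
cells = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (2.0, 0.0)}
# Cell 3 returned no data, so lake_b falls back to its nearest cell with data.
revised = nearest_data_cell(lakes, cells, cells_with_data={1, 2})
```

Here lake_b sits inside cell 3 spatially but is matched to cell 2, the closest cell that has data, while lake_a keeps cell 1.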

To avoid having to read in the data twice (once in order to determine which cells are missing data, and again to munge the data for cells that are not missing), as I proposed in the 2nd bullet under considerations in #296, Lindsay suggested that I try returning a list of two items from munge_notaro_to_glm():
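As a rough sketch of the single-pass idea, a munging function can return both the munged data and the cells that came back empty, so the raw data is read only once. The Python below is an illustration, not munge_notaro_to_glm() itself; the function name, the two item names ("data" and "missing_cells"), and the use of None to mark missing values are assumptions.

```python
def munge_cells(raw_by_cell):
    """Return munged data and the list of cells with no data, in one pass.

    raw_by_cell is a hypothetical {cell_no: [values]} mapping in which
    None marks a missing value.
    """
    munged = {}
    missing_cells = []
    for cell_no, values in raw_by_cell.items():
        if not values or all(v is None for v in values):
            # The query returned nothing usable for this cell.
            missing_cells.append(cell_no)
        else:
            # Placeholder: the real munging step would transform the values here.
            munged[cell_no] = values
    return {"data": munged, "missing_cells": sorted(missing_cells)}


result = munge_cells({1: [2.5, 3.0], 2: [None, None], 3: [1.0, 1.5]})
```

Downstream steps can then use one item to revise the crosswalk and the other as the meteo data, without a second read.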

That seemed like a great approach so I set up the targets workflow like so:

Mapping with the spatial xwalk - represents the number of lakes in each cell used in the GDP query: (image: query_tile_cell_map)

Mapping with the revised xwalk - represents the number of lakes that will use data from each cell as meteo data for GLM modeling: (image: tile_cell_map)

And the map of missing cells, with the nearest cells to lakes within missing cells noted in green: (image)

While I was in the code I also cleaned up some of the function documentation and made the use of cell_no consistent throughout (some of the munging code used cell instead).

hcorson-dosch-usgs commented 2 years ago

I did build all of the downstream targets again to ensure everything built, but I am just now building out_skipnc_feather to ensure it works with the change I made from cell to cell_no. It's a slow target to build so I didn't want to hold this PR. I'll comment once it's built re: whether or not it built successfully.

hcorson-dosch-usgs commented 2 years ago

Hmm, well. I left it running overnight and it stalled, but it did build 6 of the branches first, so I think it would work. I did try restarting it, but it's hanging on that same branch... Trying again now with error = 'continue' to see whether that branch actually errors and whether the other branches will build.

hcorson-dosch-usgs commented 2 years ago

Also - I was just scrolling back through our chat, Lindsay, and noticed you'd suggested having a single xwalk with a cell_no_spatial column and a cell_no_data column. In my head I'd interpreted that as two xwalks, so that's what I currently have in the pipeline, but I could combine them into a single crosswalk if you think that would be better.
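To make the single-crosswalk option concrete, here is a minimal Python sketch of joining the two xwalks into one table with cell_no_spatial and cell_no_data columns. It is illustrative only (the pipeline itself is R); the lake_id key, the dictionary inputs, and the function name are assumptions.

```python
def combine_xwalks(spatial_xwalk, data_xwalk):
    """Join two {lake_id: cell_no} crosswalks into one row per lake.

    lake_id is a hypothetical key name; both inputs are assumed to
    cover the same set of lakes.
    """
    rows = []
    for lake_id in sorted(spatial_xwalk):
        rows.append({
            "lake_id": lake_id,
            "cell_no_spatial": spatial_xwalk[lake_id],  # cell the lake falls in
            "cell_no_data": data_xwalk[lake_id],        # cell supplying its data
        })
    return rows


spatial = {"lake_a": 1, "lake_b": 3}
data = {"lake_a": 1, "lake_b": 2}
combined = combine_xwalks(spatial, data)

# Lakes whose data comes from a different cell than the one they sit in.
reassigned = [r["lake_id"] for r in combined if r["cell_no_spatial"] != r["cell_no_data"]]
```

One advantage of the combined form is that reassigned lakes can be found by comparing two columns rather than by joining two separate objects.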

hcorson-dosch-usgs commented 2 years ago

FYI, all branches of the out_skip_nc target did build.