Closed jds485 closed 1 year ago
Thanks! Yes, accounting for spatial nestedness is on my list. I like the suggestion to define sub-watersheds or regions to hold out. One of the challenges I've been thinking about is how to ensure a proportional amount of discrete and continuous sampling data within each sub-region. Maybe we can just accept that some sub-regions will have different amounts (and that is likely for prediction in other basins, anyway).
This PR adds options to split data by spatial information (reaches in this case). It is applied to the training and testing split, as well as to CV folds in parameter tuning. The main functions for this are in train_models.R: `make_spatial_split`, `make_spatial_split_CVtraining`, and `assign_spatial_split`. The first and second functions are similar to the temporal split function. The last function is used to ensure consistency in the training and testing sets across all models that will be compared.

I set the random and temporal targets to never rebuild because there were function edits that would trigger them to rebuild.
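For readers unfamiliar with splitting by reach, here is a minimal base-R sketch of the general idea. It is not the code in train_models.R. The function names `split_by_reach` and `assign_reach_folds`, the `reach_id` column, and the toy data are all hypothetical. The point is that whole reaches, not individual rows, are assigned to the training set, the testing set, or a CV fold, and that a fixed seed keeps the assignment the same for every model being compared.

```r
# Hypothetical sketch: hold out whole reaches so that no reach appears in
# both the training and testing sets. Names here are illustrative only.
split_by_reach <- function(data, reach_col = "reach_id",
                           train_prop = 0.8, seed = 1) {
  set.seed(seed)  # fixed seed so every model sees the same split
  reaches <- unique(data[[reach_col]])
  n_train <- max(1, floor(train_prop * length(reaches)))
  # sample.int avoids the sample() surprise when there is a single numeric ID
  train_reaches <- reaches[sample.int(length(reaches), n_train)]
  in_train <- data[[reach_col]] %in% train_reaches
  list(training = data[in_train, , drop = FALSE],
       testing  = data[!in_train, , drop = FALSE])
}

# Hypothetical sketch: give every row of a reach the same CV fold index.
assign_reach_folds <- function(data, reach_col = "reach_id", v = 5, seed = 1) {
  set.seed(seed)
  reaches <- unique(data[[reach_col]])
  fold_ids <- rep(seq_len(v), length.out = length(reaches))
  fold_of_reach <- fold_ids[sample.int(length(fold_ids))]  # shuffle folds
  names(fold_of_reach) <- as.character(reaches)
  unname(fold_of_reach[as.character(data[[reach_col]])])
}

# Toy data: 5 reaches with 3 observations each (made up for illustration)
toy <- data.frame(reach_id = rep(c("r1", "r2", "r3", "r4", "r5"), each = 3),
                  value = seq_len(15))
splits <- split_by_reach(toy, train_prop = 0.8)
folds <- assign_reach_folds(splits$training, v = 2)
```

Because assignment happens at the reach level, the number of rows in each set or fold can be uneven when reaches have different numbers of observations.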
Additional visualization edits:
`static_dynamic_spatial` that I'd not seen before. I commented it out and am trying to diagnose the issue now.

Closes #205