Test data is split by lake, but validation data is not. This can lead to overfitting: the validation data used to determine the early stopping criteria is taken from the same lakes as the training data, so validation loss does not reflect performance on unseen lakes. The validation split should be made by lake during 2_process, at the same time that the test set is formed, so we can remove the training/validation splitting that happens in 3_train.
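As a rough illustration of what a lake-level split could look like, here is a minimal sketch in plain Python. The function names (`split_lakes`, `assign_rows`), the `lake_id` key, and the 20% fractions are all hypothetical and are not taken from 2_process or 3_train; the only point is that every lake lands in exactly one of train, validation, or test.

```python
import random


def split_lakes(lake_ids, val_frac=0.2, test_frac=0.2, seed=0):
    """Assign each unique lake to exactly one of train/val/test.

    Hypothetical helper: fractions and seed are illustrative defaults.
    """
    unique = sorted(set(lake_ids))   # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique)

    n_test = round(len(unique) * test_frac)
    n_val = round(len(unique) * val_frac)

    test = set(unique[:n_test])
    val = set(unique[n_test:n_test + n_val])
    train = set(unique[n_test + n_val:])
    return train, val, test


def assign_rows(rows, key="lake_id", **kwargs):
    """Route each observation to the partition that owns its lake."""
    train, val, test = split_lakes([r[key] for r in rows], **kwargs)
    out = {"train": [], "val": [], "test": []}
    for r in rows:
        if r[key] in test:
            out["test"].append(r)
        elif r[key] in val:
            out["val"].append(r)
        else:
            out["train"].append(r)
    return out


# Toy data: 10 lakes with 3 observations each (assumed record layout).
rows = [{"lake_id": f"lake_{i}", "obs": j} for i in range(10) for j in range(3)]
parts = assign_rows(rows)
```

With a split like this done once in 2_process, early stopping in 3_train would be evaluated on lakes the model has never seen, and no further train/validation splitting would be needed there.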