USGS-R / river-dl

Deep learning model for predicting environmental variables on river systems
Creative Commons Zero v1.0 Universal

scale tst partition y data? #193

Closed by janetrbarclay 2 years ago

janetrbarclay commented 2 years ago

I just noticed that we scale the y_obs for the training and validation partitions but not for the test partition.

https://github.com/USGS-R/river-dl/blob/d940d35fd6415fc55f8f5844be28be289eb12bf7/river_dl/preproc_utils.py#L686-L697

We do scale the x data for all three partitions.

https://github.com/USGS-R/river-dl/blob/d940d35fd6415fc55f8f5844be28be289eb12bf7/river_dl/preproc_utils.py#L872-L884

Is there any reason we shouldn't scale the y_obs for the test partition? It wouldn't make any difference to training, but I'm using the batched data for some pre-processing and running into weird values because some of it is scaled and some isn't.
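To illustrate the kind of consistency I mean, here is a minimal sketch, not the actual code in `preproc_utils.py`. The function name `scale_y_partitions` and the dictionary keys are hypothetical, and I'm assuming a standard mean/std scaling with statistics taken from the training partition only.

```python
import numpy as np


def scale_y_partitions(y_trn, y_val, y_tst):
    """Scale all three y partitions with the training mean and std.

    Hypothetical helper for illustration; not part of river-dl.
    NaNs (missing observations) are ignored when computing the
    statistics and stay NaN after scaling.
    """
    mean = np.nanmean(y_trn, axis=0)
    std = np.nanstd(y_trn, axis=0)
    # Guard against division by zero for constant variables.
    std = np.where(std == 0, 1.0, std)

    partitions = {"y_obs_trn": y_trn, "y_obs_val": y_val, "y_obs_tst": y_tst}
    scaled = {name: (arr - mean) / std for name, arr in partitions.items()}
    return scaled, mean, std


# Toy data: one variable, with a missing observation in the test partition.
y_trn = np.array([[1.0], [2.0], [3.0]])
y_val = np.array([[2.0]])
y_tst = np.array([[4.0], [np.nan]])

scaled, y_mean, y_std = scale_y_partitions(y_trn, y_val, y_tst)
```

With this approach every partition is on the same scale, so downstream pre-processing can treat the batched data uniformly, and the returned mean and std can be used to transform values back to the original units.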

jsadler2 commented 2 years ago

I think it's a good idea to scale the y_tst.

When I originally wrote this, I didn't think we'd need to scale the val or test partitions since we'd never use those in training. Then I think we added the validation scaling when Simon added the early stopping functionality. I don't see any reason not to scale test too. ... And I think it would be better to just have it consistent across the partitions.

janetrbarclay commented 2 years ago

That explains how it came to be, and I agree that consistency is better. I'll do the PR now.