USGS-R / river-dl

Deep learning model for predicting environmental variables on river systems

getting unusually bad performance for flow in DRB #99

Closed · jsadler2 closed this issue 3 years ago

jsadler2 commented 3 years ago

I have been training the model on the full DRB and have been getting unusually bad performance.

jsadler2 commented 3 years ago

I was getting NSEs of basically 0. When I looked into the training data, I found that there were some bad flow observations (https://github.com/USGS-R/delaware-model-prep/issues/89). When I filtered those out, the performance was better: NSE of 0.38 for train (1985-2006) and 0.42 for validation (2006-2010).

jsadler2 commented 3 years ago

When I look back at results for the full DRB that I was running in Fall 2020, I was getting NSEs of around 0.7, so that was what I was expecting.

jsadler2 commented 3 years ago

If I train on the Christina River Basin, I get NSEs around 0.7. So that's comforting, and it's interesting that there'd be that difference.

jzwart commented 3 years ago

You're using the same drivers as before?

jsadler2 commented 3 years ago

That's probably the first thing to look at, I think. I am not using the same drivers. One of the problems is that in the Fall I wasn't saving the names of the drivers or their mean/std in my prepared data file. I have been doing that for several months now, but I hadn't started doing that at that time.

I know that they are different, though, because there are 19 variables in the Fall x_trn array, and for a while now I've been using only 8: x_vars: ["seg_rain", "seg_tave_air", "seginc_swrad", "seg_length", "seginc_potet", "seg_slope", "seg_humid", "seg_elev"]

jsadler2 commented 3 years ago

One thing I tried is plugging in the data from Fall and keeping everything else the same. What I found was this:

new data / new code: [plots]

old data / new code: [plots]

old data / old code: [plots]

So it seems like it boils down to differences in the data, not the code.

jzwart commented 3 years ago

Ah, yeah, the other drivers might have had some more flow influence via the intermediate variables?

jsadler2 commented 3 years ago

Yeah, maybe. But I feel like I would've noticed by now. I guess I should just try with all those drivers and see if that makes the difference.

jsadler2 commented 3 years ago

Good call @jzwart. That was it. Now I'm getting train/val NSEs of 0.88/0.84 😮.

wdwatkins commented 3 years ago

Are hidden layer sizes, etc. all the same with the different numbers of inputs?

The initial drop in loss with the new data over the first few epochs seems very unhealthy, like there isn't much signal to fit to. Must be something important in those variables?

jsadler2 commented 3 years ago

Yeah. That's the next question :) . Why did that make that big of a difference? Which of those variables is causing that big of an increase in performance? Is the model just taking advantage of those attributes to somehow make individualized models for each reach?

jsadler2 commented 3 years ago

The pretraining error can get super low because there is a direct relationship between "seg_outflow" (what we are predicting) and "seg_width" in PRMS: [equation screenshot]
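
For intuition, here's a minimal sketch, assuming a hydraulic-geometry-style power law between width and flow (the exact PRMS formulation may differ): if something like that holds, width alone pins down flow, so the pretraining error on seg_outflow can go to ~zero without the model learning anything from the met drivers.

```python
# Illustrative only -- not PRMS code.  If seg_width is a power-law function of
# seg_outflow (width = a * Q**b), then flow is exactly recoverable from width,
# so a network given seg_width as an input can drive its pretraining error on
# seg_outflow toward zero without using the meteorological drivers at all.
import numpy as np

a, b = 5.0, 0.5                                          # hypothetical coefficients
Q = np.random.lognormal(mean=2.0, sigma=1.0, size=1000)  # synthetic "seg_outflow"
width = a * Q ** b                                       # derived "seg_width"

Q_recovered = (width / a) ** (1.0 / b)                   # invert the power law
print(np.allclose(Q, Q_recovered))                       # True: width alone determines flow
```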

jsadler2 commented 3 years ago

So once the model finds that, it basically means the error is zero for pretraining. For finetuning, I think it's that the model is using the segment-specific attributes like slope to tailor parts of the parameter space to each segment. That's my guess.

wdwatkins commented 3 years ago

How hard is it to repeatedly train while holding out the new variables one at a time?

jsadler2 commented 3 years ago

For the record: 8 vars used in the NSE=~0.4 model:

x_vars: ["seg_rain", "seg_tave_air", "seginc_swrad", "seg_length", "seginc_potet", "seg_slope", "seg_humid", "seg_elev"]

16 vars used in NSE = ~0.8 model:

x_vars: ['seg_ccov', 'seg_elev', 'seg_length', 'seg_rain', 'seg_slope', 'seg_tave_air', 'seg_tave_gw', 'seg_tave_ss', 'seg_tave_upstream', 'seg_upstream_inflow', 'seg_width', 'seginc_gwflow', 'seginc_potet', 'seginc_sroff', 'seginc_ssflow', 'seginc_swrad']

added:

['seg_ccov', 'seg_tave_gw', 'seg_tave_ss', 'seg_tave_upstream', 'seg_upstream_inflow', 'seg_width', 'seginc_gwflow', 'seginc_sroff', 'seginc_ssflow']

jsadler2 commented 3 years ago

> How hard is it to repeatedly train while holding out the new variables one at a time?

Not too hard
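
A minimal sketch of how that could be set up, building one x_vars configuration per held-out variable (the actual train/evaluate call is left as a comment, since it depends on how the river-dl runs are launched):

```python
# Build one input-variable configuration per held-out variable.
full_vars = ['seg_ccov', 'seg_elev', 'seg_length', 'seg_rain', 'seg_slope',
             'seg_tave_air', 'seg_tave_gw', 'seg_tave_ss', 'seg_tave_upstream',
             'seg_upstream_inflow', 'seg_width', 'seginc_gwflow',
             'seginc_potet', 'seginc_sroff', 'seginc_ssflow', 'seginc_swrad']
added_vars = ['seg_ccov', 'seg_tave_gw', 'seg_tave_ss', 'seg_tave_upstream',
              'seg_upstream_inflow', 'seg_width', 'seginc_gwflow',
              'seginc_sroff', 'seginc_ssflow']

configs = {held_out: [v for v in full_vars if v != held_out]
           for held_out in added_vars}

for held_out, x_vars in configs.items():
    # plug x_vars into the prep/train/eval workflow here, record val NSE,
    # then compare runs to see which variable matters most
    print(f"hold out {held_out}: {len(x_vars)} input variables")
```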

jsadler2 commented 3 years ago

> For finetuning, I think it's that the model is using the segment-specific attributes like slope to tailor parts of the parameter space to each segment. That's my guess.

It's not "slope". Ha, my guess is wrong. That's in the 8-var version.

jsadler2 commented 3 years ago

This would probably be a good use case for a partial dependence plot or for looking into LIME.
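
As a rough illustration of the partial-dependence idea (names here are hypothetical, not river-dl API): sweep one input over a grid of values while leaving the others as observed, and look at how the average prediction responds.

```python
import numpy as np

def partial_dependence(predict_fn, x, var_idx, grid):
    """Crude partial dependence for a sequence model.

    predict_fn: callable mapping [n_samples, seq_len, n_vars] inputs to
                predictions (e.g. a trained model's predict method).
    x:          scaled input array the model normally sees.
    var_idx:    column index of the variable to sweep (e.g. seg_width).
    grid:       values to substitute for that variable.
    """
    averages = []
    for val in grid:
        x_mod = x.copy()
        x_mod[:, :, var_idx] = val                   # force the variable to one value
        averages.append(float(np.mean(predict_fn(x_mod))))
    return np.array(averages)
```

Plotting the result against the grid (e.g. a linspace over the scaled range of seg_width) would show how strongly the predictions track that one variable.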

jzwart commented 3 years ago

This might be another case where we want to make the case for using pre-training data during the test phase. But it is a little concerning that the improvement over the met data alone is so great. I agree that it would be good to try partial dependence and/or LIME.

aappling-usgs commented 3 years ago

> make the case for using pre-training data during the test phase

Can you elaborate on that, Jake?

aappling-usgs commented 3 years ago

Seems like seg_width just has to go...

jzwart commented 3 years ago

> Can you elaborate on that, Jake?

Can we make an argument that it is appropriate to use seg_width, seg_upstream_inflow, etc. as drivers for training and testing? They are 'free' data, but PRMS is calibrated on some flow data from 1982 to ~2018.

jordansread commented 3 years ago

Butting in on this because I may have a mistaken understanding of how PRMS calibration works.

Is it calibrated to some flow data because we have calibrated it to the flow data, or is it assumed that the national version (of which this is a cut-out) has used flow data? If the latter, that's where I'm confused, because I thought PRMS was calibrated to intermediate targets but validated with actual flow data.

aappling-usgs commented 3 years ago

Jordan, Jake put notes about PRMS calibration here, including reference to using observed streamflow from 1417 headwater gages for calibration. I'm not sure what qualified as headwater - Jake, do you know?

But also, calibration of a PB model is intentionally much less flexible than training an ML model. We deliberated about this in our train-test-split discussions, especially because we haven't been able to find good ways to exclude all PRMS calibration years from the ML test sets, and I think the difference between PB calibration and ML training is substantial enough that we can probably justify using PRMS outputs even when the calibration period included data in our test period. It's not ideal, but not nearly as bad as building one ML model on another ML model that was trained on the test period.

aappling-usgs commented 3 years ago

Thanks for elaborating, Jake. I think we could make an argument that it's feasible in the real world to pass in seg_width, and thus our model development and testing could include seg_width as an input. For intermediate PRMS variables other than seg_width, I agree that including them as inputs in all phases (pretraining, training, and testing) is a reasonable option and may legitimately improve prediction accuracy.

However, I think it's a bad methods idea to pass in seg_width specifically, because then during pretraining the model probably just learns to predict seg_outflow from seg_width without learning to predict the differences between modeled outflow (seg_outflow) and true outflow (observed).

jzwart commented 3 years ago

> I'm not sure what qualified as headwater - Jake, do you know?

I don't know, but they cover a pretty big area in the Eastern US - the 'headwater' basins calibrated with flow data are those in red below: [map screenshot]

jordansread commented 3 years ago

Ahh, I see. So it is actually a calibration target, but it follows those intermediate ones. I appreciate the clarification here, as it will keep me from spreading misinformation about what data it has and hasn't seen.

aappling-usgs commented 3 years ago

> However, I think it's a bad methods idea to pass in seg_width specifically, because then during pretraining the model probably just learns to predict seg_outflow from seg_width without learning to predict the differences between modeled outflow (seg_outflow) and true outflow (observed).

Hmm - but what happens if we use dropout?
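
As an illustrative sketch (not the river-dl model definition), dropout could be applied directly to the input features so the network can't lean entirely on seg_width during training:

```python
import tensorflow as tf

n_vars, seq_len, hidden = 16, 365, 20   # illustrative sizes only

inputs = tf.keras.Input(shape=(seq_len, n_vars))
# Dropout on the inputs randomly zeroes individual input values during
# training; SpatialDropout1D would instead zero whole variables at a time.
x = tf.keras.layers.Dropout(0.2)(inputs)
x = tf.keras.layers.LSTM(hidden, return_sequences=True)(x)
outputs = tf.keras.layers.Dense(1)(x)   # per-timestep flow prediction
model = tf.keras.Model(inputs, outputs)
```

Whether that actually prevents the pretraining shortcut would need testing, since the width-flow relationship would still be visible on most timesteps.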

jsadler2 commented 3 years ago

I don't think we ever came to a solid understanding of why this was happening, but I am closing this for now.