truncated training data?

USGS-R / river-dl

Deep learning model for predicting environmental variables on river systems

Creative Commons Zero v1.0 Universal


truncated training data? #122

Open janetrbarclay opened 3 years ago

janetrbarclay commented 3 years ago

Unless I'm missing something, this line truncates the observations to the time frame of the PRMS data, which means we're only training on WY 1986-2016 (ignoring WY 2017-2020, even in the fine-tuning). I guess this is unavoidable, since we need the PRMS data for the inputs even in the fine-tuning? Maybe we just need a comment in config.yml noting that the train/test/val dates are truncated to those in the sntemp file?

https://github.com/USGS-R/river-dl/blob/0c78af242ca03010080bc7e43ca97d17cef6eda8/river_dl/preproc_utils.py#L125
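To make the effect concrete, here is a minimal sketch of what clipping observations to the input period does. It is not the code at the linked line. It assumes the PRMS/SNTemp inputs cover WY 1986 through WY 2016 and uses plain dicts instead of river-dl's actual data structures. The helper names `water_year` and `clip_to_input_period` and the observation values are hypothetical toy examples.

```python
from datetime import date

# Assumed input period: WY 1986 through WY 2016.
# A water year runs Oct 1 - Sep 30 and is named for the calendar year it ends in.
INPUT_START = date(1985, 10, 1)
INPUT_END = date(2016, 9, 30)


def water_year(d):
    """Hypothetical helper: return the water year a date falls in."""
    return d.year + 1 if d.month >= 10 else d.year


def clip_to_input_period(obs, start, end):
    """Hypothetical helper: split observations by whether inputs exist for that date.

    Returns (kept, dropped). Only `kept` can be paired with model inputs,
    so anything in `dropped` never reaches training or fine-tuning.
    """
    kept = {d: v for d, v in obs.items() if start <= d <= end}
    dropped = {d: v for d, v in obs.items() if not (start <= d <= end)}
    return kept, dropped


# Toy observations (made-up values): one inside the input window, two in WY 2017-2020.
obs = {
    date(2015, 7, 1): 18.2,
    date(2017, 7, 1): 19.0,
    date(2020, 7, 1): 20.1,
}
kept, dropped = clip_to_input_period(obs, INPUT_START, INPUT_END)
print(sorted(kept))                                   # only the 2015 observation survives
print(sorted({water_year(d) for d in dropped}))       # water years that are silently lost
```

The point is that the clipping depends only on where the inputs end, not on the configured train/val/test dates. Any observation after the last PRMS date is dropped, whatever the config says.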

jsadler2 commented 3 years ago

This is a really interesting idea, @janetrbarclay. Really, we could have two separate training periods, huh? I'd never thought of that. We could have pretrain_start_date, pretrain_end_date, finetune_start_date, and finetune_end_date instead of just train_start_date and train_end_date.

I think this would be a good PR. If only pretrain_{start,end}_date is defined, those dates would also be used for the finetuning, and vice versa.
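The fallback rule proposed here ("if only one pair is defined, use it for both phases") could be sketched as follows. This is a hypothetical helper, not part of river-dl. It assumes the config has already been loaded into a plain dict (for example from config.yml) and that dates are kept as strings. The key names follow the proposal above. Raising an error when only half of a pair is set is an extra design choice, not something stated in the thread.

```python
def resolve_periods(config):
    """Hypothetical sketch: return ((pretrain_start, pretrain_end), (finetune_start, finetune_end)).

    If only one phase's dates are defined, they are reused for the other phase.
    """
    pre = (config.get("pretrain_start_date"), config.get("pretrain_end_date"))
    fine = (config.get("finetune_start_date"), config.get("finetune_end_date"))

    # Reject half-specified pairs (e.g. a start date without an end date).
    for name, pair in (("pretrain", pre), ("finetune", fine)):
        if any(pair) and not all(pair):
            raise ValueError(f"both {name}_start_date and {name}_end_date must be set")

    if not all(pre) and not all(fine):
        raise ValueError("define pretrain_{start,end}_date and/or finetune_{start,end}_date")

    # Fall back to whichever pair is defined.
    if not all(pre):
        pre = fine
    if not all(fine):
        fine = pre
    return pre, fine


# Example: pretraining inputs end earlier than the fine-tuning data (dates are illustrative).
cfg = {
    "pretrain_start_date": "1985-10-01",
    "pretrain_end_date": "2016-09-30",
    "finetune_start_date": "1985-10-01",
    "finetune_end_date": "2020-09-30",
}
print(resolve_periods(cfg))
```

Keeping the resolution in one function means the pretraining and fine-tuning code paths can each ask for their own period without repeating the fallback logic.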

janetrbarclay commented 3 years ago

I agree that could be interesting. Is it something that someone would use right now? If so, I'd be happy to do a PR for it. I modified the gw_utils.py script to use observed temperatures from the part of the training period outside the PRMS time frame to calculate the annual properties (assuming those properties are representative of the full training period), but without the PRMS outputs I can't train on that time period.

aappling-usgs commented 3 years ago

I don't know of immediate needs for this for streams, but I wouldn't be surprised if they come up. Hayley is working on lake projections where she's pretraining on contemporary and future periods of GCM simulations and GLM predictions, then finetuning on contemporary periods only. We might have a similar need when we get to making stream projections, too.

jzwart commented 3 years ago

For the temperature forecasting project, we had to use two different training periods for pre-training and fine-tuning, since the pre-training dataset didn't extend as far as the fine-tuning dataset (see here). So I think this flexibility would be useful.