Adding State Updating and # of Tasks to LSTM and RGCN models

USGS-R / river-dl

Deep learning model for predicting environmental variables on river systems

Creative Commons Zero v1.0 Universal


Adding State Updating and # of Tasks to LSTM and RGCN models #104

Closed jzwart closed 3 years ago

jzwart commented 3 years ago

I added some functionality to the LSTM and RGCN models so that they can return the h and c states of the LSTM. The LSTM returns only the most recent h and c states (i.e. those of the last time step), while the RGCN returns the h and c states for every time step, since it already loops through the LSTM one time step at a time.
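To make the difference between "last states only" and "states for every time step" concrete, here is a minimal pure-Python sketch. It is not the river-dl implementation: the one-unit cell, the fixed `WEIGHTS`, and the `run_lstm` helper with its `all_states` flag are all hypothetical names made up for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical fixed weights for a 1-unit LSTM cell: (input weight, recurrent weight, bias)
WEIGHTS = {
    "i": (0.5, 0.1, 0.0),   # input gate
    "f": (0.3, 0.2, 1.0),   # forget gate
    "o": (0.4, -0.1, 0.0),  # output gate
    "g": (0.8, 0.3, 0.0),   # candidate cell value
}


def lstm_step(x, h, c, weights=WEIGHTS):
    """Advance a scalar LSTM cell by one time step and return the new (h, c)."""
    def gate(name, activation):
        w_x, w_h, b = weights[name]
        return activation(w_x * x + w_h * h + b)

    i = gate("i", sigmoid)
    f = gate("f", sigmoid)
    o = gate("o", sigmoid)
    g = gate("g", math.tanh)
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def run_lstm(xs, h_init=0.0, c_init=0.0, all_states=False):
    """Loop over time steps; return predictions plus either the last or all states."""
    h, c = h_init, c_init
    hs, cs = [], []
    for x in xs:
        h, c = lstm_step(x, h, c)
        hs.append(h)
        cs.append(c)
    preds = list(hs)  # in this toy, the prediction is simply h
    if all_states:
        return preds, hs, cs  # states for every time step
    return preds, h, c        # states for the most recent time step only


xs = [0.1, 0.5, -0.2, 0.3]
preds, h_last, c_last = run_lstm(xs)
_, h_all, c_all = run_lstm(xs, all_states=True)

# Feeding the states from the first half back in as initial states
# reproduces the predictions from running the whole sequence at once.
first, h_mid, c_mid = run_lstm(xs[:2])
second, _, _ = run_lstm(xs[2:], h_init=h_mid, c_init=c_mid)

# Adjusting a state before feeding it back in changes the predictions.
adjusted, _, _ = run_lstm(xs[2:], h_init=h_mid + 0.5, c_init=c_mid)
```

Returning the states for every time step costs more memory, but it lets the caller pick any time step as the starting point for a later run.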

I also added options for predicting either one or two tasks for the LSTM and RGCN. Previously, the models only predicted two tasks (i.e. you had to supply observations for two target features), but now both models should be able to predict either one or two tasks, defaulting to a single task. This will be a big update, and I didn't change train.py or the other functions that call the LSTM/RGCN models because I think we first need to see whether these proposed changes will work for all the projects. I also wasn't sure what to call the number of prediction tasks. I ended up calling them tasks since that is what @jsadler2 calls them in his multi-task modeling paper, but I'm open to other suggestions (e.g. targets, target_features, etc.). I had to update some of the loss functions too, since they assumed by default that the model was predicting two tasks.
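As a rough illustration of a loss that no longer assumes two tasks, the sketch below sums a masked RMSE over however many task columns are requested. It is an assumption-laden toy, not the loss code in this PR: `rmse_masked`, `multitask_rmse`, the `lambdas` per-task weights, and the use of NaN to mark missing observations are all hypothetical choices.

```python
import math


def rmse_masked(y_true, y_pred):
    """RMSE over the entries where an observation exists (NaN marks a missing one)."""
    sq_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred) if not math.isnan(t)]
    if not sq_errors:
        return 0.0  # no observations for this task, so it contributes nothing
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def multitask_rmse(y_true, y_pred, num_tasks=1, lambdas=None):
    """Weighted sum of per-task masked RMSEs.

    y_true and y_pred are lists of rows; column k holds task k.
    num_tasks defaults to 1, mirroring the default described above.
    """
    lambdas = lambdas or [1.0] * num_tasks
    total = 0.0
    for task in range(num_tasks):
        true_col = [row[task] for row in y_true]
        pred_col = [row[task] for row in y_pred]
        total += lambdas[task] * rmse_masked(true_col, pred_col)
    return total


nan = float("nan")
y_true = [[1.0, nan], [3.0, 2.0]]  # second task is unobserved in the first row
y_pred = [[1.0, 5.0], [1.0, 3.0]]

one_task_loss = multitask_rmse(y_true, y_pred, num_tasks=1)
two_task_loss = multitask_rmse(y_true, y_pred, num_tasks=2)
```

Looping over `range(num_tasks)` rather than hard-coding two columns is what lets the same function serve both the one-task and two-task cases.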

Here's an example of running these models with the new updates. This example is just predicting a single task and returning the h and c states. I show that both the LSTM and RGCN h and c states can be adjusted, updated, and influence the model predictions when supplied to the LSTM / RGCN as initial h and c states. This workflow also works for 2 tasks but it isn't shown right now.

closes #98, #106

jzwart commented 3 years ago

Could you make it so that both models have the same interface? That would make it easier to switch between RGCN and LSTM in any code that uses the state information.

Yes, I think that's a good idea and will update so that they have similar interfaces.

If you are good with it, Jake, I can commit code to your branch that addresses 1, 2, and 3 above.

That'd be great @jsadler2 , I will add a few other commits based on @aappling-usgs and your comments as well.

jzwart commented 3 years ago

OK @jsadler2 , I've addressed most but not all comments. If you want to commit code to this branch addressing your points 1, 2, and 3 above, then go ahead. I won't commit code for a bit.

jsadler2 commented 3 years ago

Okay @jzwart - I added a few commits that get at my 1, 2, and 3 points from above. I'm realizing that there is some more simplifying that I think can/should be done here, but I'm not sure if it belongs in this PR. For example, taking out the sample weights relates to the loss functions (#98), but that may be beyond scope here.

jzwart commented 3 years ago

looks good @jsadler2 , I added a couple comments

jsadler2 commented 3 years ago

@jzwart - I got a little carried away :). One thing led to another and I ended up making a couple more major changes related to what you started.

So now, in addition to what you wrote in, i.e.:

the states attribute in both the RGCNModel and the LSTMModel being exposed
the num_tasks argument

I also

removed the use of weights in the loss functions and took that out of the train script (#98)
unified the multitasking approach so that now the LSTMModel and the GRUModel are much simpler (#106)

These two additional changes were mostly prompted by the num_tasks option that you introduced.

I think all the changes together will make the codebase both simpler and more flexible.

I'm done making changes now, so if you could look over what I've done, that'd be great.

jzwart commented 3 years ago

I think it looks good @jsadler2 ! I like the unified approach to all the models and similar output for predictions and states. There is a slight difference in how the LSTM and RGCN states are reset and I wonder if we could make them the same. For the RGCN we'd reset by supplying the h and c states:

rgcn_preds = rgcn_model(inputs = inputs, h_init = h, c_init = c)

but the LSTM states would need to be reset before making predictions rather than supplying to the function:

lstm_model.rnn_layer.reset_states(states = [h, c]) 
lstm_preds = lstm_model(inputs = inputs)

It'd be nice to make those the same so we don't have to change the prediction / training code as much depending on the model type we're using. I think we'd just need to add an h_init and c_init argument to the LSTMModel call and reset the states in there (either to zero or the supplied h and c states). I'll add a suggestion where I think it would go.
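A minimal sketch of that idea, using a toy class rather than the actual LSTMModel: the call accepts optional `h_init` and `c_init` and resets the states internally, to zero when nothing is supplied. `ToyStatefulModel`, its `_run` helper, and the simple recurrence inside it are hypothetical stand-ins, not river-dl or Keras code.

```python
import math


class ToyStatefulModel:
    """Hypothetical stand-in for a stateful recurrent model; not river-dl code."""

    def __init__(self, decay=0.5):
        self.decay = decay
        self.reset_states()

    def reset_states(self, states=None):
        # Zero the states unless explicit (h, c) values are supplied.
        self.h, self.c = (0.0, 0.0) if states is None else states

    def _run(self, inputs):
        # Toy recurrence standing in for the real LSTM math.
        preds = []
        for x in inputs:
            self.c = self.decay * self.c + x + 0.1 * self.h
            self.h = math.tanh(self.c)
            preds.append(self.h)
        return preds

    def __call__(self, inputs, h_init=None, c_init=None):
        # Proposed interface: reset inside the call, to the supplied states or to zero.
        h0 = 0.0 if h_init is None else h_init
        c0 = 0.0 if c_init is None else c_init
        self.reset_states(states=(h0, c0))
        return self._run(inputs)


inputs = [0.2, -0.1, 0.4]
h, c = 0.3, 0.7

# Current LSTM-style usage: reset first, then predict.
model = ToyStatefulModel()
model.reset_states(states=(h, c))
old_style_preds = model._run(inputs)

# RGCN-style usage: pass the initial states with the call.
new_style_preds = model(inputs, h_init=h, c_init=c)

# With no states supplied, the call starts from zeros.
default_preds = model(inputs)
```

With this shape, prediction and training code can pass `h_init` and `c_init` the same way regardless of which model type it is holding.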

Other than that, I think the changes look good and it can be merged.

jsadler2 commented 3 years ago

Back to you, @jzwart. I'm good with this merge now if you are.

jzwart commented 3 years ago

great, looks good to me. Merging