jsadler2 closed this issue 3 years ago.
I'm good with the RGCN approach given the exploration you've already done.
When adding in the state-updating functionality, I thought the RGCN code was easier to follow. If the gradient correction doesn't improve performance, then I'd vote for the RGCN approach.
Currently, the LSTM and GRU implementation of multitask learning is different from the RGCN implementation (e.g., in the `train_step` function). I think it makes sense to choose one implementation or the other so as to simplify.
I lean toward the RGCN approach of just adding the losses together. It is simpler, and I think it will be sufficient for our needs. One drawback is that you can't do the fancy gradient-correction approach I was using before, but that didn't actually improve performance, and if we want it back we can always dig up the old code and resurrect it.
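For reference, the summed-loss idea can be sketched roughly like this. This is a minimal, framework-agnostic sketch in NumPy, not the repo's actual code; the `multitask_loss` name and the NaN-masking convention for missing observations are assumptions.

```python
import numpy as np

def multitask_loss(y_true, y_pred):
    """Combined multitask loss: the sum of per-task mean squared errors.

    y_true, y_pred: arrays of shape (n_samples, n_tasks). NaNs in y_true
    mark missing observations for that task and are excluded from that
    task's loss. (Hypothetical helper illustrating the summed-loss approach.)
    """
    task_losses = []
    for task in range(y_true.shape[1]):
        obs, pred = y_true[:, task], y_pred[:, task]
        mask = ~np.isnan(obs)  # keep only time steps with an observation
        task_losses.append(np.mean((obs[mask] - pred[mask]) ** 2))
    return sum(task_losses)

# Toy example: two tasks, one missing observation in task 2
y_true = np.array([[1.0, 2.0], [2.0, np.nan], [3.0, 4.0]])
y_pred = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
print(multitask_loss(y_true, y_pred))  # 0.0: predictions match all observed values
```

In a Keras-style custom `train_step`, the same idea would just mean computing each task's loss tensor and backpropagating their sum, rather than applying any per-task gradient correction.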