jsadler2 closed this issue 3 years ago.
I'm good with the RGCN approach given the exploration you've already done.
When adding in the state-updating functionality, I thought the RGCN code was easier to follow. If the gradient correction doesn't improve performance, then I'd vote for the RGCN approach.
Currently, the LSTM and GRU implementation of multitask learning is different from the RGCN implementation (e.g., in the `train_step` function). I think it makes sense to choose one implementation or the other so as to simplify.
I lean toward the RGCN approach of just adding the losses together. It is simpler, and I think it will be sufficient for our needs. One drawback is that you can't do the fancy gradient-correction approach I was using before, but that didn't actually improve performance, and if we want it back we can always dig up the old code and resurrect it.
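For reference, the summed-loss idea can be sketched roughly like this. This is a minimal, framework-agnostic sketch in NumPy, not the repo's actual code; the `multitask_loss` name and the NaN-masking convention for missing observations are assumptions.

```python
import numpy as np

def multitask_loss(y_true, y_pred):
    """Combined multitask loss: the sum of per-task mean squared errors.

    y_true, y_pred: arrays of shape (n_samples, n_tasks). NaNs in y_true
    mark missing observations for that task and are excluded from that
    task's loss. (Hypothetical helper illustrating the summed-loss approach.)
    """
    task_losses = []
    for task in range(y_true.shape[1]):
        obs, pred = y_true[:, task], y_pred[:, task]
        mask = ~np.isnan(obs)  # keep only time steps with an observation
        task_losses.append(np.mean((obs[mask] - pred[mask]) ** 2))
    return sum(task_losses)

# Toy example: two tasks, one missing observation in task 2
y_true = np.array([[1.0, 2.0], [2.0, np.nan], [3.0, 4.0]])
y_pred = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
print(multitask_loss(y_true, y_pred))  # 0.0: predictions match all observed values
```

In a Keras-style custom `train_step`, the same idea would just mean computing each task's loss tensor and backpropagating their sum, rather than applying any per-task gradient correction.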