Hi,

in the VRNN class (in model.py), you cut the gradients behind h_{t-1} using the _repackage_state() function. I've been thinking about this for a while and would have said that the correct thing is to not cut the gradients, since nothing in the paper indicates that one should. May I ask what your reasoning is? I'm not very sure about mine.

Thanks! Best, Max
Indeed, you are correct: detaching h_{t-1} truncates backpropagation through time, which the paper does not call for. This was my first PyTorch project, and at the time I thought this step was necessary for any model with a recurrent state. Thanks for pointing it out.
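For anyone landing here later: the call in question presumably just detaches the hidden state between time steps, which is the standard truncated-BPTT trick. A minimal standalone sketch of the behavior (the toy recurrence, tensor shapes, and function body are illustrative assumptions, not the repo's actual code):

```python
import torch

def _repackage_state(h):
    """Detach a hidden state (tensor or tuple of tensors) from the
    autograd graph so backprop stops at this step (truncated BPTT).
    Sketch of what the repo's _repackage_state presumably does."""
    if isinstance(h, torch.Tensor):
        return h.detach()
    return tuple(_repackage_state(v) for v in h)

# Toy recurrence: h_t depends on h_{t-1}, so gradients would
# normally flow back through every earlier step.
h = torch.zeros(1, 4)
w = torch.randn(4, 4, requires_grad=True)
h = torch.tanh(h @ w + 1.0)   # one step: h now carries graph history
print(h.requires_grad)        # True  -> full BPTT would reach this step

h_cut = _repackage_state(h)   # what the VRNN code did between steps
print(h_cut.requires_grad)    # False -> the graph is cut here
```

Removing the call (i.e. passing `h` instead of `h_cut` into the next step) lets gradients flow through the entire sequence, matching the paper's objective.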