Closed Jogima-cyber closed 1 year ago
Hi, I was wondering if there was a reason for using different formulas for the definition of the GRU? If I'm not mistaken, the standard formulas:
and the ones used in this implementation: https://github.com/danijar/dreamerv3/blob/423291a9875bb9af43b6db7150aaa972ba889266/dreamerv3/nets.py#L131-L140 are different.
Hi, I think the version I'm using was designed by Nvidia for cuDNN. The motivation is that you compute all relevant quantities in a single matmul.
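To illustrate the single-matmul idea, here is a minimal NumPy sketch. It is not a copy of the linked code: the function name `fused_gru_cell`, the gate ordering, and the weight layout are assumptions made for this example, and details such as normalization are left out. The textbook GRU applies the reset gate to the hidden state before the candidate's recurrent matmul, so that matmul has to wait for the gate. In this sketch the reset gate instead scales the candidate pre-activation after the matmul, so one projection of the concatenated `[h, x]` produces all three quantities.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fused_gru_cell(x, h, W, b):
    """One GRU-style step where all gates come from a single matmul.

    Shapes (hypothetical layout chosen for this sketch):
      x: (batch, input_dim)
      h: (batch, hidden)
      W: (hidden + input_dim, 3 * hidden)
      b: (3 * hidden,)
    """
    # Single projection of the concatenated state and input.
    z = np.concatenate([h, x], axis=-1) @ W + b
    # Split the result into the three gate pre-activations.
    reset, cand, update = np.split(z, 3, axis=-1)
    reset = sigmoid(reset)
    # Reset gate is applied after the matmul, not to h before it.
    cand = np.tanh(reset * cand)
    update = sigmoid(update)
    # Interpolate between the candidate and the previous state.
    return update * cand + (1.0 - update) * h


rng = np.random.default_rng(0)
batch, input_dim, hidden = 2, 4, 3
x = rng.normal(size=(batch, input_dim))
h = rng.normal(size=(batch, hidden))
W = rng.normal(size=(hidden + input_dim, 3 * hidden)) * 0.1
b = np.zeros(3 * hidden)
h_next = fused_gru_cell(x, h, W, b)
```

With all-zero weights and biases the update gate is 0.5 and the candidate is 0, so the new state is exactly half the old one, which is a quick sanity check for the interpolation step.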
Thanks for the answer!