danijar / dreamerv2

Mastering Atari with Discrete World Models
https://danijar.com/dreamerv2
MIT License
886 stars, 195 forks

Understanding re-clipping in Truncated Normal distribution #50

Closed: pvskand closed this issue 1 year ago

pvskand commented 1 year ago

Hi,

I was looking at the TruncNormalDist code and was wondering why the samples are re-clipped ('re' because tfd.TruncatedNormal's sampling already keeps them in [-1, 1]). In practice this shouldn't cause an issue, since the samples are only re-clipped by 1e-6, but I am curious whether I'm missing something.

Thanks!

danijar commented 1 year ago

That's just because the log prob is not differentiable at the bounds, but I think newer versions of TFP may already give it a gradient there.
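
To illustrate the point about the bounds, here is a minimal pure-Python sketch. It is not the repository's TruncNormalDist and does not use TFP; the helper names `trunc_normal_log_prob` and `clip_inside` are hypothetical and exist only for this example. It shows that the truncated-normal log density drops to -inf immediately outside [low, high], so the bound itself is a discontinuity with no well-defined derivative, and that moving a sample inward by 1e-6 keeps it strictly inside the support.

```python
import math


def trunc_normal_log_prob(x, loc, scale, low, high):
    """Log density of a normal truncated to [low, high]; -inf outside."""
    if x < low or x > high:
        return -math.inf

    def cdf(v):
        # Standard normal CDF evaluated at the standardized value of v.
        return 0.5 * (1.0 + math.erf((v - loc) / (scale * math.sqrt(2.0))))

    # Log of the probability mass that the untruncated normal puts on [low, high].
    log_norm = math.log(cdf(high) - cdf(low))
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale * math.sqrt(2.0 * math.pi)) - log_norm


def clip_inside(x, low, high, eps=1e-6):
    """Move x into [low + eps, high - eps], i.e. strictly inside the support."""
    return min(max(x, low + eps), high - eps)


low, high = -1.0, 1.0
on_bound = 1.0                                  # a sample that landed exactly on the bound
inside = clip_inside(on_bound, low, high)       # nudged inward by eps

lp_inside = trunc_normal_log_prob(inside, 0.0, 1.0, low, high)           # finite
lp_outside = trunc_normal_log_prob(high + 1e-6, 0.0, 1.0, low, high)     # -inf
```

Samples that are already in the interior pass through `clip_inside` unchanged, so the nudge only affects values within 1e-6 of a bound, which matches the observation in the question that the re-clipping is harmless in practice.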