danijar / dreamerv2

Mastering Atari with Discrete World Models
https://danijar.com/dreamerv2
MIT License

Discount predictor invalid log_prob targets? #18

Closed niklasdbs closed 3 years ago

niklasdbs commented 3 years ago

Hi, there seems to be an issue with the discount predictor log likelihood targets.

https://github.com/danijar/dreamerv2/blob/e783832f01b2c845c195587158c4e129edabaebb/dreamerv2/agent.py#L168

https://github.com/danijar/dreamerv2/blob/e783832f01b2c845c195587158c4e129edabaebb/dreamerv2/agent.py#L126

If I understand this correctly, this computes the log probability under a Bernoulli distribution for values other than 0 or 1, since the discount will be < 1 for non-terminal steps.

danijar commented 3 years ago

Yes, this is not obvious, but the Bernoulli distribution is implemented so that log_prob with a soft target computes the (negative) cross entropy to that target. The discount head is therefore trained towards targets of 0 and 0.999, as intended.
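
For readers who want to check this numerically, here is a minimal pure-Python sketch of the idea. It is not the repository's code: the helper names (`softplus`, `bernoulli_log_prob`, `cross_entropy`, `grad_wrt_logit`) are hypothetical, and it assumes the log prob is evaluated as `target * log(p) + (1 - target) * log(1 - p)` with `p = sigmoid(logit)`, which is the behavior described above.

```python
import math

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def bernoulli_log_prob(logit, target):
    # Assumed form: target * log(p) + (1 - target) * log(1 - p),
    # with p = sigmoid(logit). The target may be soft, e.g. 0.999.
    log_p = -softplus(-logit)      # log(sigmoid(logit))
    log_not_p = -softplus(logit)   # log(1 - sigmoid(logit))
    return target * log_p + (1.0 - target) * log_not_p

def cross_entropy(target, p):
    # Cross entropy between Bernoulli(target) and Bernoulli(p).
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def grad_wrt_logit(logit, target):
    # d/dlogit of bernoulli_log_prob; it vanishes when sigmoid(logit) == target.
    p = 1.0 / (1.0 + math.exp(-logit))
    return target - p

# Illustrative values only: an arbitrary logit and the soft target 0.999.
logit, target = 2.0, 0.999
p = 1.0 / (1.0 + math.exp(-logit))
print(bernoulli_log_prob(logit, target), -cross_entropy(target, p))
```

Under this assumption the two printed numbers agree, and the gradient with respect to the logit is `target - p`, so maximizing the log prob pushes the predicted probability towards 0.999 on non-terminal steps and towards 0 on terminal ones.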