niklasdbs closed this issue 3 years ago.
Hi, there seems to be an issue with the discount predictor's log-likelihood targets:
https://github.com/danijar/dreamerv2/blob/e783832f01b2c845c195587158c4e129edabaebb/dreamerv2/agent.py#L168
https://github.com/danijar/dreamerv2/blob/e783832f01b2c845c195587158c4e129edabaebb/dreamerv2/agent.py#L126
If I understand this correctly, this computes the log probability of a Bernoulli distribution at values other than 0 or 1, because the discount target is below 1 for non-terminal steps.
Yes, this is not obvious, but the Bernoulli distribution is implemented so that `log_prob` with a soft target computes the (negative) cross entropy with respect to that target. So the discount head is trained towards targets of 0 and 0.999, as intended.
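To illustrate why a soft target is fine, here is a minimal pure-Python sketch. It is not dreamerv2's code (which uses TensorFlow Probability). It writes the Bernoulli log-likelihood as `target * log(p) + (1 - target) * log(1 - p)`, which is the ordinary log-probability for targets of 0 or 1 and the negative binary cross entropy for soft targets. Its gradient with respect to the logit is `target - sigmoid(logit)`, so maximizing it drives the predicted probability to the target itself. The names `bernoulli_log_prob`, `fit_logit` and `GAMMA`, and the target construction `GAMMA * (1 - is_terminal)`, are hypothetical and were chosen only to match the "0 and 0.999" targets mentioned in the reply.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bernoulli_log_prob(logit: float, target: float) -> float:
    """Bernoulli log-likelihood of `target` under p = sigmoid(logit).

    Written as target * log(p) + (1 - target) * log(1 - p). For a hard
    target (0 or 1) this is the usual log-probability; for a soft target
    in [0, 1] it equals the negative binary cross entropy -H(target, p).
    """
    log_p = -softplus(-logit)   # log sigmoid(logit)
    log_1mp = -softplus(logit)  # log(1 - sigmoid(logit))
    return target * log_p + (1.0 - target) * log_1mp


def fit_logit(target: float, steps: int = 20_000, lr: float = 1.0) -> float:
    """Gradient ascent on bernoulli_log_prob for a single scalar logit.

    The derivative of the log-likelihood w.r.t. the logit is
    (target - sigmoid(logit)), so the fixed point is sigmoid(logit) == target.
    """
    logit = 0.0
    for _ in range(steps):
        logit += lr * (target - sigmoid(logit))
    return logit


# Hypothetical discount targets: GAMMA on non-terminal steps, 0 on terminal ones.
GAMMA = 0.999
for is_terminal in (0.0, 1.0):
    target = GAMMA * (1.0 - is_terminal)
    p = sigmoid(fit_logit(target))
    print(f"is_terminal={is_terminal:.0f} target={target:.3f} learned p={p:.5f}")
```

Running it should print learned probabilities close to 0.999 for the non-terminal case and close to 0 for the terminal case. The optimum of the soft-target log-likelihood is the target value, not a hard 0/1 label.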