std of models in dreamer

Eclectic-Sheep / sheeprl

Distributed Reinforcement Learning accelerated by Lightning Fabric

Apache License 2.0

Hi @elisaparga19, I don't have an exact answer to your question. My guess is that the author of the algorithm chose to fix the variance at 1 for some models, such as the reward model, the observation model, and the critic, because he does not want those quantities to have a high variance: the models have to learn to predict rewards/values and reconstruct images accurately. Indeed, he never samples these quantities and always uses the mean; the distribution is only used to compute the loss. It is different for actions: there he also wants to learn the std, because it is needed for sampling the actions (a higher std can lead to more exploration in the early stages of training).
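To make the difference concrete, here is a minimal sketch in plain Python. It is not sheeprl's or Dreamer's actual code, and the function names (`fixed_std_nll`, `sample_action`) are made up for illustration. It assumes a univariate Gaussian and shows two things: with the std fixed to 1, the negative log-likelihood is just half the squared error plus a constant, so only the mean matters for the loss; for actions, the std directly controls how widely the samples spread.

```python
import math
import random
import statistics

# Constant term of a unit-variance Gaussian log-density: 0.5 * log(2 * pi).
LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def normal_log_prob(x, mean, std=1.0):
    """Log-density of a univariate Normal(mean, std) evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - LOG_SQRT_2PI


def fixed_std_nll(target, predicted_mean):
    """Negative log-likelihood with the std fixed to 1.

    Hypothetical stand-in for the loss of a reward/observation/critic head:
    the distribution is only used to score the target, never sampled.
    """
    return -normal_log_prob(target, predicted_mean, std=1.0)


def sample_action(mean, std, rng):
    """Hypothetical actor step: draw an action from Normal(mean, std)."""
    return rng.gauss(mean, std)


# 1) Fixed std: the loss equals 0.5 * squared error plus a constant.
target, predicted_mean = 1.5, 0.7
nll = fixed_std_nll(target, predicted_mean)
half_squared_error = 0.5 * (target - predicted_mean) ** 2
constant = nll - half_squared_error  # does not depend on the prediction

# 2) Learned std: a larger std yields more widely spread actions.
rng = random.Random(0)
narrow_actions = [sample_action(0.0, 0.1, rng) for _ in range(1000)]
wide_actions = [sample_action(0.0, 1.0, rng) for _ in range(1000)]
narrow_spread = statistics.pstdev(narrow_actions)
wide_spread = statistics.pstdev(wide_actions)
```

Since `constant` is the same for every prediction, minimizing this loss is equivalent to minimizing the squared error of the mean, which fits with using only the mean at prediction time. For the actor, `wide_spread` comes out larger than `narrow_spread`, which is the exploration effect mentioned above.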

std of models in dreamer #235