Closed: elisaparga19 closed this 5 months ago
Hi @elisaparga19, I don't have a definitive answer to your question. My guess is that the author of the algorithm fixed the variance to 1 for models such as the reward model, the observation model, and the critic because he does not want those quantities to have a high variance: these models have to learn to predict rewards and values and to reconstruct images accurately. Indeed, he never samples from these distributions; he always uses the mean, and the distribution is only used to compute the loss. Actions are different: he also wants to learn the std because he needs it to sample actions (a higher std can lead to more exploration in the early stages of training).
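To make the point about the loss concrete, here is a minimal sketch in plain Python. It is not taken from the Dreamer code; the function names (`normal_log_prob`, `unit_variance_nll`, `squared_error`, `sample_action`) and the numbers are made up for illustration. It shows that the negative log-likelihood of a unit-variance normal is just half the squared error plus a constant, so a fixed std of 1 amounts to a regression loss, whereas a sampled action really does depend on the std.

```python
import math
import random


def normal_log_prob(x, mean, std=1.0):
    """Log-density of a univariate normal N(mean, std^2) evaluated at x."""
    var = std * std
    return -0.5 * (x - mean) ** 2 / var - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def unit_variance_nll(target, prediction):
    """Loss when the model outputs only the mean and std is fixed to 1."""
    return -normal_log_prob(target, prediction, std=1.0)


def squared_error(target, prediction):
    """Half the squared error between target and prediction."""
    return 0.5 * (target - prediction) ** 2


# Constant offset between the two losses; it does not depend on the prediction,
# so it contributes nothing to the gradient.
CONST = 0.5 * math.log(2.0 * math.pi)

# Hypothetical reward target and predicted mean.
target, prediction = 1.3, 0.8
nll = unit_variance_nll(target, prediction)
mse = squared_error(target, prediction)


def sample_action(mean, std, rng):
    """Draw an action from N(mean, std^2); a larger std spreads the samples more."""
    return mean + std * rng.gauss(0.0, 1.0)


rng = random.Random(0)
narrow = [sample_action(0.0, 0.1, rng) for _ in range(5)]  # little exploration
wide = [sample_action(0.0, 1.0, rng) for _ in range(5)]    # more exploration
```

Here `nll - mse` equals `CONST` regardless of the prediction, which is why nothing is lost by not learning a std for quantities that are only ever read off as a mean. For the actor, the std changes which actions get sampled, so it is worth learning.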
Hey!
I was wondering why, in some models of the Dreamer-V1 algorithm, the std of the normal distribution is the output of a neural network (for example, for the actor), while in others the distribution just has unit standard deviation (for example, in the reward model or the critic).
Thanks in advance!
Elisa.