Eclectic-Sheep / sheeprl

Distributed Reinforcement Learning accelerated by Lightning Fabric
https://eclecticsheep.ai
Apache License 2.0

std of models in dreamer #235

Closed elisaparga19 closed 5 months ago

elisaparga19 commented 6 months ago

Hey!

I was wondering why, in some models of the Dreamer-V1 algorithm, the std of the normal distribution is the output of a neural network (for example, for the actor), while in others the distribution just has unit standard deviation (for example, in the reward model or the critic)?

Thanks in advance!

Elisa.

michele-milesi commented 6 months ago

Hi @elisaparga19, I do not have an exact answer to your question. I think the author of the algorithm chose to keep a fixed variance (1) for some models, such as the reward model, the observation model, and the critic, because he does not want those quantities to have a high variance: these models have to learn to predict rewards/values and reconstruct images accurately. Indeed, these quantities are never sampled; the mean of the distribution is always used, and the distribution itself only serves to compute the loss. Actions are different: the std is learned as well because it is needed to sample the actions (a higher std can lead to more exploration in the early stages of training).
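
To make the point about the loss concrete, here is a toy sketch in plain Python. It is not sheeprl or Dreamer code, and the function names (`normal_log_prob`, `unit_std_nll`, `sample_action`) are hypothetical, chosen only for illustration. It assumes a plain univariate Gaussian and shows two things: with std fixed to 1, the negative log-likelihood is half the squared error plus a constant, so only the mean is fitted; and with a larger std, sampled actions spread further from the mean.

```python
import math
import random
import statistics


def normal_log_prob(x: float, mean: float, std: float = 1.0) -> float:
    """Log-density of a univariate Normal(mean, std) evaluated at x."""
    var = std * std
    return -0.5 * (x - mean) ** 2 / var - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def unit_std_nll(target: float, predicted_mean: float) -> float:
    """Negative log-likelihood of target under Normal(predicted_mean, 1)."""
    return -normal_log_prob(target, predicted_mean, 1.0)


def sample_action(mean: float, std: float, rng: random.Random) -> float:
    """Draw one action from Normal(mean, std) (hypothetical toy actor)."""
    return rng.gauss(mean, std)


# Fixed unit std: the NLL equals half the squared error plus a constant,
# so minimizing it behaves like a mean-squared-error regression on the mean.
target, predicted_mean = 2.0, 0.5
nll = unit_std_nll(target, predicted_mean)
half_squared_error = 0.5 * (target - predicted_mean) ** 2
constant = 0.5 * math.log(2.0 * math.pi)
gap = nll - (half_squared_error + constant)

# Variable std: a larger std spreads the sampled actions around the mean.
rng = random.Random(0)
narrow = [sample_action(0.0, 0.1, rng) for _ in range(1000)]
wide = [sample_action(0.0, 1.0, rng) for _ in range(1000)]
narrow_spread = statistics.pstdev(narrow)
wide_spread = statistics.pstdev(wide)

print(f"gap between NLL and 0.5 * squared error + constant: {gap:.2e}")
print(f"empirical spread with std=0.1: {narrow_spread:.3f}")
print(f"empirical spread with std=1.0: {wide_spread:.3f}")
```

Since the reward, observation, and value predictions are read off as the mean, a learned std would not change how they are used; for the actor, the std directly controls how varied the sampled actions are.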