danijar / dreamerv2

Mastering Atari with Discrete World Models
https://danijar.com/dreamerv2
MIT License

Questions about atari evaluation protocol #17

Closed jmkim0309 closed 3 years ago

jmkim0309 commented 3 years ago

Hi @danijar, thank you for this great work.

I have some questions about the evaluation protocol used in this code and in the DreamerV2 paper.

danijar commented 3 years ago

Hi, thanks for your question. By the standard evaluation protocol of Machado et al. (2018) we mean that we use sticky actions (with 25% probability, the agent's action is ignored and the previous action is repeated instead), we use the full action space (rather than a per-game action space restricted to only the useful actions), and we do not use the life-loss heuristic (episodes end only when the game ends, not when a life is lost).
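For concreteness, here is a minimal sketch of a sticky-actions wrapper in the spirit of Machado et al. (2018), assuming a Gym-style ALE environment; the wrapper name and structure are illustrative and not taken from this repository:

```python
import gym
import numpy as np


class StickyActions(gym.Wrapper):
    """With probability `repeat_prob`, ignore the agent's action and
    repeat the previously executed action instead (illustrative sketch)."""

    def __init__(self, env, repeat_prob=0.25, seed=None):
        super().__init__(env)
        self.repeat_prob = repeat_prob
        self.rng = np.random.default_rng(seed)
        self.last_action = 0  # NOOP

    def reset(self, **kwargs):
        self.last_action = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        if self.rng.random() < self.repeat_prob:
            action = self.last_action
        self.last_action = action
        return self.env.step(action)


# The full action space can be requested from the ALE Gym environments
# directly, e.g.:
# env = StickyActions(gym.make('BreakoutNoFrameskip-v4', full_action_space=True))
```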

We use separate evaluation episodes, where the mode of the policy is used instead of a sample, but the difference from the training episode scores is small. We run one such evaluation episode every 1e5 training steps, as you pointed out. The plots in the paper are binned with a bin size of 1e6 steps, which means each plotted score is the average over 10 evaluation episodes.
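A rough sketch of this scheme, assuming hypothetical `env` and `policy` objects (the `mode` keyword and function names are illustrative, not this repository's API):

```python
import numpy as np


def eval_episode(env, policy):
    # Evaluation acts with the mode of the action distribution
    # rather than a sample.
    obs, done, score = env.reset(), False, 0.0
    while not done:
        action = policy(obs, mode='mode')  # argmax action instead of sampling
        obs, reward, done, info = env.step(action)
        score += reward
    return score


def bin_scores(steps, scores, bin_size=1e6):
    # One evaluation episode every 1e5 steps gives (step, score) pairs;
    # binning with bin size 1e6 averages the 10 episodes in each bin.
    steps, scores = np.asarray(steps), np.asarray(scores)
    bins = (steps // bin_size).astype(int)
    return [scores[bins == b].mean() for b in np.unique(bins)]
```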