Kaixhin / Rainbow

Rainbow: Combining Improvements in Deep Reinforcement Learning
MIT License

Human-expert normalized scores #52

Open ThisIsIsaac opened 5 years ago

ThisIsIsaac commented 5 years ago

The Rainbow DQN paper uses human-expert normalized scores, so I am not sure how to evaluate the training results against the original paper. Do you know what values were used for human expert scores?

I found some of the values scattered across various papers, but I'm not sure whether we can use the same numbers, or how to compute a single normalized value across all Atari games: [image]

Kaixhin commented 5 years ago

Looks like I came up with a script in my Atari repo that can do this, but I can't remember where I got the details (must have scoured through lots of DQN papers). I'm not going to do it myself, but if you want to submit a PR that adds the computation and plotting of this score to test.py then I'd be happy to accept it.
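For reference, a minimal sketch of how such a score is usually computed, assuming the common definition from the DQN literature: 100 × (agent − random) / (human − random) per game, with the median across games as the single aggregate figure. The function names and the baseline numbers below are hypothetical placeholders, not the values used in the Rainbow paper and not code from this repo.

```python
from statistics import median


def human_normalized_score(agent, random, human):
    """Return a percentage: 0 means random play, 100 means the human baseline."""
    if human == random:
        raise ValueError("human and random baselines must differ")
    return 100.0 * (agent - random) / (human - random)


def median_normalized_score(agent_scores, baselines):
    """agent_scores: {game: raw score}; baselines: {game: (random, human)}.

    Returns the per-game normalized scores and their median across games.
    """
    per_game = {
        game: human_normalized_score(score, *baselines[game])
        for game, score in agent_scores.items()
    }
    return per_game, median(per_game.values())


# Placeholder numbers for illustration only, NOT published baselines.
baselines = {
    "game_a": (0.0, 100.0),
    "game_b": (-20.0, 20.0),
    "game_c": (10.0, 210.0),
}
agent_scores = {"game_a": 150.0, "game_b": 0.0, "game_c": 110.0}

per_game, overall = median_normalized_score(agent_scores, baselines)
print(per_game)
print(overall)
```

To compare against the paper, the placeholder dictionary would need to be replaced with the actual random and human scores per game, which is exactly the part that still has to be confirmed.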

ThisIsIsaac commented 5 years ago

The scores for some games (Beam Rider, Enduro, Qbert, Pong, and Space Invaders) are different from the ones in the DQN paper.

Do you remember which papers you got these numbers from?

Kaixhin commented 5 years ago

Unfortunately not. Maybe you can email one of the authors of Rainbow to see if they can give you a list of the human rewards and also confirm the score calculation?

Kaixhin commented 5 years ago

Although they apparently got the human rewards from the original paper, you can check this paper for human rewards and evaluation.