google-deepmind / hanabi-learning-environment

hanabi_learning_environment is a research platform for Hanabi experiments.
Apache License 2.0

observation stacking? #23

Closed mwalton closed 5 years ago

mwalton commented 5 years ago

We're working to reproduce some of the results in the original paper, which states that the Rainbow agent "is feedforward and does not use any observation stacking outside of the last action, which is included in the current observation".

However, in the code the Rainbow agent appears to stack the last 4 observations by default. Empirically (at least in early iterations), this doesn't seem to affect cumulative return much either way. Could someone clarify whether observation stacking was used for the results in the paper?

findmyway commented 5 years ago

I think the following line in the Rainbow gin config indicates that only the latest observation is used.

https://github.com/deepmind/hanabi-learning-environment/blob/253d6fff48dac3d2118cefc308fee156a7de9445/agents/rainbow/configs/hanabi_rainbow.gin#L39

nolanbard commented 5 years ago

Confirming that the results presented in the paper did not use any observation stacking.
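
For readers unfamiliar with the term, here is a minimal sketch of what observation stacking means and why a history size of 1 amounts to no stacking. It is not the repository's code: the `ObservationStacker` class, its methods, and the `history_size` and `observation_size` parameters are hypothetical names chosen for illustration, and it assumes observations are flat vectors that are concatenated oldest-first.

```python
from collections import deque


class ObservationStacker:
    """Hypothetical stacker that keeps the last `history_size` observations."""

    def __init__(self, history_size, observation_size):
        self.history_size = history_size
        self.observation_size = observation_size
        self.reset()

    def reset(self):
        # Assumption: the history starts out padded with all-zero observations.
        self._frames = deque(
            [[0] * self.observation_size for _ in range(self.history_size)],
            maxlen=self.history_size,
        )

    def add(self, observation):
        # Appending to a full deque silently drops the oldest observation.
        self._frames.append(list(observation))

    def stacked(self):
        # Concatenate the stored observations, oldest first.
        return [value for frame in self._frames for value in frame]


# With a history size of 1 the agent sees only the latest observation.
no_stacking = ObservationStacker(history_size=1, observation_size=3)
no_stacking.add([1, 0, 1])
no_stacking.add([0, 1, 0])
print(no_stacking.stacked())

# With a history size of 4 the network input is four times as long.
stacking = ObservationStacker(history_size=4, observation_size=3)
stacking.add([1, 0, 1])
stacking.add([0, 1, 0])
print(len(stacking.stacked()))
```

Under these assumptions, the first stacker returns exactly the most recent observation, which matches the paper's description of a feedforward agent with no stacking, while the second feeds the network a vector of length 4 × 3 = 12.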