HumanCompatibleAI / overcooked_ai

A benchmark environment for fully cooperative human-AI performance.
https://arxiv.org/abs/1910.05789
MIT License

Can I reproduce the results of NeurIPS 2019 using this code? #143

Closed: hogebein closed this issue 3 months ago

hogebein commented 3 months ago

Hi, I'm currently trying to conduct new research using this environment.

However, when I go through the sample code in "human_aware_rl", I have a feeling that some crucial changes have been made since the original NeurIPS 2019 version.

For example, all the parameters defined in "src/human_aware_rl/ppo/run_experiments.sh" and the characteristics of the training curve (e.g. the attached figure) seem to be different.

Have I missed something, or do I have to use the deprecated old repository to reproduce the results and test a newly proposed method for my research? (Most of the prior work is based on the old repository.)

[Attached figure: wandb_cramped_room]

micahcarroll commented 3 months ago

Hi there. I believe the differences you see arise because the reward in the figure above may be the dense shaped reward rather than the sparse reward. The sparse reward is what is reported in the original paper and in the plots here, which were generated using the src/human_aware_rl/ppo/run_experiments.sh script.
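To illustrate why the two curves can look so different, here is a minimal sketch of how a sparse return and a shaped return diverge over one episode. It assumes a per-step log with hypothetical fields `sparse_reward` and `shaped_reward` and a hypothetical `shaping_factor`; these names and the numeric values are illustrative only and are not taken from the repository's API or logging format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    # Hypothetical field: reward from the task itself (e.g. a completed delivery).
    sparse_reward: float
    # Hypothetical field: extra dense reward for intermediate progress.
    shaped_reward: float


def episode_returns(steps: List[Step], shaping_factor: float = 1.0) -> Dict[str, float]:
    """Sum sparse and shaped rewards separately, plus their weighted combination."""
    sparse = sum(s.sparse_reward for s in steps)
    shaped = sum(s.shaped_reward for s in steps)
    return {
        "sparse": sparse,                              # what a sparse-reward plot would show
        "shaped": shaped,                              # dense bonus only
        "training": sparse + shaping_factor * shaped,  # what a shaped training curve would show
    }


# Illustrative episode: three intermediate steps earn shaped reward,
# and only the final step earns sparse reward.
episode = [Step(0.0, 3.0), Step(0.0, 3.0), Step(0.0, 5.0), Step(20.0, 0.0)]
returns = episode_returns(episode)
print(returns)
```

If a logged curve tracks the combined "training" value while the published plot tracks only the "sparse" value, the two will differ in scale and shape even when the underlying policy is the same, so it helps to check which quantity each plot reports before comparing them.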

There have been various changes since the NeurIPS 2019 version, but the plots at the link above were our attempt to show that the changes do not significantly affect the final results.

As long as you are able to reproduce the results from our new figure in the README (which you should be able to do by running the src/human_aware_rl/ppo/run_experiments.sh script), I encourage you to use the newer version of the code, as it will be nicer to work with than the neurips2019 branch.

Let me know if you have any additional questions, or run into any issues.