eugenevinitsky / sequential_social_dilemma_games

Repo for reproduction of sequential social dilemmas
MIT License
384 stars 134 forks source link

Consistency entropy sign after breaking Ray update (6.5) #150

Closed internetcoffeephone closed 5 years ago

internetcoffeephone commented 5 years ago

The entropy sign was inconsistent in A3C run scripts. Some were negative, others positive. This raised DepreciationWarnings with Ray versions since 0.6.5. See https://github.com/ray-project/ray/pull/4374

For the A3C/A2C/ppo/IMPALA algorithms, the entropy coefficient sign must be positive, and is internally flipped in Ray when needed.

I also suggest updating to Ray 0.6.6 as it is the latest release, with no other breaking changes.

eugenevinitsky commented 5 years ago

Hi @internetcoffeephone thanks for the PR! It may take a bit of time to merge this in. The positive entropy sign is intended to penalize entropy but it's perfectly plausible that you might want to encourage entropy in the early stages to encourage exploration. Additionally, Ray upgrades require retesting all the baselines to make sure that the baseline hyperparameter settings are still sufficient. I'm going to check at some point whether the paper results can still be reproduced with Ray 0.6.6 at which point I can merge this in. Alternately, if you happen to get the baseline results to reproduce with Ray 0.6.6, please post the reward curves and I will be happy to merge this in!

internetcoffeephone commented 5 years ago

Ah, that's troublesome, my bad. I thought it was a mistake because I believed the top comment in the above PR.

Are the current entropy values correct then? Since e.g. https://github.com/eugenevinitsky/sequential_social_dilemma_games/blob/visible_actions/run_scripts/train_a3c_moa.py contains positive entropy values in the default configuration.

Ray now raises it as an exception, it isn't an actual warning. Do you think it's a good idea to make a PR to change them into actual warnings instead of exceptions?

I'm currently in the process of reproducing the experiments, but don't hold your breath. :)

internetcoffeephone commented 5 years ago

Removing this pull request due to implementation of a config file that enforces consistency across hyperparameters.

We should stick to ray 0.6.4 until the warning raised as an exception is fixed.