Social Curiosity Module implementation and MOA fixes

eugenevinitsky / sequential_social_dilemma_games

Repo for reproduction of sequential social dilemmas

MIT License

380 stars 134 forks source link

Social Curiosity Module implementation and MOA fixes #179

Closed internetcoffeephone closed 3 years ago

internetcoffeephone commented 4 years ago

@eugenevinitsky As discussed in our email conversation.

Highlights:

Optimizations/bugfixes in map_env
Move the MOA predictions to a single tensor evaluation rather than handling them in postprocessing
On the whole, experiments run a lot faster now
Upgrade to ray 0.8.4, tensorflow-gpu 2.1.0, other misc library upgrades
Fix tests
Add isort, pre-commit and black config files.
Add a simple environment where a single agent has to pull a set of switches, then walk through a door. This can be used for testing purposes.
The addition of my own model, the social curiosity module (often denoted in the code as scm): a combination of the social influence reward with Burda's intrinsic curiosity module.

ezelikman commented 4 years ago

Hi, this is really great work, and thank you for implementing all of these changes. However, currently, the patches that are made to raylib are only applied to the dynamic and not the eager version of the library. As a result, using running the train scripts with --eager fails (due to the type mismatch corrected in the patch) whereas running it normally doesn't. Not being able to use --eager definitely complicates debugging models built on the repo.

internetcoffeephone commented 4 years ago

Hi, this is really great work, and thank you for implementing all of these changes. However, currently, the patches that are made to raylib are only applied to the dynamic and not the eager version of the library. As a result, using running the train scripts with --eager fails (due to the type mismatch corrected in the patch) whereas running it normally doesn't. Not being able to use --eager definitely complicates debugging models built on the repo.

Thank you!

I agree, and aside from the defect you stated the patch is brittle and breaks on ray updates as well, it was a quick hack to avoid having to fork ray. Personally I never got eager mode to run, even before I created that patch, but that was some ray versions ago. If you want to update the patch, feel free to make a PR in my fork.

internetcoffeephone commented 4 years ago

@ezelikman I looked into why I didn't get eager to run - it wasn't because of the patch, but because the implementation of eager_tf_policy and dynamic_tf_policy in ray differed. They were maintained separately, so bugfixes to one would not affect the other.

Thus, you wouldn't just need the dtype patch, but also to fix those classes so they would be in line with each other again.

eugenevinitsky commented 4 years ago

Hey, I'm down to merge this since it mostly doesn't touch the envs and all the tests pass. Let me know when you feel it's ready?

eugenevinitsky commented 4 years ago

Wow, you've done a crazy amount of work. Going to tweet this out whenever it's merged so let me know if you have a handle I can tag.

internetcoffeephone commented 3 years ago

Wow, you've done a crazy amount of work. Going to tweet this out whenever it's merged so let me know if you have a handle I can tag.

I don't have a twitter, if you could link to my personal website that would be great! Also, feel free to mention that I'm job-hunting for ML Engineer positions.