HumanCompatibleAI / adversarial-policies

Find best-response to a fixed policy in multi-agent RL

MIT License

This codebase depends on Bansal et al.'s environments in https://github.com/humancompatibleai/multiagent-competition, which use MuJoCo 1.31. We cannot upgrade MuJoCo because recent versions change the dynamics and we do not have code to retrain the policies. However, Gym and other packages have long since moved to newer versions. So far we have worked around this by maintaining two virtual environments, one with the old versions and one with the new, and backporting some fixes to keep the two in sync as much as possible.

In branch https://github.com/AdamGleave/mujoco-py/tree/mj131 we introduce a new package, mujoco-py-131, containing the old bindings, which can be installed alongside a modern version of mujoco-py. In PR https://github.com/HumanCompatibleAI/multiagent-competition/pull/5 the multiagent-competition environments are changed to use a modern version of Gym together with mujoco-py-131. This PR upgrades adversarial-policies to the latest version of Gym and multiagent-competition. We now have a single requirements.txt and a single virtual environment in the CI.
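
As a rough illustration of what "installed alongside" means in practice, the sketch below reports which of the two bindings are present in the current environment. It is not part of this PR: the helper `installed_versions` is hypothetical, and it assumes the distributions are registered under the names `mujoco-py` and `mujoco-py-131` and that Python 3.8+ is available for `importlib.metadata`.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(distributions: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration; it only inspects package metadata
    and never imports the bindings themselves.
    """
    versions: Dict[str, Optional[str]] = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed distribution names; adjust if the packages are registered differently.
    for name, version in installed_versions(["mujoco-py", "mujoco-py-131"]).items():
        print(f"{name}: {version if version is not None else 'not installed'}")
```

In a correctly consolidated environment, both entries would be expected to report a version.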

There is still room for further improvement: for example, the codebase could be reorganized now that the distinction between aprl and modelfree is less sharp.

Fixes #1 (shrinks Docker image by 1.4 GB, although it's still quite large)

Codecov Report

Merging #34 into master will increase coverage by 0.98%. The diff coverage is 84.47%.

@@            Coverage Diff             @@
##           master      #34      +/-   ##
==========================================
+ Coverage   60.42%   61.41%   +0.98%     
==========================================
  Files          63       55       -8     
  Lines        5226     4901     -325     
==========================================
- Hits         3158     3010     -148     
+ Misses       2068     1891     -177
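
The figures in the diff above can be cross-checked with a few lines of arithmetic. The sketch below assumes coverage is hits divided by lines and that percentages are truncated (not rounded) to two decimals; the truncation rule is an inference from the numbers shown here, not something the report states. The helper `truncated_percent` is hypothetical.

```python
from fractions import Fraction
from math import floor


def truncated_percent(ratio: Fraction) -> Fraction:
    """Convert a ratio to a percentage truncated to two decimal places.

    Truncation (rather than rounding) is an assumption that happens to
    reproduce the figures in the report above.
    """
    return Fraction(floor(ratio * 10000), 100)


# Hits and lines as listed in the coverage diff.
master_hits, master_lines = 3158, 5226
pr_hits, pr_lines = 3010, 4901

master_cov = Fraction(master_hits, master_lines)
pr_cov = Fraction(pr_hits, pr_lines)

print(float(truncated_percent(master_cov)))           # coverage on master
print(float(truncated_percent(pr_cov)))               # coverage after the PR
print(float(truncated_percent(pr_cov - master_cov)))  # change in coverage

# Hits plus misses should account for every line in both columns.
assert master_hits + 2068 == master_lines
assert pr_hits + 1891 == pr_lines
```

Coverage rises even though hits fall by 148, because the number of tracked lines falls by 325.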

| Flag | Coverage Δ |
|---|---|
| #aprl | `?` |
| #modelfree | `?` |

| Impacted Files | Coverage Δ |
|---|---|
| src/aprl/multi/common_worker.py | `100% <ø> (ø)` |
| src/aprl/training/victim_envs.py | `94.44% <ø> (ø)` |
| src/aprl/training/scheduling.py | `85.18% <ø> (ø)` |
| tests/test_common.py | `100% <ø> (ø)` |
| tests/test_agents.py | `98.97% <ø> (ø)` |
| src/aprl/policies/base.py | `85.91% <ø> (ø)` |
| src/aprl/training/gail_dataset.py | `100% <ø> (ø)` |
| src/aprl/envs/sumo_auto_contact.py | `100% <ø> (ø)` |
| src/aprl/activations/density/visualize.py | `0% <0%> (ø)` |
| src/aprl/visualize/scores.py | `0% <0%> (ø)` |

... and 42 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update e9e8d16...aa03e9c.

Consolidate into single virtual environment #34