HumanCompatibleAI / adversarial-policies

Find best-response to a fixed policy in multi-agent RL
MIT License
275 stars 47 forks source link

"Transparent" policies and VecEnv. #7

Closed kantneel closed 5 years ago

kantneel commented 5 years ago

In this context, transparent means exposing parts of the victim's internal computational graph. Namely, we want to expose its observations, feedforward activations and hidden state (if applicable) in a dictionary to its environment. Then that environment should expose this data to all of its wrappers so it can be used wherever desired. This branch implements TransparentFeedForwardPolicy (stable_baselines), TransparentLSTMPolicy (gym_compete) and TransparentCurryVecEnv to accomplish this.

This branch is also spun off of the gail branch. The things related to ray and aws have been reset to the versions on master, but gail infrastructure is still present in modelfree.train. There is also a class and accompanying logic for saving trajectories of agents when they are run through the simulate function in modelfree.utils. This will be important and augmented further in work involving density modeling (in another upcoming branch).

codecov[bot] commented 5 years ago

Codecov Report

Merging #7 into master will increase coverage by 0.02%. The diff coverage is 87.82%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master       #7      +/-   ##
==========================================
+ Coverage    58.1%   58.13%   +0.02%     
==========================================
  Files          40       41       +1     
  Lines        3132     3301     +169     
==========================================
+ Hits         1820     1919      +99     
- Misses       1312     1382      +70
Flag Coverage Δ
#aprl 16.11% <0%> (-0.87%) :arrow_down:
#modelfree 47.47% <87.82%> (+0.59%) :arrow_up:
Impacted Files Coverage Δ
src/modelfree/training/shaping_wrappers.py 94.59% <ø> (ø) :arrow_up:
src/aprl/envs/multi_agent.py 78.54% <ø> (ø) :arrow_up:
src/modelfree/common/utils.py 92.78% <100%> (+3.28%) :arrow_up:
src/modelfree/envs/gym_compete.py 96.55% <100%> (+0.59%) :arrow_up:
src/modelfree/common/policy_loader.py 81.91% <78.94%> (-2.8%) :arrow_down:
src/modelfree/common/transparent.py 82.35% <82.35%> (ø)
src/modelfree/train.py 91.66% <90%> (+0.16%) :arrow_up:
src/modelfree/configs/multi/train.py 17.04% <0%> (+1.11%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update ef7b147...225db3d. Read the comment docs.

kantneel commented 5 years ago

I need to write some tests for this branch.

AdamGleave commented 5 years ago

The diff here looks really big and seems to include some unrelated changes, e.g. I see some config files that I deleted have been added back in and there's also stuff about GAIL.

I think you probably need to merge master into your branch to pick up some of my more recent changes. If you started this branch from your branch containing GAIL, then we should either open a separate PR for GAIL and merge that in then go onto this one, or you should strip out the GAIL-relevant parts.

Let me know when the diff only contains relevant changes and I'll take another look.

AdamGleave commented 5 years ago

I took another look at this but the diff is still huge. I think this branch has had some (but not all) commits from master and gail merged into it, so there is no single base that makes sense. Holding off until the GAIL PR is merged. Please remind me to look at this again after that.

kantneel commented 5 years ago

The branch is passing tests. @AdamGleave @decodyng could you please review? Thanks

kantneel commented 5 years ago

@AdamGleave @decodyng I've addressed your comments. Please review again when you get a chance.

kantneel commented 5 years ago

@AdamGleave I addressed your comments in the most recent changes. Please review when you get a chance. Thanks!

AdamGleave commented 5 years ago

Much improved. @decodyng do you have any thoughts? Otherwise I think ready to merge.

kantneel commented 5 years ago

Okay I changed the docstring in response to the last comment by @decodyng.