HumanCompatibleAI / adversarial-policies

Find best-response to a fixed policy in multi-agent RL
MIT License
275 stars 47 forks source link

Finetuning victim as defence method #23

Closed AdamGleave closed 4 years ago

AdamGleave commented 5 years ago

Creating PR mainly to simplify code review, but we will want to merge in eventually.

codecov[bot] commented 5 years ago

Codecov Report

Merging #23 into master will increase coverage by 0.23%. The diff coverage is 68.56%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #23      +/-   ##
==========================================
+ Coverage   61.41%   61.65%   +0.23%     
==========================================
  Files          55       56       +1     
  Lines        4901     5234     +333     
==========================================
+ Hits         3010     3227     +217     
- Misses       1891     2007     +116
Impacted Files Coverage Δ
src/aprl/policies/loader.py 81.9% <100%> (ø) :arrow_up:
src/aprl/training/logger.py 100% <100%> (ø) :arrow_up:
src/aprl/training/shaping_wrappers.py 95.69% <100%> (ø) :arrow_up:
src/aprl/envs/gym_compete.py 97.16% <100%> (ø) :arrow_up:
src/aprl/training/embedded_agents.py 95.08% <100%> (ø)
src/aprl/policies/wrappers.py 97.5% <100%> (+7.5%) :arrow_up:
src/aprl/policies/base.py 87.65% <100%> (+1.73%) :arrow_up:
src/aprl/training/lookback.py 77.83% <100%> (ø) :arrow_up:
tests/test_experiments.py 97.77% <100%> (+0.05%) :arrow_up:
src/aprl/configs/multi/train.py 16.89% <16.86%> (+1.03%) :arrow_up:
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update d6f147c...8f7b76e. Read the comment docs.

decodyng commented 5 years ago

Still waiting to see how if the test suite passes, but I believe I've now resolved or responded to almost all comments. Exceptions to that are:

AdamGleave commented 5 years ago

The test failures seem to be legitimate, I get a TensorFlow shape error when I run python -m modelfree.train. This needs to be fixed before merging, and I'd also like to see some simple unit test for [Multi]CurryVecEnv.

decodyng commented 5 years ago

Today I learned about some of the fun edge cases are involved in getting state management to work here. Some of these include:

decodyng commented 5 years ago

Thoughts on adversary paths:

AdamGleave commented 5 years ago

Thoughts on adversary paths:

  • I think the fact of paths being auto-generated by highest_win_rate.py as the experiment is being run is part of what makes the current system difficult to follow and reason about. I'd advocate for having a location within the adversarial-policies directory that stores the highest win-rate policies for a given run (e.g. paper/20190429_011349). Since those seem like those won't vary for a given run with fixed/trained adversaries, it'd be nice to just have that be fixed top-level metadata somewhere.

I agree with this criticism. I originally envisaged highest_win_rate.py as potentially pulling in adversaries from multiple training runs, but that's not the common case. I'd be fine with storing a JSON along with each run.

We could even make it so that rather than specifying a path to the json, we specify a path containing adversary policies. And then if the json exists in that path, use it (cached). Otherwise, invoke highest_win_rate.py (as a Python function, not as a script).

  • The fact of needing to set ADVERSARY_PATHS as an environment variable also seems like something it might be nice to modify. If we had a fixed-within-data location for the adversary paths JSON to be, it seems like it'd be possible to specify as part of the named config which training run (e.g. paper/) to use, and have that generate a path. The environment variable is a horrible hack. It's because named configs get evaluated before CLI arguments for Sacred. So we can't access the CLI argument directly.

Having one named config for each possible ADVERSARY_PATHS would solve this, but I worry it'd lead to an explosion in configs.