HumanCompatibleAI / adversarial-policies

Find best-response to a fixed policy in multi-agent RL
MIT License

Flexible configuration in multi.score #37

Closed 4 years ago by AdamGleave

AdamGleave commented 4 years ago

Change the config of aprl.multi.score to be based on modifiers specifying the victim and opponent policies, then take the Cartesian product of this pair.

This flexibility is particularly helpful now that we need to compare finetuned victims to adversaries, etc.
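
A minimal sketch of the modifier-and-product idea, assuming dict-shaped configs; the names `victim_modifiers`, `opponent_modifiers`, and the config keys are illustrative, not the actual aprl.multi.score API:

```python
import itertools

# Hypothetical modifier dicts; the real keys used by aprl.multi.score may differ.
victim_modifiers = {
    "zoo": {"victim_type": "zoo", "victim_path": "1"},
    "finetuned": {"victim_type": "ppo2", "victim_path": "data/finetuned"},
}
opponent_modifiers = {
    "zoo": {"opponent_type": "zoo", "opponent_path": "2"},
    "adversary": {"opponent_type": "ppo2", "opponent_path": "data/adversary"},
}

# Cartesian product of victim x opponent modifiers yields one score
# configuration per (victim, opponent) pairing.
configs = {}
for (v_name, v_cfg), (o_name, o_cfg) in itertools.product(
    victim_modifiers.items(), opponent_modifiers.items()
):
    configs[f"{v_name}_vs_{o_name}"] = {**v_cfg, **o_cfg}

for name, cfg in configs.items():
    print(name, cfg)
```

Adding a new victim variant (e.g. a finetuned policy) then automatically produces score configs against every opponent, rather than requiring each pairing to be enumerated by hand.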

codecov[bot] commented 4 years ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (master@62d53db). The diff coverage is 52.54%.


@@            Coverage Diff            @@
##             master      #37   +/-   ##
=========================================
  Coverage          ?   61.88%           
=========================================
  Files             ?       56           
  Lines             ?     5242           
  Branches          ?        0           
=========================================
  Hits              ?     3244           
  Misses            ?     1998           
  Partials          ?        0
Impacted Files                      Coverage Δ
src/aprl/multi/score.py             86.23% <ø> (ø)
src/aprl/configs/multi/common.py    61.53% <100%> (ø)
src/aprl/policies/loader.py         82.07% <100%> (ø)
src/aprl/envs/gym_compete.py        97.22% <100%> (ø)
src/aprl/configs/multi/train.py     16.66% <100%> (ø)
src/aprl/configs/multi/score.py     50.59% <48.07%> (ø)
src/aprl/envs/__init__.py           88.88% <77.77%> (ø)

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 62d53db...8b36dd7.

AdamGleave commented 4 years ago

Thanks for the review, made the requested changes.