HumanCompatibleAI / adversarial-policies

Find best-response to a fixed policy in multi-agent RL
MIT License

Merging work on reward shaping and annealing #2

Closed kantneel closed 5 years ago

kantneel commented 5 years ago
codecov[bot] commented 5 years ago

Codecov Report

Merging #2 into master will increase coverage by 3.28%. The diff coverage is 91.58%.


@@            Coverage Diff             @@
##           master       #2      +/-   ##
==========================================
+ Coverage   66.66%   69.95%   +3.28%     
==========================================
  Files          20       25       +5     
  Lines        1335     1541     +206     
==========================================
+ Hits          890     1078     +188     
- Misses        445      463      +18
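The percentages in the diff above can be reproduced from the Hits and Lines rows. The following is a minimal sketch, assuming coverage is simply hits divided by lines and that the displayed values are truncated (not rounded) to two decimals; that truncation is inferred from the numbers in this report, not taken from Codecov's documentation. The helper name `truncated_pct` is hypothetical.

```python
from fractions import Fraction


def truncated_pct(ratio: Fraction) -> str:
    """Format a non-negative ratio as a percentage truncated to two decimals."""
    hundredths = int(ratio * 10000)  # int() drops the fractional part
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


# Hits and Lines copied from the coverage diff.
master = Fraction(890, 1335)
merged = Fraction(1078, 1541)

print(truncated_pct(master))           # 66.66%
print(truncated_pct(merged))           # 69.95%
print(truncated_pct(merged - master))  # 3.28%
```

Exact fractions are used so that the change is computed from the unrounded ratios; subtracting the two displayed percentages would give 3.29 instead of the reported 3.28.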
Flag          Coverage Δ
#aprl         33.61% <0%> (-5.19%) ↓
#modelfree    47.11% <91.58%> (+6.81%) ↑

Impacted Files                             Coverage Δ
src/modelfree/gym_compete_conversion.py    90.75% <100%> (+0.15%) ↑
src/modelfree/__init__.py                  100% <100%> (ø)
src/modelfree/envs/sumo_auto_contact.py    100% <100%> (ø)
src/modelfree/envs/__init__.py             100% <100%> (ø)
src/modelfree/score_agent.py               98.24% <50%> (-1.76%) ↓
src/modelfree/ppo_baseline.py              92.85% <76%> (-5.9%) ↓
src/modelfree/shaping_wrappers.py          93.07% <93.07%> (ø)
src/modelfree/scheduling.py                95.55% <95.55%> (ø)
... and 3 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 72b9b9c...47f4a2e.

AdamGleave commented 5 years ago

I've made a lot of changes to master that should simplify implementing some of these things (particularly adding noise to the victim), so now would be a good time to merge master again. Ping me once you've done that and have the unit tests passing, and I'll do a more detailed review. In the meantime, I'll go through and leave some high-level comments.