Merging #6 into master will decrease coverage by 0.13%. The diff coverage is 76.36%.
```diff
@@            Coverage Diff             @@
##           master       #6      +/-   ##
==========================================
- Coverage   73.99%   73.86%   -0.14%
==========================================
  Files          29       29
  Lines        2023     2020       -3
==========================================
- Hits         1497     1492       -5
- Misses        526      528       +2
```
Flag | Coverage Δ | |
---|---|---|
#aprl | 26.23% <0%> (+0.03%) | :arrow_up: |
#modelfree | 56.43% <76.36%> (-0.17%) | :arrow_down: |
Impacted Files | Coverage Δ | |
---|---|---|
src/modelfree/train.py | 90.36% <47.61%> (-1.66%) | :arrow_down: |
src/modelfree/gym_compete_conversion.py | 96.77% <94.11%> (+1.72%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update daa8bb0...b2acbb7.
A quick test running `experiments/modelfree/score-old-vs-new-zoo.sh` at ae33117dccf5a76701ab926cc2331f65b042a65c shows no significant differences in overall win rate (testing new vs new, new vs old, old vs new, old vs old). There is some variation, though, which I think stems from random sampling in TensorFlow depending on operation seeds; a sketch of that behaviour follows below, then the per-environment results. (I tried to pin this down more precisely, but TensorFlow does not make reproducibility easy.)
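For context, here is a minimal sketch (hypothetical, not code from this repo) of the operation-seed behaviour I suspect: when only a graph-level seed is set, each random op's effective seed is derived from the graph seed plus the op's position in graph-construction order, so a refactor that adds or reorders ops perturbs all later samples.

```python
# Sketch of TF1 operation-seed behaviour (not this repo's code).
import tensorflow as tf

def build_and_sample(insert_extra_op):
    graph = tf.Graph()
    with graph.as_default():
        tf.set_random_seed(0)  # graph-level seed only; no op-level seeds
        if insert_extra_op:
            _ = tf.random_normal([1])  # unrelated op added by a "refactor"
        sample = tf.random_normal([3])  # the op whose values we care about
    with tf.Session(graph=graph) as sess:
        return sess.run(sample)

# These typically differ, despite the identical graph-level seed, because
# the extra op shifts the derived op-level seed of the second random_normal.
print(build_and_sample(insert_extra_op=False))
print(build_and_sample(insert_extra_op=True))
```

The per-environment results: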
```
*** KickAndDefend-v0 ***

==> data/score-old-vs-new/parallel/a_type/zoo/b_type/zoo/env_name/multicomp_KickAndDefend-v0/stderr <==
INFO - score - Result: {'ties': 17, 'wincounts': [669, 314]}
INFO - score - Completed after 0:12:14

==> data/score-old-vs-new/parallel/a_type/zoo/b_type/zoo_old/env_name/multicomp_KickAndDefend-v0/stderr <==
INFO - score - Result: {'ties': 16, 'wincounts': [667, 317]}
INFO - score - Completed after 0:12:24

==> data/score-old-vs-new/parallel/a_type/zoo_old/b_type/zoo/env_name/multicomp_KickAndDefend-v0/stderr <==
INFO - score - Result: {'ties': 19, 'wincounts': [686, 295]}
INFO - score - Completed after 0:12:09

==> data/score-old-vs-new/parallel/a_type/zoo_old/b_type/zoo_old/env_name/multicomp_KickAndDefend-v0/stderr <==
INFO - score - Result: {'ties': 14, 'wincounts': [667, 319]}
INFO - score - Completed after 0:12:10

*** SumoHumans-v0 ***

==> data/score-old-vs-new/parallel/a_type/zoo/b_type/zoo/env_name/multicomp_SumoHumans-v0/stderr <==
INFO - score - Result: {'ties': 10, 'wincounts': [828, 162]}
INFO - score - Completed after 0:09:04

==> data/score-old-vs-new/parallel/a_type/zoo/b_type/zoo_old/env_name/multicomp_SumoHumans-v0/stderr <==
INFO - score - Result: {'ties': 15, 'wincounts': [830, 155]}
INFO - score - Completed after 0:08:58

==> data/score-old-vs-new/parallel/a_type/zoo_old/b_type/zoo/env_name/multicomp_SumoHumans-v0/stderr <==
INFO - score - Result: {'ties': 18, 'wincounts': [825, 157]}
INFO - score - Completed after 0:09:03

==> data/score-old-vs-new/parallel/a_type/zoo_old/b_type/zoo_old/env_name/multicomp_SumoHumans-v0/stderr <==
INFO - score - Result: {'ties': 20, 'wincounts': [809, 171]}
INFO - score - Completed after 0:09:07

*** RunToGoalHumans-v0 ***

==> data/score-old-vs-new/parallel/a_type/zoo/b_type/zoo/env_name/multicomp_RunToGoalHumans-v0/stderr <==
INFO - score - Result: {'ties': 269, 'wincounts': [277, 454]}
INFO - score - Completed after 0:04:11

==> data/score-old-vs-new/parallel/a_type/zoo/b_type/zoo_old/env_name/multicomp_RunToGoalHumans-v0/stderr <==
INFO - score - Result: {'ties': 274, 'wincounts': [274, 452]}
INFO - score - Completed after 0:04:08

==> data/score-old-vs-new/parallel/a_type/zoo_old/b_type/zoo/env_name/multicomp_RunToGoalHumans-v0/stderr <==
INFO - score - Result: {'ties': 248, 'wincounts': [302, 450]}
INFO - score - Completed after 0:04:06

==> data/score-old-vs-new/parallel/a_type/zoo_old/b_type/zoo_old/env_name/multicomp_RunToGoalHumans-v0/stderr <==
INFO - score - Result: {'ties': 270, 'wincounts': [280, 450]}
INFO - score - Completed after 0:04:03

*** YouShallNotPassHumans-v0 ***

==> data/score-old-vs-new/parallel/a_type/zoo/b_type/zoo/env_name/multicomp_YouShallNotPassHumans-v0/stderr <==
INFO - score - Result: {'ties': 0, 'wincounts': [497, 503]}
INFO - score - Completed after 0:04:25

==> data/score-old-vs-new/parallel/a_type/zoo/b_type/zoo_old/env_name/multicomp_YouShallNotPassHumans-v0/stderr <==
INFO - score - Result: {'ties': 0, 'wincounts': [511, 489]}
INFO - score - Completed after 0:04:21

==> data/score-old-vs-new/parallel/a_type/zoo_old/b_type/zoo/env_name/multicomp_YouShallNotPassHumans-v0/stderr <==
INFO - score - Result: {'ties': 0, 'wincounts': [476, 524]}
INFO - score - Completed after 0:04:20

==> data/score-old-vs-new/parallel/a_type/zoo_old/b_type/zoo_old/env_name/multicomp_YouShallNotPassHumans-v0/stderr <==
INFO - score - Result: {'ties': 0, 'wincounts': [474, 526]}
INFO - score - Completed after 0:04:16
```
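As a sanity check on the "no significant differences" claim, a two-proportion z-test on a matched pair of cells confirms the gap is well within noise. This sketch assumes statsmodels is available; the counts come from the KickAndDefend-v0 logs above (new-vs-new against old-vs-old, 669+314+17 = 1000 episodes per cell):

```python
# Two-sided two-proportion z-test on player A's win counts in two of the
# KickAndDefend-v0 configurations above.
from statsmodels.stats.proportion import proportions_ztest

wins = [669, 667]      # player A wins: new-vs-new, old-vs-old
totals = [1000, 1000]  # episodes per configuration
stat, p_value = proportions_ztest(count=wins, nobs=totals)
print(f"z = {stat:.3f}, p = {p_value:.3f}")  # p far above 0.05: no significant difference
```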
Bansal et al. released policy weights and architectures in `gym_compete`. We had already adapted the interface to load an agent and replay it (used in `score_agent`, and to embed the victim in `train`), but it did not previously support continuing to train the loaded agent. This PR adds that support. (Note that most changes take place in our fork of `gym_compete`; this is just glue code.)
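A minimal sketch of the glue-code idea, assuming TF1; `build_dummy_policy`, `restore_weights`, and the scope name are hypothetical stand-ins, not this repo's API. The key distinction is restoring the released weights into *trainable* variables so an RL algorithm can keep optimizing them, rather than only running the policy forward as a frozen opponent:

```python
import numpy as np
import tensorflow as tf

def build_dummy_policy(obs_dim=4, act_dim=2, scope="zoo_policy"):
    """Stand-in for the loaded gym_compete policy network."""
    with tf.variable_scope(scope):
        obs = tf.placeholder(tf.float32, [None, obs_dim], name="obs")
        hidden = tf.layers.dense(obs, 8, activation=tf.nn.tanh)
        action = tf.layers.dense(hidden, act_dim)
    return obs, action

def restore_weights(sess, scope, weights):
    """Assign released weights into the trainable variables under `scope`."""
    variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=scope)
    sess.run([v.assign(w) for v, w in zip(variables, weights)])
    return variables  # hand these to the trainer's optimizer

obs_ph, action_op = build_dummy_policy()
trainable = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="zoo_policy")
fake_weights = [np.zeros(v.shape.as_list(), dtype=np.float32) for v in trainable]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    restored = restore_weights(sess, "zoo_policy", fake_weights)
    # Previously restored policies could only be replayed; the point of this
    # PR is that `restored` can now also be passed to the training loop.
    print([v.name for v in restored])
```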