AdamGleave closed this pull request 4 years ago.
Merging #23 into master will increase coverage by 0.23%. The diff coverage is 68.56%.
```diff
@@            Coverage Diff             @@
##           master      #23      +/-   ##
==========================================
+ Coverage   61.41%   61.65%   +0.23%
==========================================
  Files          55       56       +1
  Lines        4901     5234     +333
==========================================
+ Hits         3010     3227     +217
- Misses       1891     2007     +116
==========================================
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/aprl/policies/loader.py | 81.9% <100%> (ø) | :arrow_up: |
| src/aprl/training/logger.py | 100% <100%> (ø) | :arrow_up: |
| src/aprl/training/shaping_wrappers.py | 95.69% <100%> (ø) | :arrow_up: |
| src/aprl/envs/gym_compete.py | 97.16% <100%> (ø) | :arrow_up: |
| src/aprl/training/embedded_agents.py | 95.08% <100%> (ø) | |
| src/aprl/policies/wrappers.py | 97.5% <100%> (+7.5%) | :arrow_up: |
| src/aprl/policies/base.py | 87.65% <100%> (+1.73%) | :arrow_up: |
| src/aprl/training/lookback.py | 77.83% <100%> (ø) | :arrow_up: |
| tests/test_experiments.py | 97.77% <100%> (+0.05%) | :arrow_up: |
| src/aprl/configs/multi/train.py | 16.89% <16.86%> (+1.03%) | :arrow_up: |
| ... and 7 more | | |
Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.

Last update d6f147c...8f7b76e.
Still waiting to see whether the test suite passes, but I believe I've now resolved or responded to almost all comments. Exceptions to that are:
The test failures seem to be legitimate: I get a TensorFlow shape error when I run `python -m modelfree.train`. This needs to be fixed before merging, and I'd also like to see some simple unit tests for `[Multi]CurryVecEnv`.
Today I learned about some of the fun edge cases involved in getting state management to work here. One of these is the `initial_state` value of the policies: our policies have the attribute `initial_state`, while the Stable Baselines ones have the attribute `_initial_state`, which is why I didn't just directly rely on that here.
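To make the attribute mismatch concrete, here is a minimal sketch of a lookup that tolerates both naming conventions. The helper name and the `None` fallback are illustrative assumptions, not code from this PR:

```python
def get_initial_state(policy):
    """Fetch a policy's recurrent initial state, whether it uses this
    repo's naming (`initial_state`) or Stable Baselines' (`_initial_state`).

    Hypothetical helper: the attribute names follow the comment above,
    and `None` is assumed to mean a stateless (feed-forward) policy.
    """
    for attr in ("initial_state", "_initial_state"):
        if hasattr(policy, attr):
            return getattr(policy, attr)
    return None
```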
Thoughts on adversary paths:
- I think the fact that paths are auto-generated by `highest_win_rate.py` while the experiment is running is part of what makes the current system difficult to follow and reason about. I'd advocate for a location within the `adversarial-policies` directory that stores the highest win-rate policies for a given run (e.g. `paper/20190429_011349`). Since those won't vary for a given run with fixed/trained adversaries, it'd be nice to have that be fixed top-level metadata somewhere.
I agree with this criticism. I originally envisaged `highest_win_rate.py` as potentially pulling in adversaries from multiple training runs, but that's not the common case. I'd be fine with storing a JSON alongside each run. We could even make it so that rather than specifying a path to the JSON, we specify a path containing adversary policies; then, if the JSON exists in that path, use it (cached), and otherwise invoke `highest_win_rate.py` (as a Python function, not as a script).
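As a rough illustration of that cache-then-compute scheme (the cache file name `adversary_paths.json` and the `highest_win_rate` import path are assumptions for the sketch, not the repo's actual layout):

```python
import json
import os

def load_adversary_paths(run_dir):
    """Return adversary paths for a run, caching the result as JSON.

    Sketch of the scheme discussed above; the cache file name and the
    import below are hypothetical, not this repo's confirmed API.
    """
    cache = os.path.join(run_dir, "adversary_paths.json")
    if os.path.exists(cache):
        # Cached: the best adversaries are fixed for a completed run.
        with open(cache) as f:
            return json.load(f)
    # Not cached: call highest_win_rate as a Python function, then store.
    from aprl.multi.highest_win_rate import highest_win_rate
    paths = highest_win_rate(run_dir)
    with open(cache, "w") as f:
        json.dump(paths, f)
    return paths
```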
- The fact of needing to set ADVERSARY_PATHS as an environment variable also seems like something it might be nice to modify. If the adversary-paths JSON had a fixed location within the data directory, it seems like it'd be possible to specify which training run (e.g. `paper/`) to use as part of the named config, and have that generate a path.

The environment variable is a horrible hack. It exists because Sacred evaluates named configs before CLI arguments, so we can't access the CLI argument directly; see the sketch after this bullet. Having one named config for each possible ADVERSARY_PATHS would solve this, but I worry it'd lead to an explosion in configs.
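For context, a minimal Sacred sketch of the ordering problem; the experiment name and config fields are made up for illustration:

```python
import os

from sacred import Experiment

ex = Experiment("train")  # hypothetical experiment name

@ex.config
def base_config():
    adversary_path = None  # noqa: F841  (Sacred captures local variables)

@ex.named_config
def paper():
    # Named configs run *before* CLI config updates are applied, so this
    # function cannot see a path passed on the command line. Reading an
    # environment variable here is the workaround described above.
    adversary_path = os.environ["ADVERSARY_PATHS"]  # noqa: F841
```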
Creating this PR mainly to simplify code review, but we will want to merge it in eventually.