kantneel commented 5 years ago

Added reward_shaping.py, which defines a RewardShapingEnv class, a Scheduler, and a few Annealers.
Modified ppo_baseline to support experiments in which reward shaping is annealed over a fraction of training. Data is now also logged in TensorBoard format in addition to stdout.
Added default options to ppo_baseline and score_agent for the human sumo experiments.
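
For readers unfamiliar with annealed reward shaping, here is a minimal sketch of the idea, not the code in this PR. The names `LinearAnnealer` and `shaped_reward`, their parameters, and the linear schedule itself are assumptions chosen for illustration: the weight on the dense (shaping) reward decays linearly to its final value over the first `end_frac` of training, and the environment reward is a convex combination of the dense and sparse terms.

```python
class LinearAnnealer:
    """Hypothetical annealer: interpolates linearly from start_val to end_val.

    The value reaches end_val once `end_frac` of training has elapsed and
    stays there afterwards.
    """

    def __init__(self, start_val, end_val, end_frac):
        self.start_val = start_val
        self.end_val = end_val
        self.end_frac = end_frac

    def __call__(self, frac_elapsed):
        # frac_elapsed: fraction of total training completed, in [0, 1].
        if self.end_frac <= 0:
            return self.end_val
        # Clamp progress to [0, 1] so the value holds at end_val after end_frac.
        progress = min(max(frac_elapsed / self.end_frac, 0.0), 1.0)
        return self.start_val + progress * (self.end_val - self.start_val)


def shaped_reward(sparse, dense, weight):
    """Blend dense (shaping) and sparse rewards; weight is the dense share."""
    return weight * dense + (1.0 - weight) * sparse


# Anneal the shaping weight from 1 to 0 over the first half of training.
annealer = LinearAnnealer(start_val=1.0, end_val=0.0, end_frac=0.5)
for frac in (0.0, 0.25, 0.5, 1.0):
    w = annealer(frac)
    print(frac, w, shaped_reward(sparse=10.0, dense=2.0, weight=w))
```

With this schedule the agent starts out trained purely on the dense signal and ends up optimizing only the sparse one, which is the usual motivation for annealing shaping away.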

codecov[bot] commented 5 years ago

Codecov Report

Merging #2 into master will increase coverage by 3.28%. The diff coverage is 91.58%.

@@            Coverage Diff             @@
##           master       #2      +/-   ##
==========================================
+ Coverage   66.66%   69.95%   +3.28%     
==========================================
  Files          20       25       +5     
  Lines        1335     1541     +206     
==========================================
+ Hits          890     1078     +188     
- Misses        445      463      +18
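
The percentages in the diff above follow from the Hits and Lines rows. As a quick check, the sketch below recomputes them; the helper names `coverage_pct` and `trunc2` are made up for this example, and the two-decimal truncation is an assumption inferred from 890/1335 being shown as 66.66% rather than 66.67%.

```python
import math


def coverage_pct(hits, lines):
    """Line coverage as a percentage of total lines."""
    return 100.0 * hits / lines


def trunc2(x):
    """Truncate (not round) to two decimals; assumed display convention."""
    return math.floor(x * 100) / 100


before = coverage_pct(890, 1335)   # master
after = coverage_pct(1078, 1541)   # after merging #2
delta = after - before

print(trunc2(before), trunc2(after), trunc2(delta))
```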

| Flag | Coverage Δ | |
|---|---|---|
| #aprl | `33.61% <0%> (-5.19%)` | :arrow_down: |
| #modelfree | `47.11% <91.58%> (+6.81%)` | :arrow_up: |

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/modelfree/gym_compete_conversion.py | `90.75% <100%> (+0.15%)` | :arrow_up: |
| src/modelfree/__init__.py | `100% <100%> (ø)` | |
| src/modelfree/envs/sumo_auto_contact.py | `100% <100%> (ø)` | |
| src/modelfree/envs/__init__.py | `100% <100%> (ø)` | |
| src/modelfree/score_agent.py | `98.24% <50%> (-1.76%)` | :arrow_down: |
| src/modelfree/ppo_baseline.py | `92.85% <76%> (-5.9%)` | :arrow_down: |
| src/modelfree/shaping_wrappers.py | `93.07% <93.07%> (ø)` | |
| src/modelfree/scheduling.py | `95.55% <95.55%> (ø)` | |

... and 3 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 72b9b9c...47f4a2e.

AdamGleave commented 5 years ago

I've made a lot of changes to master that should simplify implementing some of these things (particularly adding noise to the victim), so now would be a good time to merge master again. Ping me once you've done that and have the unit tests passing, and I'll do a more detailed review. In the meantime, I'll go through and leave some high-level comments.

HumanCompatibleAI / adversarial-policies

Merging work on reward shaping and annealing #2
