HumanCompatibleAI / evaluating-rewards

Library to compare and evaluate reward functions
https://arxiv.org/abs/2006.13900
Apache License 2.0

Add Ray train experts script #46

Closed AdamGleave closed 3 years ago

AdamGleave commented 3 years ago

Add a script that trains expert policies in parallel, using Ray as a wrapper around imitation.scripts.expert_demos.
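For readers unfamiliar with the fan-out pattern described above, here is a minimal sketch. It is not the script added in this PR: to stay dependency-free it uses the standard library's concurrent.futures as a stand-in for Ray, and train_expert, SEEDS and train_all are hypothetical names. The environment names are the ones in the results table in this thread.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

# Environment names taken from the results table in this thread.
ENVS = [
    "evaluating_rewards/PointMassLine-v0",
    "imitation/PointMazeLeftVel-v0",
    "imitation/PointMazeRightVel-v0",
    "seals/HalfCheetah-v0",
    "seals/Hopper-v0",
]
SEEDS = [0, 1, 2]  # hypothetical choice of seeds


def train_expert(env_name, seed):
    """Hypothetical placeholder for a single training run.

    A real worker would launch the training experiment for this
    (environment, seed) pair; here it only echoes its configuration.
    """
    return {"env_name": env_name, "seed": seed, "status": "ok"}


def train_all(envs, seeds, max_workers=4):
    """Run one job per (environment, seed) pair and gather the results."""
    jobs = list(itertools.product(envs, seeds))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(train_expert, env, seed) for env, seed in jobs]
        # Collect in submission order so results line up with `jobs`.
        return [future.result() for future in futures]


results = train_all(ENVS, SEEDS)
```

With Ray, the same shape would typically be written by decorating the worker with @ray.remote, submitting jobs with train_expert.remote(env, seed), and collecting them with ray.get(...).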

codecov[bot] commented 3 years ago

Codecov Report

Merging #46 into master will increase coverage by 0.23%. The diff coverage is 83.23%.

@@            Coverage Diff             @@
##           master      #46      +/-   ##
==========================================
+ Coverage   85.14%   85.38%   +0.23%     
==========================================
  Files          65       68       +3     
  Lines        4262     4407     +145     
==========================================
+ Hits         3629     3763     +134     
- Misses        633      644      +11     
Impacted Files | Coverage Δ
src/evaluating_rewards/envs/point_mass.py | 83.33% <ø> (ø)
...ating_rewards/scripts/rewards/train_adversarial.py | 0.00% <ø> (ø)
...ating_rewards/scripts/rewards/train_preferences.py | 97.87% <ø> (ø)
...valuating_rewards/scripts/rewards/train_regress.py | 95.12% <ø> (ø)
src/evaluating_rewards/scripts/train_rl.py | 0.00% <0.00%> (ø)
src/evaluating_rewards/scripts/script_utils.py | 60.78% <71.42%> (+7.45%) ↑
...aluating_rewards/scripts/pipeline/train_experts.py | 85.18% <85.18%> (ø)
...ating_rewards/analysis/distances/table_combined.py | 85.95% <100.00%> (+2.24%) ↑
src/evaluating_rewards/experiments/env_rewards.py | 60.00% <100.00%> (+60.00%) ↑
src/evaluating_rewards/scripts/distances/npec.py | 79.52% <100.00%> (-0.32%) ↓
... and 7 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update e14bb9e...92c586d.

AdamGleave commented 3 years ago

Hyperparams look good.

env | return | standard error
evaluating_rewards/PointMassLine-v0 | -40 | 2.74 
imitation/PointMazeLeftVel-v0 | -4.47 | 0.0023
imitation/PointMazeRightVel-v0 | -4.49 | 0.0026
seals/HalfCheetah-v0 | 5397 | 69
seals/Hopper-v0 | 2419 | 14.16
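
For reference, a minimal sketch of how a return and standard error pair like the ones above can be computed. It assumes the return column is a mean over evaluation episodes and that the standard error is the sample standard deviation divided by the square root of the number of episodes. Neither is stated in this thread, and mean_and_standard_error and the sample data are hypothetical.

```python
import math
import statistics


def mean_and_standard_error(returns):
    """Return (mean, standard error of the mean) for a list of episode returns.

    Assumes independent episodes; uses the sample (n - 1) standard deviation.
    """
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two episode returns")
    mean = statistics.fmean(returns)
    sem = statistics.stdev(returns) / math.sqrt(n)
    return mean, sem


# Hypothetical episode returns, for illustration only.
example_returns = [1.0, 2.0, 3.0, 4.0]
example_mean, example_sem = mean_and_standard_error(example_returns)
```

A standard error that is small relative to the return, as in the PointMaze rows, suggests the estimate is stable across episodes under these assumptions.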