AdamGleave commented 3 years ago

Add script to train expert policies in parallel, using Ray as a wrapper around imitation.scripts.expert_demos.
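
For readers unfamiliar with the pattern, here is a minimal sketch of fanning training runs out as Ray tasks. It is not the script added in this PR: `make_configs`, `train_one`, and `train_all` are hypothetical names, and `train_one` is a placeholder for wherever `imitation.scripts.expert_demos` would actually be invoked.

```python
import itertools


def make_configs(env_names, seeds):
    """Build one config dict per (environment, seed) pair."""
    return [
        {"env_name": env_name, "seed": seed}
        for env_name, seed in itertools.product(env_names, seeds)
    ]


def train_one(config):
    """Hypothetical placeholder for a single expert training run.

    A real script would launch the expert training here; this stub
    only echoes the config so the control flow can be followed.
    """
    return {"env_name": config["env_name"], "seed": config["seed"], "status": "done"}


def train_all(configs, train_fn=train_one):
    """Run train_fn on every config in parallel as Ray tasks."""
    import ray  # third-party; imported lazily so the helpers above work without it

    ray.init(ignore_reinit_error=True)
    remote_train = ray.remote(train_fn)  # wrap the plain function as a remote task
    futures = [remote_train.remote(config) for config in configs]
    return ray.get(futures)  # blocks until all tasks finish; results keep input order
```

With Ray installed, `train_all(make_configs(["seals/Hopper-v0", "seals/HalfCheetah-v0"], seeds=[0, 1]))` would schedule four independent tasks, one per environment and seed.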

codecov[bot] commented 3 years ago

Codecov Report

Merging #46 into master will increase coverage by 0.23%. The diff coverage is 83.23%.

@@            Coverage Diff             @@
##           master      #46      +/-   ##
==========================================
+ Coverage   85.14%   85.38%   +0.23%     
==========================================
  Files          65       68       +3     
  Lines        4262     4407     +145     
==========================================
+ Hits         3629     3763     +134     
- Misses        633      644      +11
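
The percentages in the diff above follow from the hit and line counts. The sketch below recomputes them; the helper names are illustrative, and the use of truncation (rather than rounding) to two decimals is an inference from these particular numbers, not documented Codecov behaviour.

```python
from fractions import Fraction


def coverage(hits, lines):
    """Exact coverage as a fraction of lines hit."""
    return Fraction(hits, lines)


def truncated_percent(fraction):
    """Percentage truncated (not rounded) to two decimal places, as a string."""
    hundredths = int(fraction * 10000)  # int() truncates toward zero
    return f"{hundredths // 100}.{hundredths % 100:02d}"


before = coverage(3629, 4262)  # master: hits / lines
after = coverage(3763, 4407)   # PR #46: hits / lines

print(truncated_percent(before))          # 85.14
print(truncated_percent(after))           # 85.38
print(truncated_percent(after - before))  # 0.23
```

Truncating the exact difference gives 0.23, which is why the reported delta is not simply 85.38 minus 85.14.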

Impacted Files	Coverage Δ
src/evaluating_rewards/envs/point_mass.py	`83.33% <ø> (ø)`
...ating_rewards/scripts/rewards/train_adversarial.py	`0.00% <ø> (ø)`
...ating_rewards/scripts/rewards/train_preferences.py	`97.87% <ø> (ø)`
...valuating_rewards/scripts/rewards/train_regress.py	`95.12% <ø> (ø)`
src/evaluating_rewards/scripts/train_rl.py	`0.00% <0.00%> (ø)`
src/evaluating_rewards/scripts/script_utils.py	`60.78% <71.42%> (+7.45%)`	:arrow_up:
...aluating_rewards/scripts/pipeline/train_experts.py	`85.18% <85.18%> (ø)`
...ating_rewards/analysis/distances/table_combined.py	`85.95% <100.00%> (+2.24%)`	:arrow_up:
src/evaluating_rewards/experiments/env_rewards.py	`60.00% <100.00%> (+60.00%)`	:arrow_up:
src/evaluating_rewards/scripts/distances/npec.py	`79.52% <100.00%> (-0.32%)`	:arrow_down:
... and 7 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update e14bb9e...92c586d.

AdamGleave commented 3 years ago

Hyperparams look good.

env | return | standard error
--- | --- | ---
evaluating_rewards/PointMassLine-v0 | -40 | 2.74
imitation/PointMazeLeftVel-v0 | -4.47 | 0.0023
imitation/PointMazeRightVel-v0 | -4.49 | 0.0026
seals/HalfCheetah-v0 | 5397 | 69
seals/Hopper-v0 | 2419 | 14.16
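
The thread does not say how the standard error column was computed. One common convention, assumed here, is the standard error of the mean over per-episode returns: the sample standard deviation divided by the square root of the number of episodes. The function name and the episode returns below are made up for illustration and are not data from this PR.

```python
import math
import statistics


def mean_and_standard_error(episode_returns):
    """Return (mean, standard error of the mean) for a list of episode returns."""
    n = len(episode_returns)
    if n < 2:
        raise ValueError("need at least two episodes to estimate a standard error")
    mean = statistics.mean(episode_returns)
    # Sample standard deviation (n - 1 denominator) scaled by sqrt(n).
    standard_error = statistics.stdev(episode_returns) / math.sqrt(n)
    return mean, standard_error


# Hypothetical per-episode returns, not taken from the experiments above.
hypothetical_returns = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, standard_error = mean_and_standard_error(hypothetical_returns)
print(mean, standard_error)
```

A standard error that is small relative to the mean, as in the PointMaze rows, suggests the return estimate is stable across episodes.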

HumanCompatibleAI / evaluating-rewards

Add Ray train experts script #46
