HumanCompatibleAI / evaluating-rewards

Library to compare and evaluate reward functions
https://arxiv.org/abs/2006.13900
Apache License 2.0

PointMaze Transfer Learning: Use Mixture Distribution #4

Closed AdamGleave closed 4 years ago

AdamGleave commented 4 years ago

Update to use the new PolicyMixture distribution for preferences, regress, and model comparison. This should result in more robust reward estimates and reward model similarity metrics.
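For readers unfamiliar with the idea, here is a minimal sketch of what a mixture over policies could look like. It is not the PolicyMixture implementation from this PR: the class name `MixturePolicy`, the `switch_prob` parameter, and the per-step switching rule are all assumptions made for illustration, and the policies are plain Python callables rather than the library's policy objects.

```python
import random
from typing import Callable, Optional, Sequence

# Hypothetical policy type for this sketch: maps an observation to an action.
Policy = Callable[[float], float]


class MixturePolicy:
    """Illustrative mixture of policies (assumed design, not the library's API).

    At each step, with probability `switch_prob`, the active policy is
    re-sampled uniformly from `policies`; otherwise the current one is kept.
    """

    def __init__(
        self,
        policies: Sequence[Policy],
        switch_prob: float,
        seed: Optional[int] = None,
    ) -> None:
        if not policies:
            raise ValueError("need at least one policy")
        if not 0.0 <= switch_prob <= 1.0:
            raise ValueError("switch_prob must lie in [0, 1]")
        self._policies = list(policies)
        self._switch_prob = switch_prob
        self._rng = random.Random(seed)
        self._active = self._rng.choice(self._policies)

    def __call__(self, obs: float) -> float:
        # Occasionally hand control to a (possibly different) policy.
        if self._rng.random() < self._switch_prob:
            self._active = self._rng.choice(self._policies)
        return self._active(obs)


# Two toy policies that always push left or right.
def push_left(obs: float) -> float:
    return -1.0


def push_right(obs: float) -> float:
    return 1.0


mixture = MixturePolicy([push_left, push_right], switch_prob=0.1, seed=0)
rollout_actions = [mixture(0.0) for _ in range(1000)]
```

Rollouts from such a mixture visit states that no single constituent policy would reach on its own, which is one plausible reason a mixture gives broader coverage for fitting and comparing reward models.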

codecov[bot] commented 4 years ago

Codecov Report

Merging #4 into master will increase coverage by 3.16%. The diff coverage is 34.48%.


@@            Coverage Diff             @@
##           master       #4      +/-   ##
==========================================
+ Coverage    68.2%   71.36%   +3.16%     
==========================================
  Files          39       39              
  Lines        2365     2375      +10     
==========================================
+ Hits         1613     1695      +82     
+ Misses        752      680      -72

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/evaluating_rewards/policies.py | 77.19% <ø> (-0.4%) | :arrow_down: |
| src/evaluating_rewards/scripts/eval_policy.py | 0% <ø> (ø) | :arrow_up: |
| tests/common.py | 100% <100%> (ø) | :arrow_up: |
| src/evaluating_rewards/__init__.py | 100% <100%> (ø) | :arrow_up: |
| src/evaluating_rewards/experiments/visualize.py | 22.06% <9.52%> (+4.21%) | :arrow_up: |
| src/evaluating_rewards/envs/point_mass.py | 83.23% <0%> (+1.19%) | :arrow_up: |
| .../evaluating_rewards/scripts/visualize_pm_reward.py | 84.41% <0%> (+53.24%) | :arrow_up: |
| ...luating_rewards/experiments/point_mass_analysis.py | 88.63% <0%> (+59.09%) | :arrow_up: |
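
The headline figures in the coverage diff can be checked by hand from the hit and line counts. The short sketch below does that arithmetic. The helper name `coverage_percent` is made up for this example, and the comparison uses a small tolerance because the report's exact rounding rule is not stated here.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Line coverage as a percentage: covered lines over total lines."""
    return 100.0 * hits / lines


# Counts taken from the coverage diff in the report.
master_hits, master_lines = 1613, 2365
pr_hits, pr_lines = 1695, 2375

master_cov = coverage_percent(master_hits, master_lines)
pr_cov = coverage_percent(pr_hits, pr_lines)
delta = pr_cov - master_cov

# Misses are simply the lines that were not hit.
master_misses = master_lines - master_hits
pr_misses = pr_lines - pr_hits

print(f"master: {master_cov:.2f}%  PR: {pr_cov:.2f}%  change: {delta:+.2f}")
print(f"misses: {master_misses} -> {pr_misses}")
```

The computed values agree with the reported 68.2%, 71.36% and +3.16% to within 0.01 percentage points.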


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update cefc21e...24e6df8.

AdamGleave commented 4 years ago

This improved stability, but there is still significant room for improvement. Merging, but I will continue this theme in another PR.