AdamGleave commented 4 years ago

Update to use the new PolicyMixture distribution for preferences, regression, and model comparison. This should result in more robust reward estimates and reward model similarity metrics.
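For readers unfamiliar with the idea, here is a minimal sketch of what a mixture over policies can look like. It is not the PolicyMixture code from this PR: the class name `EpisodeMixturePolicy`, the per-episode resampling, and the callable policy interface are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Optional, Sequence

# Assumption: a policy is any callable mapping an observation to an action.
Policy = Callable[[object], object]


class EpisodeMixturePolicy:
    """Hypothetical mixture: samples one base policy per episode."""

    def __init__(
        self,
        policies: Sequence[Policy],
        weights: Optional[Sequence[float]] = None,
        seed: Optional[int] = None,
    ) -> None:
        if not policies:
            raise ValueError("need at least one base policy")
        self.policies: List[Policy] = list(policies)
        # Default to a uniform mixture when no weights are given.
        self.weights = list(weights) if weights is not None else [1.0] * len(self.policies)
        if len(self.weights) != len(self.policies):
            raise ValueError("weights and policies must have the same length")
        self.rng = random.Random(seed)
        self.current: Policy = self.policies[0]
        self.reset()

    def reset(self) -> None:
        """Call at the start of each episode to resample the active policy."""
        self.current = self.rng.choices(self.policies, weights=self.weights, k=1)[0]

    def __call__(self, obs: object) -> object:
        """Act with whichever base policy was sampled for this episode."""
        return self.current(obs)


def sample_actions(mixture: EpisodeMixturePolicy, episodes: int) -> List[object]:
    """Record the first action of each episode, resampling in between."""
    actions = []
    for _ in range(episodes):
        mixture.reset()
        actions.append(mixture(None))
    return actions
```

Data collected from such a mixture covers the states and actions visited by every base policy, which is one plausible reason a mixture would give more robust reward estimates than a single policy.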

codecov[bot] commented 4 years ago

Codecov Report

Merging #4 into master will increase coverage by 3.16%. The diff coverage is 34.48%.

@@            Coverage Diff             @@
##           master       #4      +/-   ##
==========================================
+ Coverage    68.2%   71.36%   +3.16%     
==========================================
  Files          39       39              
  Lines        2365     2375      +10     
==========================================
+ Hits         1613     1695      +82     
+ Misses        752      680      -72
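The figures in the diff above are consistent with coverage being hits divided by lines. The sketch below recomputes them; truncating to two decimals (rather than rounding) is an assumption chosen because it reproduces the 71.36% shown, and the helper name `coverage_percent` is hypothetical.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    # Integer arithmetic avoids floating-point surprises before truncation.
    return (hits * 10000 // lines) / 100


# Numbers taken from the coverage diff above.
master = coverage_percent(1613, 2365)
merged = coverage_percent(1695, 2375)
delta = round(merged - master, 2)

# Misses are simply the lines that were not hit.
master_misses = 2365 - 1613
merged_misses = 2375 - 1695

print(master, merged, delta, master_misses, merged_misses)
```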

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/evaluating_rewards/policies.py | `77.19% <ø> (-0.4%)` | :arrow_down: |
| src/evaluating_rewards/scripts/eval_policy.py | `0% <ø> (ø)` | :arrow_up: |
| tests/common.py | `100% <100%> (ø)` | :arrow_up: |
| src/evaluating_rewards/__init__.py | `100% <100%> (ø)` | :arrow_up: |
| src/evaluating_rewards/experiments/visualize.py | `22.06% <9.52%> (+4.21%)` | :arrow_up: |
| src/evaluating_rewards/envs/point_mass.py | `83.23% <0%> (+1.19%)` | :arrow_up: |
| .../evaluating_rewards/scripts/visualize_pm_reward.py | `84.41% <0%> (+53.24%)` | :arrow_up: |
| ...luating_rewards/experiments/point_mass_analysis.py | `88.63% <0%> (+59.09%)` | :arrow_up: |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update cefc21e...24e6df8.

AdamGleave commented 4 years ago

This improved stability, but there is still significant room for improvement. Merging, but I will continue this theme in another PR.

HumanCompatibleAI / evaluating-rewards

PointMaze Transfer Learning: Use Mixture Distribution #4
