HumanCompatibleAI / evaluating-rewards

Library to compare and evaluate reward functions
https://arxiv.org/abs/2006.13900
Apache License 2.0

Separate distance computation and plotting scripts #43

Closed · AdamGleave closed 3 years ago

AdamGleave commented 3 years ago

Currently, plot_epic_heatmap and plot_erc_heatmap each compute their respective distance (EPIC or ERC) and then plot the results. They save the raw results, and can be run in a mode that loads previously recorded results instead of recomputing them, but this is not the default mode, and there is no script that computes EPIC or ERC distance without producing plots as a side effect.

This design is clunky, and makes it awkward to implement things like table_combined, which tabulates results from multiple distance methods rather than plotting heatmaps.

This PR introduces a new script, plot_heatmap, which plots a heatmap from previously saved results. The remaining functionality moves to distances.epic and distances.erc, which compute the distance between all pairs of rewards in a set and save aggregated results.
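For concreteness, here is a minimal sketch of that separation of concerns. Everything in it is hypothetical (the function names, the JSON file layout, the matplotlib rendering); it illustrates the shape of the split, not the actual evaluating_rewards API:

```python
# Hypothetical sketch of the compute/plot split; names and the file
# layout are illustrative, not the real evaluating_rewards API.
import json
import pathlib
from typing import Callable, List, Tuple

import matplotlib.pyplot as plt
import numpy as np

# A distance function returns aggregated statistics: (mean, lower CI, upper CI).
DistanceFn = Callable[[str, str], Tuple[float, float, float]]

def compute_and_save(names: List[str], distance: DistanceFn, out_path: pathlib.Path) -> None:
    """Compute distance between all pairs of rewards; save aggregated results."""
    results = {
        f"{x}|{y}": dict(zip(("mean", "lower", "upper"), distance(x, y)))
        for x in names
        for y in names
    }
    out_path.write_text(json.dumps(results))

def plot_heatmap(in_path: pathlib.Path) -> None:
    """Load previously saved aggregated results and render a heatmap of means."""
    results = json.loads(in_path.read_text())
    names = sorted({key.split("|")[0] for key in results})
    mat = np.array([[results[f"{x}|{y}"]["mean"] for y in names] for x in names])
    fig, ax = plt.subplots()
    im = ax.imshow(mat)
    ax.set_xticks(range(len(names)))
    ax.set_xticklabels(names, rotation=45)
    ax.set_yticks(range(len(names)))
    ax.set_yticklabels(names)
    fig.colorbar(im)
    plt.show()
```

The key point is that the plotting side never touches a reward model: it only consumes the serialized aggregates, so tabulation (table_combined) or any other presentation can be layered on the same saved files.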

One design note: we save aggregated results (e.g. mean, lower CI, upper CI) rather than raw results, since the method of aggregation varies between distances. For EPIC (and NPEC) we aggregate point estimates from different seeds, whereas for ERC we can bootstrap directly on the episode returns.
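As a rough illustration of the two aggregation styles (the numbers, the 95% percentile intervals, and the Pearson-based distance below are assumptions for the example, not necessarily what the scripts use):

```python
# Illustration only: EPIC/NPEC aggregate per-seed point estimates, while ERC
# can bootstrap over episode returns. Sample sizes, the 95% percentile
# intervals, and the toy return distributions are assumptions for this example.
import numpy as np

rng = np.random.default_rng(0)

# EPIC/NPEC style: one point estimate of the distance per seed;
# aggregate (mean plus percentile CI) across seeds.
per_seed = np.array([0.12, 0.15, 0.11, 0.14, 0.13])  # e.g. 5 seeds
epic_mean = per_seed.mean()
epic_lower, epic_upper = np.percentile(per_seed, [2.5, 97.5])

# ERC style: the distance is a statistic of episode returns, so we can
# bootstrap the returns directly instead of rerunning with new seeds.
returns_x = rng.normal(1.0, 0.3, size=1000)                    # returns under reward X
returns_y = 0.8 * returns_x + rng.normal(0.0, 0.1, size=1000)  # returns under reward Y

def pearson_distance(rx: np.ndarray, ry: np.ndarray) -> float:
    """Pearson distance sqrt((1 - rho) / 2), as in the EPIC paper."""
    rho = np.corrcoef(rx, ry)[0, 1]
    return float(np.sqrt((1.0 - rho) / 2.0))

n = len(returns_x)
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)  # resample episodes with replacement
    boot.append(pearson_distance(returns_x[idx], returns_y[idx]))
boot = np.array(boot)
erc_mean = boot.mean()
erc_lower, erc_upper = np.percentile(boot, [2.5, 97.5])

print(f"EPIC: {epic_mean:.3f} [{epic_lower:.3f}, {epic_upper:.3f}]")
print(f"ERC:  {erc_mean:.3f} [{erc_lower:.3f}, {erc_upper:.3f}]")
```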

There is also a variety of minor tidying.

codecov[bot] commented 3 years ago

Codecov Report

Merging #43 into master will decrease coverage by 1.34%. The diff coverage is 89.70%.

[Impacted file tree graph]

@@            Coverage Diff             @@
##           master      #43      +/-   ##
==========================================
- Coverage   86.59%   85.25%   -1.35%     
==========================================
  Files          63       65       +2     
  Lines        4304     4231      -73     
==========================================
- Hits         3727     3607     -120     
- Misses        577      624      +47     
Impacted Files Coverage Δ
src/evaluating_rewards/__init__.py 100.00% <ø> (ø)
.../evaluating_rewards/analysis/distances/__init__.py 100.00% <ø> (ø)
...luating_rewards/analysis/distances/reward_masks.py 62.00% <ø> (ø)
...ting_rewards/analysis/distances/transformations.py 93.47% <ø> (ø)
...ting_rewards/analysis/reward_figures/point_mass.py 79.10% <ø> (ø)
src/evaluating_rewards/distances/npec.py 98.36% <ø> (ø)
src/evaluating_rewards/policies/monte_carlo.py 0.00% <ø> (ø)
src/evaluating_rewards/scripts/script_utils.py 53.33% <0.00%> (ø)
...wards/analysis/distances/plot_gridworld_heatmap.py 89.56% <25.00%> (ø)
...luating_rewards/analysis/distances/plot_heatmap.py 76.69% <76.69%> (ø)
... and 23 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update ac6b9b4...a3a894c.