The main conceptual change is the addition of canonical_sample.py, a sample-based implementation of the new metric, which canonicalizes the reward, for continuous control environments. The tabular version is in https://github.com/HumanCompatibleAI/evaluating-rewards/pull/19

There are also a number of ancillary changes:

- Optimize tabular, now that we're handling much larger arrays.
- Rename plot_gridworld_divergence to plot_epic_heatmap and refactor the visualization code to be much more modular.
- Add plot_canon_heatmap, the analog of plot_epic_heatmap for the new distance.
- Add tests for canonical_sample, plus end-to-end tests in test_scripts that cover the new scripts and increase coverage of the old heatmap scripts.
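
For readers unfamiliar with the idea, here is a minimal sketch of a sample-based canonicalization followed by a Pearson distance. It is not the code in canonical_sample.py. It assumes the canonicalization takes the form C(R)(s,a,s') = R(s,a,s') + E[γR(s',A,S') − R(s,A,S') − γR(S,A,S')], with the expectations replaced by sample means, and the function names (`canonicalize`, `pearson_distance`) and toy rewards are hypothetical.

```python
import math
import random


def canonicalize(reward, transitions, states, actions, gamma):
    """Estimate the canonicalized reward on each transition from samples."""
    assert len(states) == len(actions)
    pairs = list(zip(actions, states))  # sampled (A, S') pairs
    n = len(pairs)

    def mean_from(s):
        # Sample mean of reward(s, A, S') over the sampled (A, S') pairs.
        return sum(reward(s, a, s2) for a, s2 in pairs) / n

    # Sample mean of reward(S, A, S'), with S drawn from the same state samples.
    mean_all = sum(mean_from(s) for s in states) / len(states)
    return [
        reward(s, a, s2) + gamma * mean_from(s2) - mean_from(s) - gamma * mean_all
        for s, a, s2 in transitions
    ]


def pearson_distance(xs, ys):
    """sqrt((1 - rho) / 2), where rho is the Pearson correlation of xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    rho = cov / math.sqrt(vx * vy)
    return math.sqrt(max(0.0, (1.0 - rho) / 2.0))  # clamp float round-off


# Toy 1-D continuous environment: states and actions are floats in [-1, 1].
rng = random.Random(0)
gamma = 0.9
states = [rng.uniform(-1, 1) for _ in range(50)]
actions = [rng.uniform(-1, 1) for _ in range(50)]
transitions = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(30)]


def base(s, a, s2):
    return s * a - s2 ** 2


def potential(s):
    return 3.0 * s + s ** 3


def shaped(s, a, s2):
    # Positive rescaling plus potential shaping of the base reward.
    return 2.0 * base(s, a, s2) + gamma * potential(s2) - potential(s)


def negated(s, a, s2):
    return -base(s, a, s2)


canon_base = canonicalize(base, transitions, states, actions, gamma)
d_shaped = pearson_distance(
    canon_base, canonicalize(shaped, transitions, states, actions, gamma))
d_negated = pearson_distance(
    canon_base, canonicalize(negated, transitions, states, actions, gamma))
```

Because the same state samples are used for both S and S' here, the shaping term cancels exactly in the sample means, so `d_shaped` is zero up to floating-point error while `d_negated` is one. With independently drawn samples the cancellation would hold only approximately.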

Codecov Report

Merging #21 into master will increase coverage by 2.20%. The diff coverage is 88.60%.

@@            Coverage Diff             @@
##           master      #21      +/-   ##
==========================================
+ Coverage   84.28%   86.48%   +2.20%     
==========================================
  Files          46       54       +8     
  Lines        3098     3486     +388     
==========================================
+ Hits         2611     3015     +404     
+ Misses        487      471      -16
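
The figures in the diff above can be cross-checked with a few lines of arithmetic. The sketch below assumes coverage is hits divided by lines and that percentages are truncated, not rounded, to two decimals; that assumption is inferred from the numbers (3015/3486 is about 86.489%, reported as 86.48%) and is not taken from Codecov's documentation. The helper name `coverage_percent` is hypothetical.

```python
import math


def coverage_percent(hits, lines):
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    return math.floor(hits / lines * 10000) / 100


before = coverage_percent(2611, 3098)   # master
after = coverage_percent(3015, 3486)    # this PR
delta = round(after - before, 2)

# Misses are the lines that were not hit.
misses_before = 3098 - 2611
misses_after = 3486 - 3015
```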

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...nalysis/reward_figures/gridworld_reward_heatmap.py | `96.22% <ø> (ø)` | |
| ..._rewards/analysis/reward_figures/plot_pm_reward.py | `87.80% <ø> (ø)` | |
| src/evaluating_rewards/experiments/synthetic.py | `83.62% <22.22%> (-5.17%)` | :arrow_down: |
| ..._rewards/analysis/dissimilarity_heatmaps/config.py | `57.14% <57.14%> (ø)` | |
| ...ds/analysis/dissimilarity_heatmaps/reward_masks.py | `62.00% <62.00%> (ø)` | |
| ...s/dissimilarity_heatmaps/plot_gridworld_heatmap.py | `92.30% <88.46%> (ø)` | |
| ...ewards/analysis/dissimilarity_heatmaps/heatmaps.py | `88.88% <88.88%> (ø)` | |
| src/evaluating_rewards/tabular.py | `72.34% <93.54%> (+12.78%)` | :arrow_up: |
| ...analysis/dissimilarity_heatmaps/transformations.py | `95.65% <95.65%> (ø)` | |
| ...alysis/dissimilarity_heatmaps/plot_epic_heatmap.py | `96.34% <96.34%> (ø)` | |
| ... and 25 more | | |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update cb185d5...9768c68.

HumanCompatibleAI / evaluating-rewards

Deep implementation of new metric #21
