HumanCompatibleAI / evaluating-rewards

Library to compare and evaluate reward functions
https://arxiv.org/abs/2006.13900
Apache License 2.0

Compute and report confidence intervals in heatmaps #30

Closed AdamGleave closed 4 years ago

AdamGleave commented 4 years ago

Compute confidence intervals using bootstrapping (all methods) and Student's t-distribution (CANON, EPIC). Also report the sample mean and standard deviation across seeds (CANON, EPIC).

Visualized by saving multiple heatmaps, one for each type.
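For readers unfamiliar with the two interval types, here is a minimal, standard-library-only sketch of a percentile bootstrap interval and a Student's t interval for the mean of a per-seed statistic. It is not the code from this PR: the function names (`bootstrap_ci`, `student_t_ci`, `t_quantile`), the percentile variant of the bootstrap, and the sample values are all assumptions made for illustration. In practice the t quantile would normally come from a statistics library such as SciPy (`scipy.stats.t.ppf`) rather than the numerical routine below.

```python
import math
import random
import statistics


def bootstrap_ci(samples, confidence=0.95, n_resamples=2000, seed=0):
    """Percentile bootstrap interval for the mean of `samples` (illustrative)."""
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((1 - confidence) / 2 * n_resamples)
    hi_idx = int((1 + confidence) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


def _t_pdf(x, df):
    """Density of Student's t-distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def _t_cdf(x, df, steps=2000):
    """CDF for x >= 0, integrating the density on [0, x] with Simpson's rule."""
    h = x / steps
    total = _t_pdf(0.0, df) + _t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * _t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile for 0.5 < p < 1 by bisection; assumes the answer is below 100."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if _t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def student_t_ci(samples, confidence=0.95):
    """Two-sided Student's t interval for the mean, using the sample SD."""
    n = len(samples)
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)  # sample SD (n - 1 in the denominator)
    half_width = t_quantile((1 + confidence) / 2, n - 1) * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical distances for one pair of reward functions, one value per seed.
per_seed = [0.52, 0.48, 0.55, 0.50, 0.47]
print("mean:", statistics.fmean(per_seed), "SD:", statistics.stdev(per_seed))
print("bootstrap CI:", bootstrap_ci(per_seed))
print("Student's t CI:", student_t_ci(per_seed))
```

The t interval assumes the per-seed values are roughly normally distributed, whereas the bootstrap makes no such assumption but can be unreliable with very few seeds; reporting both, as this PR does, lets the reader compare them.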

codecov[bot] commented 4 years ago

Codecov Report

Merging #30 into master will decrease coverage by 0.14%. The diff coverage is 88.82%.


@@            Coverage Diff             @@
##           master      #30      +/-   ##
==========================================
- Coverage   87.98%   87.84%   -0.15%     
==========================================
  Files          55       57       +2     
  Lines        3830     3957     +127     
==========================================
+ Hits         3370     3476     +106     
- Misses        460      481      +21     
| Impacted Files | Coverage Δ |
|---|---|
| ...s/dissimilarity_heatmaps/plot_gridworld_heatmap.py | 92.30% <ø> (ø) |
| tests/test_canonical_sample.py | 100.00% <ø> (ø) |
| tests/test_scripts.py | 100.00% <ø> (ø) |
| ...ysis/dissimilarity_heatmaps/plot_return_heatmap.py | 92.10% <76.47%> (-5.08%) ↓ |
| ...lysis/dissimilarity_heatmaps/plot_canon_heatmap.py | 93.37% <78.57%> (-3.73%) ↓ |
| ...alysis/dissimilarity_heatmaps/plot_epic_heatmap.py | 93.40% <78.57%> (-2.89%) ↓ |
| ...ards/analysis/dissimilarity_heatmaps/cli_common.py | 77.09% <89.87%> (+9.13%) ↑ |
| ...ewards/analysis/dissimilarity_heatmaps/heatmaps.py | 90.69% <100.00%> (+0.10%) ↑ |
| src/evaluating_rewards/canonical_sample.py | 97.84% <100.00%> (-0.27%) ↓ |
| src/evaluating_rewards/tabular.py | 72.91% <100.00%> (ø) |

... and 4 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 47156bc...6fb00a6.