HumanCompatibleAI / evaluating-rewards

Library to compare and evaluate reward functions
https://arxiv.org/abs/2006.13900
Apache License 2.0

Make state/action distribution configurable in `plot_canon_heatmap` #23

Closed: AdamGleave closed this pull request 4 years ago

AdamGleave commented 4 years ago

The random-sample and mesh-based methods of computing CANON require sampling from a distribution over observations and actions. Previously this distribution was hardcoded in plot_canon_heatmap as uniform random sampling from the Gym observation/action space. This works poorly, especially in high-dimensional environments, where very few such samples are physically realistic.

This PR makes the distribution configurable in plot_canon_heatmap, in particular adding support for drawing observations and actions from rollouts of a policy. To support this, the PR adds several methods to datasets, introducing a notion of a SampleDistFactory analogous to a DatasetFactory.
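Roughly, the idea is as follows. This is a minimal sketch, not the library's actual API: names like `space_sample_dist` and `rollout_obs_dist` are hypothetical, and the real factories in evaluating_rewards.datasets may differ in signature and detail.

```python
# Sketch only: a "sample distribution" maps a batch size to an array of
# samples; a factory is a context manager that constructs one, mirroring
# the existing DatasetFactory pattern. Uses the pre-0.26 Gym API
# (reset() -> obs, step() -> (obs, rew, done, info)) that this repo targets.
import contextlib
from typing import Callable, ContextManager, Iterator

import gym
import numpy as np

SampleDist = Callable[[int], np.ndarray]  # batch size -> array of samples
SampleDistFactory = Callable[..., ContextManager[SampleDist]]


@contextlib.contextmanager
def space_sample_dist(space: gym.Space) -> Iterator[SampleDist]:
    """I.i.d. samples from a Gym space: the previously hardcoded behaviour."""

    def sample(n: int) -> np.ndarray:
        return np.stack([space.sample() for _ in range(n)])

    yield sample


@contextlib.contextmanager
def rollout_obs_dist(
    env: gym.Env, policy: Callable[[np.ndarray], np.ndarray]
) -> Iterator[SampleDist]:
    """Observations from rollouts of `policy`, which are far more likely to
    be physically realistic in high-dimensional environments."""

    def sample(n: int) -> np.ndarray:
        obs = []
        ob = env.reset()
        while len(obs) < n:
            ob, _, done, _ = env.step(policy(ob))
            obs.append(ob)
            if done:
                ob = env.reset()
        return np.stack(obs[:n])

    yield sample
```

With something along these lines, plot_canon_heatmap can accept a factory for the observation distribution and one for the action distribution instead of always sampling from the space directly (the exact parameter names in the script may differ).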

codecov[bot] commented 4 years ago

Codecov Report

Merging #23 into master will not change coverage. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##            master       #23   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           11        11           
  Lines          552       555    +3     
=========================================
+ Hits           552       555    +3     

Impacted Files                    Coverage Δ
tests/test_canonical_sample.py    100.00% <100.00%> (ø)
tests/test_scripts.py             100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 78fcfef...507e6d7.