Add experiments for badly-behaved (or "pathological") visitation distributions to highlight failure modes of EPIC.

Concrete changes:

- epic_sample.py now takes a batch of observations and actions for canonicalization directly, rather than taking distributions and sampling the batch itself. This required modifications to scripts.distances.epic and to the test code.
- scripts.distances.epic now parallelizes across seeds using Ray.
- table_combined supports loading values from multiple runs and merging them.
- Added some "decoy" rewards for PointMaze to envs.mujoco.
- Added some "pathological" visitation distributions to table_combined.
- Bugfix in NPEC: put source and target in the correct order. (The old order not only swapped the x and y axes but also normalized by the wrong entry.)
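
For reviewers unfamiliar with the canonicalization step, here is a minimal sketch of the idea behind the epic_sample.py change: the caller passes in the batch of observation and action samples, and the canonicalizer no longer samples anything itself. This is not the code in this PR. The names `canonicalize`, `pearson_distance`, `base_reward` and `potential` are hypothetical, states and actions are plain floats, and pairing every action sample with every observation sample is an assumption made to keep the example short.

```python
import itertools
import math
import random
from statistics import fmean


def canonicalize(reward_fn, transitions, obs_samples, act_samples, discount):
    """Canonicalize reward_fn on transitions using a caller-supplied batch.

    Hypothetical helper: obs_samples and act_samples are passed in directly
    instead of being drawn from distributions inside the function.
    """
    # Assumption: pair every sampled action with every sampled observation.
    pairs = list(itertools.product(act_samples, obs_samples))
    # Mean reward over the batch; the same constant for every transition.
    mean_reward = fmean(
        reward_fn(s, a, s2) for s in obs_samples for a, s2 in pairs
    )
    canonical = []
    for s, a, s_next in transitions:
        from_next = fmean(reward_fn(s_next, a2, s2) for a2, s2 in pairs)
        from_cur = fmean(reward_fn(s, a2, s2) for a2, s2 in pairs)
        canonical.append(
            reward_fn(s, a, s_next)
            + discount * from_next
            - from_cur
            - discount * mean_reward
        )
    return canonical


def pearson_distance(xs, ys):
    """Pearson distance sqrt((1 - rho) / 2) between two reward vectors."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    rho = cov / math.sqrt(var_x * var_y)
    rho = max(-1.0, min(1.0, rho))  # guard against floating-point overshoot
    return math.sqrt((1.0 - rho) / 2.0)


discount = 0.99
rng = random.Random(0)
# The batch is built by the caller, so it can come from any visitation
# distribution, including a deliberately badly-behaved one.
obs_samples = [rng.uniform(-1.0, 1.0) for _ in range(8)]
act_samples = [rng.uniform(-1.0, 1.0) for _ in range(8)]
transitions = [
    (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
    for _ in range(32)
]


def base_reward(s, a, s_next):
    return -abs(s_next) - 0.1 * a ** 2


def potential(s):
    return 3.0 * s


def shaped_reward(s, a, s_next):
    # Potential shaping: should not change the canonicalized reward.
    return base_reward(s, a, s_next) + discount * potential(s_next) - potential(s)


canon_base = canonicalize(base_reward, transitions, obs_samples, act_samples, discount)
canon_shaped = canonicalize(shaped_reward, transitions, obs_samples, act_samples, discount)
distance = pearson_distance(canon_base, canon_shaped)
```

Because both rewards are canonicalized against the same fixed batch, the shaping terms cancel and the two canonicalized vectors agree up to floating-point error, so `distance` is approximately zero. Handing the batch to the function is what makes it straightforward to substitute a different visitation distribution in the experiments.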

Codecov Report

Merging #48 (63477ab) into master (e14bb9e) will decrease coverage by 1.55%. The diff coverage is 75.21%.

@@            Coverage Diff             @@
##           master      #48      +/-   ##
==========================================
- Coverage   85.14%   83.59%   -1.56%     
==========================================
  Files          65       68       +3     
  Lines        4262     4583     +321     
==========================================
+ Hits         3629     3831     +202     
- Misses        633      752     +119

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/evaluating_rewards/distances/common_config.py | `94.73% <ø> (ø)` | |
| src/evaluating_rewards/envs/point_mass.py | `83.33% <ø> (ø)` | |
| src/evaluating_rewards/scripts/distances/common.py | `91.85% <ø> (ø)` | |
| ...ating_rewards/scripts/rewards/train_adversarial.py | `0.00% <ø> (ø)` | |
| ...ating_rewards/scripts/rewards/train_preferences.py | `97.87% <ø> (ø)` | |
| ...valuating_rewards/scripts/rewards/train_regress.py | `95.12% <ø> (ø)` | |
| src/evaluating_rewards/scripts/train_rl.py | `0.00% <0.00%> (ø)` | |
| src/evaluating_rewards/analysis/results.py | `37.61% <25.00%> (+4.58%)` | :arrow_up: |
| src/evaluating_rewards/envs/mujoco.py | `88.09% <54.54%> (-10.07%)` | :arrow_down: |
| src/evaluating_rewards/datasets.py | `59.86% <58.82%> (-10.88%)` | :arrow_down: |
| ... and 21 more | | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update d8256b3...63477ab.

HumanCompatibleAI / evaluating-rewards

Pathological visitation distributions #48