HumanCompatibleAI / evaluating-rewards

Library to compare and evaluate reward functions
https://arxiv.org/abs/2006.13900
Apache License 2.0

Hotfix for flaky tabular tests #20

Closed AdamGleave closed 4 years ago

AdamGleave commented 4 years ago

hypothesis has some non-determinism, and the tests are flaky due to floating point error. Use higher-precision floats to mitigate rounding error, and reduce the scale of the rewards we test to cut rounding error further. Also relax thresholds where they are overly strict.
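
For intuition, here is a minimal sketch of the pattern the fix describes (hypothetical code, not the repository's actual tests): hypothesis draws fresh random examples on every run, so an assertion with a too-tight floating point tolerance passes on some runs and fails on others.

```python
# Minimal sketch, not the repository's actual tests: the test name, shapes,
# gamma, potential, and tolerances below are all hypothetical. It shows the
# pattern the fix describes: work in float64, bound the magnitude of the
# sampled rewards, and compare with an explicit (relaxed) tolerance.
import numpy as np
import hypothesis.strategies as st
from hypothesis import given
from hypothesis.extra.numpy import arrays

@given(
    rewards=arrays(
        dtype=np.float64,  # higher precision than float32 shrinks rounding error
        shape=(5, 3),
        # Smaller reward magnitudes keep intermediate sums well-conditioned.
        elements=st.floats(min_value=-10.0, max_value=10.0),
    )
)
def test_shaping_round_trip(rewards):
    # Schematic potential shaping: add gamma * phi - phi, then undo it.
    gamma = 0.99
    phi = np.random.default_rng(0).standard_normal(rewards.shape)
    shaped = rewards + gamma * phi - phi
    recovered = shaped - gamma * phi + phi
    # Relaxed tolerance: exact equality is too strict under floating point,
    # and since hypothesis draws new examples per run, a too-tight threshold
    # fails only sometimes -- i.e. the test is flaky.
    np.testing.assert_allclose(recovered, rewards, atol=1e-8, rtol=1e-6)
```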

AdamGleave commented 4 years ago

I ran pytest --flake-finder (100 replicas) and all runs passed, so the tests should now fail with probability under about 1%.
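
(An aside, not from the PR: pytest-flakefinder reruns each collected test many times within a single session, with the run count configurable via --flake-runs. A back-of-the-envelope check of what 100 consecutive passes implies, assuming independent runs:)

```python
# Rough sanity check (my own arithmetic, not from the PR): if a test fails
# independently with per-run probability p, then 100 replicas all pass
# with probability (1 - p) ** 100.
for p in (0.001, 0.01, 0.03):
    print(f"p = {p:.3f}: P(all 100 pass) = {(1 - p) ** 100:.3f}")
# p = 0.001: P(all 100 pass) = 0.905
# p = 0.010: P(all 100 pass) = 0.366
# p = 0.030: P(all 100 pass) = 0.048
```

So an all-green run of 100 replicas is strong evidence against failure rates of a few percent or more.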

codecov[bot] commented 4 years ago

Codecov Report

Merging #20 into master will not change coverage. The diff coverage is 100.00%.


```
@@            Coverage Diff            @@
##            master       #20   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           10        10
  Lines          463       465    +2
=========================================
+ Hits           463       465    +2
```

Impacted Files | Coverage Δ
--- | ---
tests/test_tabular.py | 100.00% <100.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 07f17d4...25e3421.