Closed: AdamGleave closed this pull request 4 years ago.
Merging #22 into master will not change coverage. The diff coverage is 100.00%.
@@            Coverage Diff            @@
##            master       #22   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           11        11
  Lines          540       552   +12
=========================================
+ Hits           540       552   +12
Impacted Files | Coverage Δ |
---|---|
tests/test_canonical_sample.py | 100.00% <100.00%> (ø) |
tests/test_comparisons.py | 100.00% <100.00%> (ø) |
tests/test_rewards.py | 100.00% <100.00%> (ø) |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.
Powered by Codecov. Last update 796a545...5de690c.
Previously, the discount rate of potential shaping in reward models was specified on an ad-hoc basis, typically via the constructor arguments of the relevant reward models. This was problematic for two reasons; in particular, `serialize.load_reward` had a fixed discount rate.

This PR addresses the issue by adding a `discount` property and a `set_discount` function to `RewardModel`, and adding a `discount` parameter to `load_reward`. It is backward compatible: the default implementations are no-ops, suitable for any reward model that does not include explicit shaping.

If I were to rearchitect this from scratch, I'd probably favour adding `discount` to the constructor arguments of `RewardModel`. This would not be too hard to do with the current code. Apart from changing the `RewardModel` classes, it would also require changing `evaluating_rewards.serialize`, and would break API compatibility with `imitation` for serialization. Given this, I'll punt it to some possibly-future rearchitecture, e.g. if we ever merge this with `imitation` or switch to TF2/PyTorch.
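
To make the shape of the change concrete, here is a minimal, hypothetical sketch of the new interface in plain Python. Only the `discount` property, `set_discount`, and the `discount` parameter on `load_reward` come from this PR; the `PotentialShapedModel` class, the `_construct_model` helper, and all default values and signatures are illustrative assumptions rather than the actual `evaluating_rewards` code.

```python
class RewardModel:
    """Base class: discount handling defaults to a no-op, so existing
    models without explicit shaping remain backward compatible."""

    @property
    def discount(self) -> float:
        # Assumed default for an unshaped model (no shaping term to discount).
        return 1.0

    def set_discount(self, discount: float) -> None:
        # No-op default: nothing to update in a model without shaping.
        del discount


class PotentialShapedModel(RewardModel):
    """Hypothetical model with explicit potential shaping, which must
    track the discount rate used in its shaping term."""

    def __init__(self, base: RewardModel, discount: float = 0.99):
        self._base = base
        self._discount = discount

    @property
    def discount(self) -> float:
        return self._discount

    def set_discount(self, discount: float) -> None:
        self._discount = discount


def _construct_model(reward_type: str, reward_path: str) -> RewardModel:
    # Stand-in for whatever deserialization evaluating_rewards.serialize does.
    del reward_type, reward_path
    return RewardModel()


def load_reward(reward_type: str, reward_path: str,
                discount: float = 0.99) -> RewardModel:
    """Sketch of load_reward gaining a discount parameter: the loaded model
    has the requested discount applied via set_discount."""
    model = _construct_model(reward_type, reward_path)
    model.set_discount(discount)
    return model


if __name__ == "__main__":
    model = load_reward("dummy", "/tmp/does-not-exist", discount=0.9)
    print(model.discount)  # 1.0 here: the stub model has no shaping, so set_discount is a no-op
```

The key design point is that the defaults make the change opt-in: only models that actually apply potential shaping need to override `discount` and `set_discount`.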