HumanCompatibleAI / evaluating-rewards

Library to compare and evaluate reward functions
https://arxiv.org/abs/2006.13900
Apache License 2.0

PolicyMixture: combining different (stochastic) policies #3

AdamGleave closed 4 years ago

AdamGleave commented 4 years ago

This PR adds a new class PolicyMixture, registered as policy type mixture, which supports combining multiple policies for inference.

The motivating use case is combining an expert policy with a random policy, so that the resulting rollouts achieve good state coverage.
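
To make the idea concrete, here is a minimal sketch of a mixture over policies. It is not the code from this PR: the class name MixturePolicySketch, the predict method, the weights argument, and the choice to resample the active policy on every step are all assumptions made for illustration, and policies are modelled as plain callables from an observation to an action.

```python
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical policy type for this sketch: maps an observation to an action.
Policy = Callable[[int], int]


class MixturePolicySketch:
    """Illustrative mixture of policies (not the PR's PolicyMixture)."""

    def __init__(
        self,
        policies: Sequence[Policy],
        weights: Optional[Sequence[float]] = None,
        seed: Optional[int] = None,
    ) -> None:
        if not policies:
            raise ValueError("need at least one policy")
        self.policies: List[Policy] = list(policies)
        # Default to a uniform mixture when no weights are given.
        self.weights = list(weights) if weights is not None else [1.0] * len(policies)
        if len(self.weights) != len(self.policies):
            raise ValueError("weights and policies must have the same length")
        self.rng = random.Random(seed)

    def predict(self, obs: int) -> int:
        # Assumption: pick a component policy independently on every step.
        policy = self.rng.choices(self.policies, weights=self.weights, k=1)[0]
        return policy(obs)


def expert(obs: int) -> int:
    """Stand-in expert that always takes action 0."""
    return 0


def make_random_policy(n_actions: int, seed: int) -> Policy:
    """Stand-in random policy that samples actions uniformly."""
    rng = random.Random(seed)
    return lambda obs: rng.randrange(n_actions)


# Mostly follow the expert, with occasional random actions for coverage.
mixture = MixturePolicySketch(
    [expert, make_random_policy(n_actions=4, seed=1)],
    weights=[0.8, 0.2],
    seed=0,
)
actions = [mixture.predict(obs=0) for _ in range(1000)]
```

Resampling on every step gives something like epsilon-greedy behaviour. Holding one policy fixed for a whole episode, or switching with some probability, would be equally reasonable designs and would produce different state coverage.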