PolicyMixture: combining different (stochastic) policies

HumanCompatibleAI / evaluating-rewards

Library to compare and evaluate reward functions

https://arxiv.org/abs/2006.13900

Apache License 2.0

Closed by AdamGleave 4 years ago

AdamGleave commented 4 years ago

This PR adds a new class, PolicyMixture, registered under the policy type mixture, which combines multiple policies for inference.

The motivating use case is combining an expert policy with a random policy to obtain a policy that achieves good state coverage.
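
To illustrate the idea, here is a minimal sketch of a policy mixture. It is not the code in this PR: the class name MixturePolicySketch, the predict method, the weights argument, and the choice to resample the component policy on every call are all assumptions made for illustration. The actual PolicyMixture may switch between policies differently (for example, per episode) and follows the repository's own policy interface.

```python
import random
from typing import Callable, Optional, Sequence

# Hypothetical policy type: any callable mapping an observation to an action.
Policy = Callable[[object], object]


class MixturePolicySketch:
    """Illustrative mixture that picks one component policy per call.

    Assumption: the component is resampled independently at every step,
    with probability proportional to its weight.
    """

    def __init__(
        self,
        policies: Sequence[Policy],
        weights: Optional[Sequence[float]] = None,
        seed: Optional[int] = None,
    ):
        if not policies:
            raise ValueError("need at least one policy")
        if weights is None:
            # Default to a uniform mixture.
            weights = [1.0] * len(policies)
        if len(weights) != len(policies):
            raise ValueError("weights and policies must have the same length")
        if any(w < 0 for w in weights) or sum(weights) <= 0:
            raise ValueError("weights must be non-negative and sum to > 0")
        self.policies = list(policies)
        self.weights = list(weights)
        self.rng = random.Random(seed)

    def predict(self, obs: object) -> object:
        # Sample a component policy, then delegate the action choice to it.
        policy = self.rng.choices(self.policies, weights=self.weights, k=1)[0]
        return policy(obs)


if __name__ == "__main__":
    action_rng = random.Random(0)

    def expert(obs):
        # Stand-in for a trained expert: always returns action 0.
        return 0

    def uniform_random(obs):
        # Stand-in for a random policy over 4 discrete actions.
        return action_rng.randrange(4)

    mixture = MixturePolicySketch([expert, uniform_random], weights=[0.8, 0.2], seed=0)
    print([mixture.predict(obs=None) for _ in range(10)])
```

In this sketch the weights control the trade-off: a higher weight on the expert keeps rollouts close to high-reward trajectories, while the random component pushes the agent into states the expert would rarely visit.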