Closed: young-geng closed this 7 months ago
Hello Young, thanks for sending this in. I've been meaning to fix this issue!
Would it make sense to incorporate an option for choosing either stochastic or deterministic evaluation, so that we can continue to support the original variant of the task via the same task id, and register the stochastic version with a new task id?
For example, the stochastic version can be HopperController-Exact-v1.
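To make the suggestion concrete, here is a rough sketch of how two task ids could share one evaluation path and differ only by a flag. The `TaskSpec` class, the `register` helper, and the `TASK_REGISTRY` dict are hypothetical and not this repository's actual registry API. The original task id is written as a placeholder string, and the assumption that the original variant is deterministic with a single rollout is mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical evaluation settings attached to a task id."""
    stochastic: bool   # sample actions from the policy instead of using its mean
    num_rollouts: int  # number of rollouts averaged per policy evaluation


# Hypothetical registry mapping task ids to evaluation settings.
TASK_REGISTRY = {}


def register(task_id, *, stochastic, num_rollouts):
    """Add a task id, refusing to overwrite an existing one."""
    if task_id in TASK_REGISTRY:
        raise ValueError(f"task id already registered: {task_id}")
    TASK_REGISTRY[task_id] = TaskSpec(stochastic, num_rollouts)


# Placeholder for the existing id: the original variant keeps its behaviour
# (assumed here to be deterministic, single rollout).
register("<original-task-id>", stochastic=False, num_rollouts=1)

# New id for the stochastic variant, as proposed above.
register("HopperController-Exact-v1", stochastic=True, num_rollouts=10)
```

With this layout, results reported against the original id stay comparable, and anyone who wants the averaged stochastic score opts in by asking for the new id.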
Fixed HopperController evaluation by using a stochastic policy and averaging each policy evaluation over 10 rollouts.
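For readers who want to see the idea in isolation, the following is a minimal sketch of averaging a stochastic policy's return over several rollouts. It is not the code from this PR: the `rollout` and `evaluate` functions, the Gaussian action noise, and the toy reward are all stand-ins chosen so the example runs without a simulator.

```python
import random
from statistics import mean


def rollout(policy_mean, policy_std, stochastic, rng, horizon=5):
    """Toy stand-in for one environment rollout; returns the summed reward."""
    total = 0.0
    for _ in range(horizon):
        # Stochastic: sample the action. Deterministic: use the policy mean.
        action = rng.gauss(policy_mean, policy_std) if stochastic else policy_mean
        total += -(action - 1.0) ** 2  # toy reward, highest when action == 1.0
    return total


def evaluate(policy_mean, policy_std, stochastic=True, num_rollouts=10, seed=0):
    """Score a policy as the mean return over num_rollouts rollouts."""
    rng = random.Random(seed)
    # A deterministic policy in this toy setting gives the same return every
    # time, so a single rollout is enough.
    n = num_rollouts if stochastic else 1
    return mean(rollout(policy_mean, policy_std, stochastic, rng) for _ in range(n))


deterministic_score = evaluate(1.0, 0.5, stochastic=False)
stochastic_score = evaluate(1.0, 0.5, stochastic=True, num_rollouts=10)
```

Averaging reduces the variance of the score that a single sampled rollout would have, and fixing the seed keeps repeated evaluations of the same policy reproducible.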