Open Bahador-Bakhshi opened 3 years ago
In addition to using a random policy, it would be better to also start from random states during the initial exploration phase.
How could this be implemented?
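
One possible approach, as a rough sketch only: during the exploration phase, overwrite the simulator state with a randomly sampled one before each short random-policy rollout, instead of always starting from the fixed reset state. Everything named here (`ToyEnv`, `set_state`, `sample_action`, `random_exploration`, `horizon`) is hypothetical and for illustration; it assumes the environment exposes some way to set its internal state, which many real simulators do not.

```python
import random
from collections import deque


class ToyEnv:
    """Hypothetical 1-D environment: the state is a position in [-1, 1]."""

    low, high = -1.0, 1.0

    def __init__(self):
        self.state = 0.0

    def reset(self):
        # Standard reset: always starts from the same state.
        self.state = 0.0
        return self.state

    def set_state(self, state):
        # Assumption: the simulator lets us overwrite its internal state.
        self.state = state
        return self.state

    def sample_action(self):
        # Random policy: a small uniformly sampled displacement.
        return random.uniform(-0.1, 0.1)

    def step(self, action):
        # Move and clip the position to the valid range.
        self.state = min(self.high, max(self.low, self.state + action))
        reward = -abs(self.state)  # closer to 0 is better
        done = False
        return self.state, reward, done


def random_exploration(env, buffer, num_steps, horizon=10):
    """Fill `buffer` with random-policy transitions from random start states."""
    steps = 0
    while steps < num_steps:
        # Random start state instead of the fixed env.reset() state.
        state = env.set_state(random.uniform(env.low, env.high))
        for _ in range(horizon):
            action = env.sample_action()
            next_state, reward, done = env.step(action)
            buffer.append((state, action, reward, next_state, done))
            state = next_state
            steps += 1
            if done or steps >= num_steps:
                break
    return buffer


buffer = random_exploration(ToyEnv(), deque(maxlen=10_000), num_steps=100)
```

If the environment cannot be set to an arbitrary state, this sketch does not apply directly; in that case the start states would have to come from whatever randomization the environment's own reset provides.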