Closed: FazelYU closed this issue 2 years ago
Instead of always choosing the action with the maximum Q-value, keep a probability distribution over actions and sample the action from it.
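A minimal sketch of the difference between the two selection rules. The issue does not say which distribution to use, so a softmax (Boltzmann) distribution over the Q-values is assumed here. The function names and the `temperature` parameter are hypothetical and are not taken from the project's code.

```python
import math
import random


def greedy_action(q_values):
    """Current behaviour: pick the index of the largest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def action_distribution(q_values, temperature=1.0):
    """Assumed softmax over Q-values; lower temperature is closer to greedy."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(q_values, temperature=1.0, rng=random):
    """Proposed behaviour: sample an action index from the distribution."""
    probs = action_distribution(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

With this choice, actions with higher Q-values are still picked more often, but lower-valued actions keep a nonzero chance of being selected. As `temperature` approaches zero, the sampling behaves like the greedy rule.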
What is the point of that? Needs further research. Closed for now.