Closed MXD6 closed 2 years ago
Hey MaXiaodong!
Thanks for your question. I'm not 100% sure I fully understand where you are coming from, but I very much agree that IMPALA still has various issues, among them a lack of exploration.
The typical way to make agents explore more with actor-critic methods is to increase the entropy-cost hyperparameter. That makes the policy less likely to become very "peaky" (i.e., it keeps the policy closer to a uniform random distribution), so the agent "explores" more in the sense of trying out random actions during training.
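To make the entropy-cost idea concrete, here is a minimal sketch (not IMPALA's actual implementation; the logits, loss value, and `entropy_cost` below are made-up placeholders) showing how an entropy bonus is typically folded into an actor-critic loss:

```python
import numpy as np

def softmax(logits):
    """Convert raw policy logits into a probability distribution over actions."""
    z = logits - logits.max()  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def entropy(p):
    """Shannon entropy of a distribution; higher means closer to uniform."""
    return -np.sum(p * np.log(p + 1e-8))

# Hypothetical policy logits over 4 actions (placeholder values)
logits = np.array([2.0, 0.5, 0.1, -1.0])
pi = softmax(logits)

policy_loss = 1.23    # placeholder for the usual policy-gradient loss
entropy_cost = 0.01   # hyperparameter: increasing it encourages exploration

# Subtracting the entropy bonus lowers the loss for high-entropy
# (more uniform, less "peaky") policies, pushing the agent to explore.
total_loss = policy_loss - entropy_cost * entropy(pi)
```

Raising `entropy_cost` strengthens the pull toward a uniform policy; setting it too high can prevent the policy from ever committing to good actions, so it is usually tuned per environment.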
Your help is very useful to me. Thank you!
I found that IMPALA has a shortcoming: the agent does not explore. Do you know how IMPALA could incorporate an exploration mechanism? Thank you!!