Closed MXD6 closed 2 years ago
Hey MaXiaodong!
Thanks for your question. I'm not 100% sure I fully understand where you are coming from, but I very much agree that IMPALA still has various issues, among them a lack of exploration.
The typical way to make agents explore more with actor-critic methods is to increase the entropy-cost hyperparameter. That makes the policy less likely to become very "peaky" (i.e., it keeps the policy closer to a uniform random distribution), so the agent "explores" more in the sense of trying out random actions during training.
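To make the entropy-cost idea concrete, here is a minimal sketch (not IMPALA's actual implementation; the logits, loss value, and `entropy_cost` below are made-up placeholders) showing how an entropy bonus is typically folded into an actor-critic loss:

```python
import numpy as np

def softmax(logits):
    """Convert raw policy logits into a probability distribution over actions."""
    z = logits - logits.max()  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def entropy(p):
    """Shannon entropy of a distribution; higher means closer to uniform."""
    return -np.sum(p * np.log(p + 1e-8))

# Hypothetical policy logits over 4 actions (placeholder values)
logits = np.array([2.0, 0.5, 0.1, -1.0])
pi = softmax(logits)

policy_loss = 1.23    # placeholder for the usual policy-gradient loss
entropy_cost = 0.01   # hyperparameter: increasing it encourages exploration

# Subtracting the entropy bonus lowers the loss for high-entropy
# (more uniform, less "peaky") policies, pushing the agent to explore.
total_loss = policy_loss - entropy_cost * entropy(pi)
```

Raising `entropy_cost` strengthens the pull toward a uniform policy; setting it too high can prevent the policy from ever committing to good actions, so it is usually tuned per environment.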
Your help is very useful to me. Thank you!
I found that IMPALA has a shortcoming: the agent does not explore. Do you know how IMPALA could incorporate an exploration mechanism? Thank you!!