PacktPublishing / Hands-On-Intelligent-Agents-with-OpenAI-Gym

Code for Hands On Intelligent Agents with OpenAI Gym book to get started and learn to build deep reinforcement learning agents using PyTorch
https://www.packtpub.com/big-data-and-business-intelligence/hands-intelligent-agents-openai-gym
MIT License

Getting actions while training A2C RL #15

Closed by smiler80 5 years ago

smiler80 commented 5 years ago

Hello @praveen-palanisamy

I'm currently evaluating several strategies for training the A2C agent on Carla. Since the TensorBoard plots are not showing the expected progress in the returns, I'm going through the code looking for parts I could improve.

For example at this level:

https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/blob/df9ab3984237b3a02998e2c3d3df482f557945f9/ch8/a2c_agent.py#L144

It seems that actions are still being sampled randomly during training. Aren't they supposed to be predicted by the current policy? Did I misunderstand something or miss some details?

Thanks

praveen-palanisamy commented 5 years ago

Hi @bbacem80 Good to know that you are looking into enhancements!

The actions are indeed determined by the current policy. In that particular A2C example, the Carla environment has a continuous action space, so the actions are sampled from a continuous multivariate Gaussian distribution whose parameters are learned by the policy. The sampling is stochastic, but the distribution being sampled from is the policy's own output.

Below are a few more lines of code, just above the line you quoted, that show how the continuous-valued action is selected (sampled) from the action_distribution learned by the (current) policy:

https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/blob/df9ab3984237b3a02998e2c3d3df482f557945f9/ch8/a2c_agent.py#L140-L144
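To make the idea concrete, here is a minimal, dependency-free sketch of sampling an action from a Gaussian whose parameters come from the policy. It is not the code from `a2c_agent.py`: `ToyGaussianPolicy`, `distribution_params`, and `sample_action` are hypothetical names, the "policy" is a fixed linear map rather than a trained network, and a diagonal covariance is assumed for simplicity.

```python
import math
import random


class ToyGaussianPolicy:
    """Hypothetical stand-in for a learned policy.

    Maps an observation to the mean and standard deviation of a
    diagonal Gaussian over a continuous action vector.
    """

    def __init__(self, weights, log_std):
        self.weights = weights  # one weight row per action dimension
        self.log_std = log_std  # one log-std per action dimension

    def distribution_params(self, obs):
        # Mean is a linear function of the observation (assumption:
        # a real agent would use a neural network here).
        mean = [sum(w * o for w, o in zip(row, obs)) for row in self.weights]
        std = [math.exp(s) for s in self.log_std]
        return mean, std


def sample_action(policy, obs, rng):
    """Sample an action from the policy's Gaussian and return its log-probability."""
    mean, std = policy.distribution_params(obs)
    action = [rng.gauss(m, s) for m, s in zip(mean, std)]
    # Log-density of the sampled action under the diagonal Gaussian;
    # policy-gradient methods such as A2C use this term in the actor loss.
    log_prob = sum(
        -0.5 * ((a - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        for a, m, s in zip(action, mean, std)
    )
    return action, log_prob


rng = random.Random(0)
policy = ToyGaussianPolicy(weights=[[0.5, -0.2], [0.1, 0.3]], log_std=[-1.0, -1.0])
obs = [1.0, 2.0]

# Repeated sampling for the same observation gives different actions,
# but they cluster around the mean chosen by the policy.
samples = [sample_action(policy, obs, rng)[0] for _ in range(2000)]
empirical_mean = [sum(s[i] for s in samples) / len(samples) for i in range(2)]
policy_mean, _ = policy.distribution_params(obs)
print("policy mean:   ", policy_mean)
print("empirical mean:", empirical_mean)
```

Individual actions differ from call to call, which is what provides exploration, while their average tracks the mean produced by the policy. As training updates the policy parameters, the distribution (and therefore the sampled actions) shifts accordingly.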

praveen-palanisamy commented 5 years ago

@bbacem80: Did my response above answer your questions?

smiler80 commented 5 years ago

@praveen-palanisamy

Many thanks.