PacktPublishing / Hands-On-Intelligent-Agents-with-OpenAI-Gym

Code for the Hands-On Intelligent Agents with OpenAI Gym book: get started and learn to build deep reinforcement learning agents using PyTorch
https://www.packtpub.com/big-data-and-business-intelligence/hands-intelligent-agents-openai-gym
MIT License

Deep Q-learning on Carla Env #23

Closed smiler80 closed 5 years ago

smiler80 commented 5 years ago

Hello,

I'm intending to apply the chapter 6 deep Q-learner algorithm to the Carla environment. I noticed that the call below is commented out:

https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/blob/8a334e0d11e12654ddf1418f54738e8338137c9e/ch6/deep_Q_learner.py#L260

At which point does the algorithm perform gradient descent and compute the new model parameters?

When I uncommented the line indicated above, I got this error message:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 6, 8, 8], but got 3-dimensional input of size [6, 84, 84] instead

generated at

function_approximator\cnn.py", line 34, in forward
    x = self.layer1(x)

Could you please help?

Regards

praveen-palanisamy commented 5 years ago

Hey @smiler80: That line is commented out in this Deep Q-Learner implementation because, instead of performing the Q update at every step, it uses experience replay, i.e., it performs (mini-)batch SGD using experiences drawn from the replay memory. The relevant lines are here: https://github.com/PacktPublishing/Hands-On-Intelligent-Agents-with-OpenAI-Gym/blob/8a334e0d11e12654ddf1418f54738e8338137c9e/ch6/deep_Q_learner.py#L286-L287

If you follow the book, you will see how the Deep Q-learning agent is built from the ground up, starting with one-step updates and later moving to experience replay. That's why the commented-out line is still there in the code, to make it easy for readers to follow. A rough sketch of the replay-based update is shown below.
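For readers skimming this thread, here is a minimal, self-contained sketch of what such a replay-based batch update looks like. It is not the book's exact code: the Experience tuple, buffer, network, and hyperparameter names below are illustrative stand-ins.

```python
import random
from collections import deque, namedtuple

import torch
import torch.nn.functional as F

# Illustrative experience tuple and replay buffer (the book uses its own classes)
Experience = namedtuple("Experience", ["obs", "action", "reward", "next_obs", "done"])
replay_memory = deque(maxlen=100_000)


def replay_update(q_net, target_q_net, optimizer, batch_size=32, gamma=0.99):
    """One (mini-)batch SGD step using experiences drawn from the replay memory."""
    if len(replay_memory) < batch_size:
        return  # not enough experience collected yet
    batch = random.sample(replay_memory, batch_size)
    obs = torch.stack([e.obs for e in batch])                      # (batch, *obs_shape)
    actions = torch.tensor([e.action for e in batch])              # (batch,)
    rewards = torch.tensor([e.reward for e in batch], dtype=torch.float32)
    next_obs = torch.stack([e.next_obs for e in batch])
    done = torch.tensor([e.done for e in batch], dtype=torch.float32)

    # Q(s, a) for the actions that were actually taken
    q_values = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    # One-step TD target: r + gamma * max_a' Q_target(s', a') for non-terminal s'
    with torch.no_grad():
        td_target = rewards + gamma * (1.0 - done) * target_q_net(next_obs).max(dim=1).values

    loss = F.mse_loss(q_values, td_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


if __name__ == "__main__":
    # Tiny stand-in networks just to show the call; the real agent uses the CNN in ch6
    q_net = torch.nn.Linear(4, 3)
    target_q_net = torch.nn.Linear(4, 3)
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
    for _ in range(64):
        replay_memory.append(Experience(torch.rand(4), random.randrange(3),
                                        random.random(), torch.rand(4), False))
    replay_update(q_net, target_q_net, optimizer)
```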

Hope that answers your question.

On a side note: the error you get is likely because you are using grayscale images as observations rather than RGB images.
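Whatever the channel count, the first conv layer expects a 4-D (batch, channels, height, width) tensor, so a single observation also needs a leading batch dimension before the forward pass. A minimal sketch, assuming the observation is already a torch tensor:

```python
import torch

obs = torch.rand(6, 84, 84)     # a single observation of shape (C, H, W)
batched_obs = obs.unsqueeze(0)  # -> (1, 6, 84, 84): the 4-D input the conv layer expects
```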

smiler80 commented 5 years ago

Thank you @praveen-palanisamy.

I'm training the deep Q-learning agent on CARLA and will probably get back to you.

Regards

smiler80 commented 5 years ago

Hello,

A general question, please: how can I track the convergence of the deep Q-learning algorithm? Is it just through the shape of its reward graph in TensorBoard?

Thanks

praveen-palanisamy commented 5 years ago

Hey @smiler80: Yes. The reward vs. time-step plot is a good indicator of whether the agent has converged. You should observe the reward settling near-stably close to its upper bound (assuming exploration/epsilon is low too).
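If you are not already logging it yourself, writing the per-episode reward to TensorBoard is a one-liner inside the training loop (the Deep Q-Learning script already generates such plots through its summary writer). A minimal sketch, where the tag name and log directory are assumptions rather than the script's exact values:

```python
from torch.utils.tensorboard import SummaryWriter  # tensorboardX exposes the same add_scalar API

writer = SummaryWriter(log_dir="logs/deep_q_learner")  # illustrative log directory

# Inside the training loop, after each episode finishes:
episode_num, episode_reward = 42, 137.5  # placeholder values for the sketch
writer.add_scalar("main/ep_reward", episode_reward, episode_num)
writer.close()
```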

smiler80 commented 5 years ago

Hi @praveen-palanisamy

The Deep Q-learning agent has trained for 25,000 episodes in the CARLA environment. However, when visualizing the vehicle's behavior, I still notice many random actions, obstacles that are not avoided (collisions), failure to follow curved trajectories, and so on. Is there a specific strategy for updating the learning parameters below over the course of training (starting values, periodic updates, ...) in order to optimize the agent's learning progress? "lr", "gamma", "epsilon_max", "epsilon_min"

praveen-palanisamy commented 5 years ago

Hey @smiler80: Do you have the TensorBoard plots? It will be easier to understand and improve the training performance with the help of the plots generated by the Deep Q-Learning script.

In general it depends on the driving scenarios, but 25,000 episodes in the Carla-v0 environment is not a huge number of samples, especially because the observations are raw RGB camera images (high-dimensional). So, don't lose hope yet :). The script automatically saves checkpoints when the training performance improves, so you can resume training from the previous best.
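For reference, saving and restoring a checkpoint in plain PyTorch looks roughly like the sketch below; the dictionary keys and file name are illustrative assumptions, not the script's exact format (the script manages its own checkpoints).

```python
import torch

# Stand-ins for the agent's Q network and optimizer (illustrative only)
q_net = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
global_step, best_mean_reward = 100_000, 42.0

# Save whenever the best mean reward improves
torch.save({
    "q_net_state": q_net.state_dict(),
    "optimizer_state": optimizer.state_dict(),
    "global_step": global_step,
    "best_mean_reward": best_mean_reward,
}, "deep_q_learner_best.pt")

# Resume training later from the previous best
checkpoint = torch.load("deep_q_learner_best.pt")
q_net.load_state_dict(checkpoint["q_net_state"])
optimizer.load_state_dict(checkpoint["optimizer_state"])
global_step = checkpoint["global_step"]
```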

Lane keeping, steering along curves, turning at intersections, and obstacle avoidance are all complex tasks for the agent to learn, especially from raw RGB observations. It usually takes several tens of millions of steps for the agent to learn a fairly good Q-function in this complex environment.

"lr": Currently, a configurable fixed learning rate is used but, a decaying learning rate may help improve the training performance. You can use the utils.decay_schedule.LinearDecaySchedule class to add a linear decay schedule for the learning rate (similar to the epsilon exploration factor decay in the code). Also see note below on epsilon decay.

"gamma": The value of gamma is configurable but does not usually affect the performance much. It is okay to leave it at the default value as in the repository. The Carla environment implementation provides a fairly decent reward signal at every time step.

"epsilon_max", "epsilon_min": Depending on the "epsilon_decay_final_step", you may want to adjust the max and min values over the whole training process so that the epsilon value (exploration) is quite high during the initial phases and then slowly as the agent starts to learn better Q approximations and consequently a good policy, the epsilon value can be decayed to lower values. The training script uses a linear decay schedule. You could use exponential decay or even automated schedules (using population based training, Jaderberg et. al)

DongChen06 commented 5 years ago

Hi @praveen-palanisamy, which version of Carla is supported by this Carla Env? Can I try it with the newest version of Carla (0.9.5)? For instance, newer versions of Carla include lane detection modules.

praveen-palanisamy commented 5 years ago

Hi @Derekabc: Only the stable Carla release versions (0.8.x) are supported by the Carla Env in this repository. I plan to add Carla Gym environments and agents for the Carla development versions (0.9.x+) soon, as a separate project.

DongChen06 commented 5 years ago

Cool! Hope to hear from you soon.

DongChen06 commented 5 years ago

@praveen-palanisamy I have a question about the Python environment. When I try to build Carla from the script, I get a problem caused by an incompatible Python version. When I removed the Anaconda environment from the PATH (using the base Python 3.5.2), the problem was solved. But I still want to use the conda virtual environment to run the Python script. Have you ever faced this problem?

praveen-palanisamy commented 5 years ago

Hey @Derekabc: I just read your last question. Sorry for the delay.

This is more of a Carla question, but I will try to address it here. I haven't faced problems using the Carla Python API in a conda environment. You didn't mention which version of Carla you were using, but if you don't have the Python egg/wheels built for your version of (conda) Python, you will have to build them from source before you can use the API within your Anaconda environment.

Hope that helps. If not, please open a new issue as it is not related to this (#23) issue.