araffin / learning-to-drive-in-5-minutes

Implementation of reinforcement learning approach to make a car learn to drive smoothly in minutes
https://towardsdatascience.com/learning-to-drive-smoothly-in-minutes-450a7cdb35f4
MIT License

Training SAC with raw image as input #25

Open ChunJyeBehBeh opened 4 years ago

ChunJyeBehBeh commented 4 years ago

The policies that I have tried are DDPG and SAC. I used the master branch, and below are the two commands to reproduce the error: `python train.py --algo sac -n 5000` and `python train.py --algo ddpg -n 5000`.

Now I want to try using the raw image as input. I have set N_COMMAND_HISTORY to zero and I use the master branch. For the first 300 steps, the steering and throttle vary between -1 and 1 because random actions are sampled. https://github.com/araffin/learning-to-drive-in-5-minutes/blob/fb82bc77593605711289e03f95dcfb6d3ea9e6c3/algos/custom_sac.py#L89
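To illustrate what I mean, the behaviour looks roughly like this (a simplified sketch with placeholder names and values, not the actual code from `custom_sac.py`):

```python
import numpy as np
from gym import spaces

# Simplified sketch of the warmup behaviour (placeholder values, not the real code):
# steering and throttle both live in [-1, 1].
action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
learning_starts = 300  # number of purely random steps at the beginning

def policy_action():
    # Placeholder for the learned policy: in my runs with raw images it
    # saturates at the bounds (steering stuck at -1 or 1).
    return np.array([1.0, 1.0], dtype=np.float32)

for step in range(400):
    if step < learning_starts:
        # Pure exploration: steering and throttle vary over the whole range.
        action = action_space.sample()
    else:
        # After the warmup, actions come from the policy.
        action = policy_action()
```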

But after that, the policy keeps outputting an extreme steering value of either -1 or 1, so the donkey car quickly goes off the lane, and this keeps repeating without showing any learning progress.

The image below shows that the episode steps drop from 95 to 50 after the policy starts to output the actions. [image]

Below is the plot of the throttle output [SAC with raw image input]. It stays constant at 1 after a few episodes. [Figure_1]

Below is the plot of the throttle output [SAC with VAE input]. The model tries to learn how to steer and varies the output between -1 and 1. [Figure_1]

Sorry for opening so many issues.

araffin commented 4 years ago

Hello, what policy are you using? Please fill in the issue template completely.

ChunJyeBehBeh commented 4 years ago

The policies that I used are DDPG and SAC. I have updated the issue above. Thanks for your reply~

araffin commented 4 years ago

I meant to say "policy architecture". It seems that you are not using a CNN if you are using the default hyperparameters... This explains your results.

ChunJyeBehBeh commented 4 years ago

Yes, I am using the default hyperparameters... May I know which part I should change in order to use the raw image to train a SAC model?

In `sac.yml`, should I change the policy from `policy: 'MlpPolicy'` to `policy: 'CnnPolicy'`?

araffin commented 4 years ago

I would recommend you read the Stable-Baselines documentation and look at the RL Zoo; there are plenty of examples of RL with images.
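For anyone reading later, a minimal image-based setup with Stable-Baselines 2 might look like the sketch below. CarRacing-v0 is only a stand-in for an environment with raw image observations and continuous actions, not this repo's donkey car env, and the hyperparameters are illustrative only:

```python
import gym
from stable_baselines import SAC
from stable_baselines.sac.policies import CnnPolicy

# Stand-in environment: raw image observations, continuous actions.
env = gym.make("CarRacing-v0")

# CnnPolicy runs the image through a CNN feature extractor before the MLP head.
model = SAC(CnnPolicy, env, buffer_size=30000, learning_starts=300, verbose=1)
model.learn(total_timesteps=5000)
model.save("sac_raw_image")
```

In this repo the equivalent change would go through `sac.yml` rather than instantiating the model directly as above.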

ChunJyeBehBeh commented 4 years ago

Hello, I changed the policy to CnnPolicy and increased the layers with policy_kwargs: "dict(layers=[64, 64, 64, 64])". However, I still didn't manage to train the agent with the raw image input... Are there any other parameters that I missed?
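In case it helps: as far as I understand, with CnnPolicy the `layers` argument only sets the fully-connected head that comes after the CNN features; the CNN itself can be swapped by passing a `cnn_extractor` in `policy_kwargs`. A rough sketch of what I mean (hypothetical architecture, not something from this repo):

```python
import numpy as np
import tensorflow as tf
from stable_baselines.a2c.utils import conv, conv_to_fc, linear

def small_cnn(scaled_images, **kwargs):
    """Hypothetical small CNN feature extractor (illustrative, not tuned)."""
    activ = tf.nn.relu
    layer_1 = activ(conv(scaled_images, 'c1', n_filters=16, filter_size=5, stride=2,
                         init_scale=np.sqrt(2), **kwargs))
    layer_2 = activ(conv(layer_1, 'c2', n_filters=32, filter_size=3, stride=2,
                         init_scale=np.sqrt(2), **kwargs))
    layer_2 = conv_to_fc(layer_2)
    return activ(linear(layer_2, 'fc1', n_hidden=128, init_scale=np.sqrt(2)))

# `layers` controls the fully-connected head after the CNN features,
# `cnn_extractor` replaces the CNN itself.
policy_kwargs = dict(layers=[64, 64], cnn_extractor=small_cnn)
```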

Adnan-annan commented 3 years ago

@ChunJyeBehBeh did you manage to train without the VAE?

eliork commented 3 years ago

@ChunJyeBehBeh @Adnan-annan I am also trying to train without the VAE. Did you have any success yet? Would you mind sharing your results and the methods you've tried?