araffin / learning-to-drive-in-5-minutes

Implementation of reinforcement learning approach to make a car learn to drive smoothly in minutes
https://towardsdatascience.com/learning-to-drive-smoothly-in-minutes-450a7cdb35f4
MIT License
284 stars · 88 forks

Car goes way off track during training [question] #5

Closed tleyden closed 5 years ago

tleyden commented 5 years ago

When I start training via `python train.py --algo sac -vae vae-level-0-dim-32.pkl -n 5000` and connect the donkey sim, I notice that sometimes the episode will end as soon as the car moves outside of the lane, whereas other times the car ends up in the middle of the desert for quite a while:

[Screenshot: car stranded in the desert, far off the track]

and in the output console I see:

----------------------------------------
| current_lr              | 0.0003     |
| ent_coef                | 0.5942134  |
| ent_coef_loss           | 0.03155499 |
| entropy                 | 0.7693799  |
| ep_rewmean              | -10.8      |
| episodes                | 154        |
| eplenmean               | 1.17       |
| fps                     | 0          |
| mean 100 episode reward | -10.8      |
| n_updates               | 91200      |
| policy_loss             | -129.445   |
| qf1_loss                | 26.577808  |
| qf2_loss                | 18.912739  |
| time_elapsed            | 518.64     |
| total timesteps         | 498        |
| value_loss              | 46.93487   |
----------------------------------------
Episode finished. Reward: -10.61 1 Steps
SAC training duration: 2.29s
[snip..]
Episode finished. Reward: -11.25 1 Steps
SAC training duration: 2.31s
[snip..]
Episode finished. Reward: -10.03 1 Steps
SAC training duration: 2.22s
etc..

Is that the expected behavior during training, or is something wrong with my setup?

tleyden commented 5 years ago

It seems to be related to: https://github.com/araffin/learning-to-drive-in-5-minutes/blob/c46338cfbfd7b316b1992247c302783a8cb6d36a/algos/custom_sac.py#L129-L131

I changed it to:

                if done:
                    obs = self.env.reset()

and it was behaving more as I originally expected.
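For context, the reset-on-done pattern that change restores can be sketched like this (illustrative only, with a dummy stand-in for the Donkey gym env, not the repo's actual training loop):

```python
class DummyEnv:
    """Stand-in for the Donkey gym env (illustrative only)."""

    def __init__(self, max_steps=3):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.max_steps  # episode ends after max_steps
        return 0.0, -1.0, done, {}


env = DummyEnv()
obs = env.reset()
episodes = 0
for _ in range(10):
    obs, reward, done, info = env.step(action=0)
    if done:
        # Without this reset the agent keeps stepping a finished
        # episode -- e.g. the car wandering around in the desert.
        obs = env.reset()
        episodes += 1
print(episodes)  # -> 3 (10 steps at 3 steps per episode)
```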

araffin commented 5 years ago

Hello, it seems you are working with an old version of the simulator. Please use the one linked in the README, or compile a recent version. Also, `max_cte_error` (maximum cross track error) is the tolerance: when the car's cross track error exceeds it, the episode should end.
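A minimal sketch of that tolerance check (the names and value here are illustrative, not the simulator's actual ones; see `donkey_sim.py` for the real check):

```python
# Tolerance: how far from the lane center the car may drift before the
# episode is terminated (illustrative value).
MAX_CTE_ERROR = 2.0


def is_episode_over(cte):
    """Return True when the cross track error exceeds the tolerance."""
    return abs(cte) > MAX_CTE_ERROR


print(is_episode_over(0.5))  # on track -> False
print(is_episode_over(3.1))  # off in the desert -> True
```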

tleyden commented 5 years ago

I did build sdsandbox source, but it looks like I was on the master branch, and I should be on the donkey branch. I'll give it a try with the donkey branch version. Thanks!

tleyden commented 5 years ago

It's working better now after switching to the donkey branch, but if the car goes off the road at a really tight angle, it still seems to wander around off the track for a while before the env.reset() happens.

[Screenshot: car off the track after leaving the road at a tight angle]

araffin commented 5 years ago

OK, I think you need to monitor the cross track error and its threshold. When the cross track error is above the threshold, the episode should reset. The problem may come from the simulator: if it does not send the proper error, the env won't reset. Also, did you try the teleoperation mode? Does the reset work in that mode (the `r` key)?
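One way to check whether the simulator is actually reporting the error is a small logging wrapper around the env (the `"cte"` info key is an assumption here; inspect what your env's `step()` really returns):

```python
class CTELoggingWrapper:
    """Counts steps where the simulator failed to report a cross track
    error -- if it never reports one, the episode can never reset."""

    def __init__(self, env):
        self.env = env
        self.missing_cte = 0

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if "cte" not in info:
            self.missing_cte += 1  # simulator did not send the error
        return obs, reward, done, info


class FakeSim:
    """Stand-in simulator that forgets to send the cross track error."""

    def step(self, action):
        return 0.0, -1.0, False, {}  # no "cte" key in info


wrapped = CTELoggingWrapper(FakeSim())
for _ in range(5):
    wrapped.step(0)
print(wrapped.missing_cte)  # -> 5: every step lacked a cte reading
```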

I also see you are working on macOS; I hope this does not change anything. See here for the relevant line: https://github.com/araffin/learning-to-drive-in-5-minutes/blob/c46338cfbfd7b316b1992247c302783a8cb6d36a/donkey_gym/envs/donkey_sim.py#L217

EDIT: it also seems you are not using the compiled version, but running it from the Unity editor. I never tried it that way.

tleyden commented 5 years ago

> The problem may come from the simulator, if it does not send the proper error, then it won't reset.

Updating to the donkey branch of the simulator fixed this issue.