tleyden closed this issue 5 years ago.
It seems to be related to: https://github.com/araffin/learning-to-drive-in-5-minutes/blob/c46338cfbfd7b316b1992247c302783a8cb6d36a/algos/custom_sac.py#L129-L131
I changed it to:
if done:
    obs = self.env.reset()
and it was behaving more as I originally expected.
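For readers unfamiliar with where this reset sits, here is a minimal sketch of the reset-on-done pattern in a training loop. It is not the code from custom_sac.py. It assumes the old Gym convention where `step()` returns `(obs, reward, done, info)`, and it uses a hypothetical toy environment (`ToyLaneEnv`) and a hypothetical fixed policy in place of the donkey simulator and SAC.

```python
import random


class ToyLaneEnv:
    """Hypothetical stand-in for the donkey env: the cross-track error (cte)
    does a drifting random walk, and the episode ends once |cte| exceeds
    max_cte_error."""

    def __init__(self, max_cte_error=3.0, seed=0):
        self.max_cte_error = max_cte_error
        self.rng = random.Random(seed)
        self.cte = 0.0

    def reset(self):
        self.cte = 0.0
        return self.cte

    def step(self, action):
        # Each step moves cte by action plus noise in [-0.5, 0.5]
        self.cte += action + self.rng.uniform(-0.5, 0.5)
        done = abs(self.cte) > self.max_cte_error
        reward = 0.0 if done else 1.0
        return self.cte, reward, done, {}


def collect(env, n_steps):
    """Run n_steps of interaction and return the length of each finished episode."""
    obs = env.reset()
    episode_lengths, length = [], 0
    for _ in range(n_steps):
        action = 0.3  # hypothetical fixed policy that slowly drifts off the lane
        obs, reward, done, _info = env.step(action)
        length += 1
        if done:
            # Reset immediately when the episode ends, so the next step
            # starts back on the track instead of continuing off-road
            obs = env.reset()
            episode_lengths.append(length)
            length = 0
    return episode_lengths


lengths = collect(ToyLaneEnv(), 200)
print(lengths)
```

Without the `env.reset()` call inside `if done:`, the loop would keep stepping an environment whose episode has already ended, which is one way to get a car that keeps wandering after it has left the road.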
Hello, it seems you are working with an old version of the simulator. Please use the one linked in the README, or compile a recent version. Also, max_cte_error is the tolerance on the cross-track error (CTE).
I did build the sdsandbox source, but it looks like I was on the master branch when I should have been on the donkey branch. I'll give it a try with the donkey branch version. Thanks!
It's working better now after switching to the donkey branch, but if the car goes off the road at a really tight angle, it still seems to wander around off the track for a while before env.reset() happens.
Ok, I think you need to monitor the cross-track error and its threshold. When the cross-track error exceeds the threshold, the environment should reset. The problem may come from the simulator: if it does not send the correct error, the environment won't reset. Also, did you try the teleoperation mode? Did the reset work in that mode (r key)?
I also see you are working on macOS; I hope that does not change anything. The relevant line is here: https://github.com/araffin/learning-to-drive-in-5-minutes/blob/c46338cfbfd7b316b1992247c302783a8cb6d36a/donkey_gym/envs/donkey_sim.py#L217
EDIT: it also seems you are not using the compiled version but running it inside Unity. I have never tried it that way.
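The threshold check described above can be sketched as follows. This is a hypothetical helper, not the exact code at the linked line of donkey_sim.py: it assumes the simulator reports a signed cross-track error each step and that the environment compares its absolute value against max_cte_error. The name `should_reset`, the tolerance of 3.0, and the readings are all made up for illustration. The second case shows why a simulator that sends a stale or wrong error would never trigger a reset.

```python
import math


def should_reset(cte, max_cte_error):
    """Hypothetical helper: True once the cross-track error exceeds the tolerance."""
    return math.fabs(cte) > max_cte_error


MAX_CTE_ERROR = 3.0  # hypothetical tolerance

# Hypothetical CTE readings reported by the simulator at each step
readings = [0.2, 1.1, 2.4, 3.6]
first_reset = next(i for i, cte in enumerate(readings) if should_reset(cte, MAX_CTE_ERROR))
print(first_reset)  # 3

# If the simulator keeps reporting a stale or wrong value (here 0.0)
# while the car is actually in the desert, the check never fires.
stale = [0.2, 0.0, 0.0, 0.0]
print(any(should_reset(cte, MAX_CTE_ERROR) for cte in stale))  # False
```

Using the absolute value means the car is reset whether it leaves the lane on the left (negative CTE) or the right (positive CTE).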
Updating to the donkey branch of the simulator fixed this issue.
When I start training via:
python train.py --algo sac -vae vae-level-0-dim-32.pkl -n 5000
and connect the donkey sim, I notice that sometimes the episode ends as soon as the car leaves the lane, whereas other times the car ends up in the middle of the desert for quite a while, and in the output console I see:
Is that the expected behavior during training, or is something wrong with my setup?