araffin / learning-to-drive-in-5-minutes

Implementation of reinforcement learning approach to make a car learn to drive smoothly in minutes
https://towardsdatascience.com/learning-to-drive-smoothly-in-minutes-450a7cdb35f4
MIT License

Robust model on multiple tracks [question] #7

Open tleyden opened 5 years ago

tleyden commented 5 years ago

The models I've trained seem to do much better on certain tracks than on others. I was thinking that a training loop over randomly generated tracks would do better, since the resulting model would be robust to any particular track. For example, the input to the training loop would be the number of tracks to complete (n); the loop would then regenerate a random track, train on it, and repeat until the car has completed n tracks.
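
Roughly, something like this (pure pseudocode just to illustrate the idea; `generate_random_track` and `completed_track` are hypothetical helpers, not calls the simulator currently exposes):

```python
def train_until_n_tracks_completed(model, env, n):
    """Keep training one model, counting a track once the car manages to complete it."""
    completed = 0
    while completed < n:
        env.generate_random_track()        # hypothetical: regenerate the road in the simulator
        model.learn(total_timesteps=5000)  # train for a chunk on the current track
        if env.completed_track():          # hypothetical: completion signal from the simulator
            completed += 1
    return model
```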

Setting n to a high enough value should produce a model that would work on a high percentage of randomly generated tracks.

Is there already a way to do this in train.py, or would you accept a PR with an enhancement?

araffin commented 5 years ago

Hello,

There is currently no way of doing that, and I would definitely appreciate a PR that enables it =). I also tried to fix the seed a while ago but couldn't make it work (apparently you succeeded on your side, no?). The original idea of the project was to use the method on a real RC car; that's also why I did not spend much time on training on n different tracks.

araffin commented 5 years ago

For the contribution guidelines, please look at the ones in stable baselines; I use the same ones here ;)

tleyden commented 5 years ago

Ok great! Yes, I managed to get the simulator to regenerate new tracks; see https://github.com/tawnkramer/sdsandbox/issues/24 and https://github.com/tawnkramer/sdsandbox/pull/26. The quick fix is to pull in the changes from https://github.com/araffin/learning-to-drive-in-5-minutes/pull/8 and call regen_road(rand_seed=int(time.time())), and road regeneration should work.

Any suggestions on how the interface should look for a training loop over multiple tracks?

I don't see an easy way to be notified by the simulator when the car has completed a track, so a timestep-based approach may be simpler.

For example: add a --num-unique-tracks parameter (default=1), rename --n-timesteps to --n-timesteps-per-track, and have training loop over --num-unique-tracks tracks, only moving on to the next track once it has hit --n-timesteps-per-track timesteps on the current one.
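
Something along these lines in train.py (just a sketch, not tested: the flag names are only a proposal, and how regen_road ends up being exposed on the env depends on the PRs linked above):

```python
import time

def train_over_tracks(model, env, num_unique_tracks=1, n_timesteps_per_track=10000):
    """Train the same model sequentially on several randomly generated tracks."""
    for _ in range(num_unique_tracks):
        # Regenerate a new random road before each training chunk
        # (assumes the env exposes regen_road, as in the PRs above).
        env.regen_road(rand_seed=int(time.time()))
        # Keep training the same policy; don't reset the timestep counter between tracks.
        model.learn(total_timesteps=n_timesteps_per_track, reset_num_timesteps=False)
    return model
```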