gerkone / pyTORCS-docker

Docker-based, gym-like torcs environment with vision.

The problem of stationary vehicle and how to improve training speed #8

Closed zzlqwq closed 1 year ago

zzlqwq commented 1 year ago

I am running the DDPG example. In the first two episodes the car drives the track normally (it seems to have loaded the trained parameters), but once the third episode starts the car stays stopped the whole time, while the terminal keeps printing:

```
1/1 [==============================] - 0s 85ms/step
1/1 [==============================] - 0s 12ms/step
1/1 [==============================] - 0s 13ms/step
1/1 [==============================] - 0s 13ms/step
1/1 [==============================] - 0s 13ms/step
```

I'm trying to read the code to find the problem.

gerkone commented 1 year ago

Hey. Could you share some more details? Does it show anything else? Are you sure it just stops, or is it just training? Are you using tf2rl DDPG or my version? I don't remember my version having those loading bars but I could be wrong. In any case both implementations are getting a bit outdated at this point since they still use tensorflow. You could also try with newer agents and see. Let me know

zzlqwq commented 1 year ago

I am using your DDPG, not the tf2rl DDPG. The complete output is as follows:

```
[INFO]: Loaded saved actor models
[INFO]: Loaded saved critic models
[INFO]: Starting 3000 episodes on track g-track-1
[INFO]: Episode 1/3000 started
[INFO]: Iteration 0 --> Duration 41537.48 ms. Score 353.54. Running average 353.54
[INFO]: Episode 2/3000 started
[INFO]: Iteration 1 --> Duration 62016.21 ms. Score 561.38. Running average 457.46
[INFO]: Starting training: 4 epochs over 2488 collected steps
[INFO]: Completed 4 epochs. Duration 3751.46 ms. Average loss -74.656
[INFO]: Saving models...
[INFO]: Episode 3/3000 started
1/1 [==============================] - 0s 126ms/step
1/1 [==============================] - 0s 19ms/step
1/1 [==============================] - 0s 18ms/step
1/1 [==============================] - 0s 13ms/step
[... the same progress line repeats, 12-19ms/step ...]
```

It seems to be training. In the first two episodes the agent appears to have loaded the trained network and completes the race well, but from the third episode on the car doesn't take any action. I will rewrite the RL algorithm in PyTorch, but I'm not sure whether the problem lies in the RL algorithm or in the part that interacts with the simulation environment.

gerkone commented 1 year ago

If you see the car driving properly in the first couple of episodes, it's probably due to the warmup phase, where a PID controller (simple_controller) drives instead of the RL agent: https://github.com/gerkone/pyTORCS-docker/blob/d9c01a66c73f9637d93d20151ce8dc3cefd1b176/driver/agents/ddpg/ddpg.py#L120-L126
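The warmup behavior described above can be sketched roughly as follows. This is a minimal illustration of the pattern, not the repo's actual implementation: the names `WARMUP_EPISODES`, `simple_controller`, and `rl_policy` are hypothetical (the linked code gates on steps, not episodes, and its PID controller is more elaborate).

```python
import random

WARMUP_EPISODES = 2  # hypothetical threshold for illustration


def simple_controller(state):
    """Stand-in for a scripted PID-like policy used during warmup."""
    return [1.0, 0.0]  # e.g. full throttle, no steering


def rl_policy(state):
    """Stand-in for the (initially untrained) DDPG actor."""
    return [random.uniform(-1, 1), random.uniform(-1, 1)]


def choose_action(state, episode):
    # During warmup the scripted controller drives, so the first episodes
    # can look deceptively competent; once the untrained agent takes over,
    # performance drops until it actually learns something.
    if episode < WARMUP_EPISODES:
        return simple_controller(state)
    return rl_policy(state)
```

This explains the pattern in the logs: good scores in episodes 1-2, then a stationary car once the agent itself is in control.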

As for the loading bars, honestly I'm not sure. TensorFlow may at some point have introduced progress bars in the predict function. Try something like `action = self.actor.model.predict(state, verbose=0)[0]` (line 123 of ddpg.py).

Other than that I think it's working as intended. You have to give it some time to train, and DDPG is not guaranteed to actually learn anything in such a complex environment with just dense networks.

Let me know if this helps

zzlqwq commented 1 year ago

Yes, you are right; it takes a while for the vehicle to learn to move forward. One last question: is there a way to speed up the training process? I noticed that during training the fps is around 20, which doesn't use much of my computing resources. Thank you.

gerkone commented 1 year ago

Hey, good to hear that. About the performance, at the moment I really don't have a solution. I know it's quite slow; performance has always been an issue. The fps is capped, and as far as I remember raising the limit messes up the physics. I also don't know of any alternative TORCS environment that avoids this (maybe madras if you have a distributed algorithm, but I never got it to work that well in general). I was planning to rewrite TORCS as a pybind library, remove the window and so on, but never got the time to do it.
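Since each simulator instance is fps-capped, one general workaround (in the spirit of the distributed setups mentioned above, and not something this repo currently provides) is to run several environment containers side by side and pool their experience into a shared replay buffer. A minimal sketch with a mock environment standing in for a per-container TORCS instance:

```python
import random
from concurrent.futures import ThreadPoolExecutor


class MockEnv:
    """Stand-in for one (hypothetical) dockerized TORCS instance."""

    def __init__(self, seed):
        self.rng = random.Random(seed)

    def rollout(self, steps):
        # Collect (state, action, reward) tuples from this fps-capped instance.
        return [
            (self.rng.random(), self.rng.random(), self.rng.random())
            for _ in range(steps)
        ]


def collect_parallel(envs, steps):
    """Gather rollouts from all instances concurrently.

    N capped instances yield roughly N times the experience per wall-clock
    second, which an off-policy algorithm like DDPG can consume from a
    shared replay buffer.
    """
    with ThreadPoolExecutor(max_workers=len(envs)) as pool:
        batches = pool.map(lambda env: env.rollout(steps), envs)
    return [transition for batch in batches for transition in batch]
```

Note this only helps data collection throughput; it does not raise the per-instance fps, and the real bottleneck of synchronizing several TORCS containers is not addressed here.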

Also, could you change the title to something more fitting for this discussion? Let me know if I can help with anything else, thanks.

zzlqwq commented 1 year ago

OK, thank you very much for your work and reply. I have no other questions.