Closed zzlqwq closed 1 year ago
Hey. Could you share some more details? Does it show anything else? Are you sure it just stops, or is it still training? Are you using the tf2rl DDPG or my version? I don't remember my version having those loading bars, but I could be wrong. In any case, both implementations are getting a bit outdated at this point since they still use TensorFlow. You could also try the newer agents and see. Let me know
I am using your DDPG, not the tf2rl DDPG. The complete output is as follows:
```
[INFO]: Loaded saved actor models
[INFO]: Loaded saved critic models
[INFO]: Starting 3000 episodes on track g-track-1
[INFO]: Episode 1/3000 started
[INFO]: Iteration 0 --> Duration 41537.48 ms. Score 353.54. Running average 353.54
[INFO]: Episode 2/3000 started
[INFO]: Iteration 1 --> Duration 62016.21 ms. Score 561.38. Running average 457.46
[INFO]: Starting training: 4 epochs over 2488 collected steps
[INFO]: Completed 4 epochs. Duration 3751.46 ms. Average loss -74.656
[INFO]: Saving models...
[INFO]: Episode 3/3000 started
1/1 [==============================] - 0s 126ms/step
1/1 [==============================] - 0s 19ms/step
1/1 [==============================] - 0s 18ms/step
1/1 [==============================] - 0s 13ms/step
1/1 [==============================] - 0s 12ms/step
... (one progress bar per prediction step, repeated)
```
It seems to be training. In the first two episodes the agent seems to have loaded the trained network and can complete the race well, but after the third episode starts, the car doesn't take any action. I will rewrite the RL algorithm with PyTorch, but I'm not sure whether the problem lies in the RL algorithm or in the part that interacts with the simulation environment.
If you see the car driving properly in the first couple of episodes it's probably due to the warmup phase, where a PID controller (`simple_controller`) drives instead of the RL agent:
https://github.com/gerkone/pyTORCS-docker/blob/d9c01a66c73f9637d93d20151ce8dc3cefd1b176/driver/agents/ddpg/ddpg.py#L120-L126
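For intuition, the warmup switch boils down to something like the sketch below. This is an illustration only; names like `warmup_steps` and `simple_controller` are assumptions based on the linked lines, not the exact identifiers from the repo.

```python
# Sketch of a DDPG warmup switch: a hand-coded PID-style controller
# drives for the first `warmup_steps` environment steps, after which
# the (possibly still untrained) actor network takes over.

def select_action(step, state, warmup_steps, simple_controller, actor):
    if step < warmup_steps:
        return simple_controller(state)  # warmup phase: PID drives
    return actor(state)                  # afterwards: RL policy drives

# Toy usage with stand-in callables
pid = lambda s: "pid_action"
net = lambda s: "actor_action"
print(select_action(0, None, 100, pid, net))    # -> pid_action
print(select_action(250, None, 100, pid, net))  # -> actor_action
```

This explains why the first episodes look competent: it is the PID driving, not the learned policy, which still has to train from scratch.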
As for the loading bars, honestly I am not sure. It might be that TensorFlow at some point introduced progress bars to the `predict` function. Try something like `action = self.actor.model.predict(state, verbose=0)[0]` (line 123 of ddpg.py).
Other than that, I think it's working as intended. You have to give it some time to train, and DDPG is not guaranteed to actually learn anything in such a complex environment with just dense networks.
Let me know if this helps
Yes, you are right. It takes a while for the vehicle to learn to move forward. I have one last question: can I speed up the training process? I noticed that during my training the FPS is around 20, which does not use much of my computing resources. Is there a way to speed up training? Thank you.
Hey. Good to hear that. About the performance, at the moment I really don't have a solution. I know it's quite slow, and performance has always been an issue. FPS is capped, and as far as I remember, raising the limit messes up the physics. I also don't know of any alternative TORCS environment that avoids this (maybe MADRaS if you have a distributed algorithm, but I never got it to work that well in general). I was planning to rewrite TORCS as a pybind library, remove the window, and so on, but never got the time to do it.
Also, could you change the title to something more fitting to this discussion? Let me know if I can help with anything else, thanks
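As a side note, before trying to speed things up it can help to confirm where the time actually goes. A tiny timing helper (an illustration, not part of pyTORCS) can estimate the raw step throughput of whatever loop you wrap it around:

```python
import time

def measure_fps(step_fn, n_steps=200):
    """Estimate throughput in steps per second by timing n_steps
    calls to step_fn (e.g. one environment step + one predict)."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_steps / elapsed

# Toy usage: a fake step that sleeps ~1 ms, so the estimate lands
# somewhere below 1000 steps/s depending on timer resolution.
fps = measure_fps(lambda: time.sleep(0.001))
print(f"~{fps:.0f} steps/s")
```

Comparing the measured rate with and without the network call in the loop would show whether the ~20 FPS cap comes from the simulator or from the agent side.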
OK, thank you very much for your work and reply. I have no other questions.
I am running the DDPG example. For the first two episodes the car can run the track normally; it seems to have loaded the trained parameters. But after the third episode starts, the car stays stopped the whole time. The terminal keeps outputting:
```
1/1 [==============================] - 0s 85ms/step
1/1 [==============================] - 0s 12ms/step
1/1 [==============================] - 0s 13ms/step
1/1 [==============================] - 0s 13ms/step
1/1 [==============================] - 0s 13ms/step
```
I'm trying to read the code to find the problem