bitsauce / Carla-ppo

This repository hosts a customized PPO-based agent for CARLA. The goal of this project is to make it easier to interact and experiment with reinforcement-learning-based agents in CARLA -- this is done by wrapping CARLA in a gym-like environment that can handle custom reward functions, custom debug output, etc.
MIT License

Training Problem #19

Closed Oroduin closed 2 years ago

Oroduin commented 3 years ago

Hi, @bitsauce. I have trained a new agent in both synchronous and asynchronous environments for about 2k episodes, but I can't get any decent outcome like the one described in your paper. The reward fluctuates like this: [reward curve image] So I carefully checked the training process, and I found that some episodes stopped training before the car started running, in both synchronous and asynchronous environments. Most of them got a reward of -10, as the graph shows. I think these are invalid training episodes. In particular, the server's frame is only slightly ahead of the client's (around 40:30) in the asynchronous setting. This may be a hardware issue, but it can't explain everything. I trained agents in Town07 on CARLA 0.9.5. Can you explain the reasons to me?

bitsauce commented 3 years ago

Hi @Oroduin

Yeah, the car getting stuck at the start of an episode is quite annoying and should ideally be fixed. In my case it didn't happen that often, so it didn't greatly affect training. Looking at your graph, it seems to be happening quite frequently? My best guess is that this comes from a difference in hardware, which ideally shouldn't matter when running in synchronous mode, since that should be deterministic, but unfortunately it doesn't seem to be.

One thing that could be worth considering is upgrading CARLA to see if that resolves any of these issues. Otherwise, we'd have to dig into the code and try to find out why this is happening, which is something I haven't had time to look into since I finished writing about the project.

Oroduin commented 3 years ago

Hi, @bitsauce

I think your second piece of advice is very useful. I agree that the result comes from hardware and code issues. My hardware is actually not that good. I think I should extend the car's reset time, since the default 2 seconds might be a little short for my setup, and use synchronous mode instead.

Thank you for your help!

BestPolarBear commented 3 years ago

Hello, @Oroduin and @bitsauce. I also encountered the problem you described. I guess it is caused by the synchronous-mode settings. During observation and training, I found that the server keeps running while the client is still idle. Setting `settings.synchronous_mode = True` makes the simulation update wait to be woken up by this client, but it does not guarantee that the server waits for the client's other processes to finish, so another queue must be added to block on the data. It's just my guess.
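To illustrate the queue-blocking idea: in the CARLA client, `sensor.listen(callback)` delivers frames from a background thread, and the usual pattern is to have the callback only enqueue data while the training loop blocks on the queue, so the client can never race ahead of the sensor stream. This is a minimal runnable sketch of that pattern; `FakeSensor`, `run_episode`, and the timings are hypothetical stand-ins, not CARLA API.

```python
import queue
import threading
import time

class FakeSensor:
    """Hypothetical stand-in for a CARLA sensor: delivers frames
    from a background thread, like sensor.listen(callback) does."""
    def __init__(self):
        self._callback = None

    def listen(self, callback):
        self._callback = callback

    def deliver(self, frame):
        # Called from the "server" thread.
        self._callback(frame)

def run_episode(num_ticks=5):
    sensor = FakeSensor()
    frame_queue = queue.Queue()
    # The callback only enqueues; the training loop blocks on the
    # queue, so the client waits for each frame before stepping.
    sensor.listen(frame_queue.put)

    def server():
        for frame in range(num_ticks):
            time.sleep(0.01)       # simulated simulation step
            sensor.deliver(frame)  # sensor data for this tick

    threading.Thread(target=server, daemon=True).start()

    received = []
    for _ in range(num_ticks):
        # In synchronous mode, world.tick() would go here; then block
        # until the matching sensor frame arrives (or time out).
        received.append(frame_queue.get(timeout=2.0))
    return received
```

The `timeout` matters in practice: if the server drops a frame, a bare `get()` would hang the training loop forever.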

Oroduin commented 3 years ago

> Hello, oroduin and bitsauce. I also encountered the problem you described. I guess it is because of the synchronization mode setting. During the observation and training, I found that the server is running and the client is still not running. settings.synchronous_Mode = true enables the simulation update to wake up through this client, but this does not guarantee that it will wait for other processes of this client to run, so it must add another queue to block it. It's just my guess.

Hi, @BestPolarBear. My problem has been solved by upgrading my hardware. CARLA is quite demanding on hardware. A friend of mine encountered the same problem, but less severely; I guess that's because his hardware is better than mine, so I decided to upgrade my hardware, including the GPU and RAM. After upgrading the GPU, all modes can be trained, although the FPS is not as good as the author's (mine is less than 30 FPS).
And in my observation, the client can catch up with the server every time.

BestPolarBear commented 3 years ago

[training reward graph image] Hi, @Oroduin. This is my training result. I think this model has not converged. Can I have a look at your training result? I'm training on two 2080 GPUs.

Oroduin commented 3 years ago

Hi, @BestPolarBear. It seems that your GPUs are good enough to train with. In fact, my result is similar to yours, and because of time constraints I didn't wait for it to converge. I tested the model at around 2.5k episodes, and the performance was quite good. I suggest you test your model to see how the agent performs; I guess the result will converge after 10k episodes in asynchronous mode (synchronous mode will take longer). If you scale down the reward plot, I guess you might see something like this (asynchronous): [reward curve image] If not, we can discuss again. Lastly, you should not expect the training to work as well as the author's.

bitsauce commented 3 years ago

@Oroduin

> After upgrading the GPU, all of modes can be trained although the fps is not good as the author (mine is less than 30fps).

Yeah, that's likely because the videos generated by the eval script are artificially sped up to 30 FPS to make for a smoother viewing experience. You can actually see the real update rate in the top-left corner (~11 FPS in the first example).

The graph you show is pretty similar to what I remember getting most of the time towards the end of the project! I do recall struggling a bit to get the same results twice in a row, which might explain why your results aren't as good. If there's one thing I wish I had more time to do for this project, it would be to make sure the training is actually deterministic, which it currently doesn't seem to be. I'm guessing this is a result of the simulator itself not being deterministic (the simulator runs in Unreal, which is a massive beast with tons of threading), or it's a misunderstanding on my part regarding what the benchmark flag does (at least for CARLA 0.9.5).

@BestPolarBear My recommendation would be to try both synchronous and asynchronous modes and check if either works. If not, it might be that your hardware is too performant compared to what I used (a GTX 970), giving you input data that is possibly too granular compared to what I had. If that is the case, you could also try setting the -fps parameter to force a specific FPS in synchronous mode, but be aware that I haven't explored the effects of this option extensively.
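For reference, in the CARLA 0.9.x Python API synchronous mode is toggled from the client via the world settings; this is a minimal sketch of that, assuming a default server on localhost (the `fixed_delta_seconds` line requires CARLA 0.9.6 or newer, so on 0.9.5 the `-fps` server option mentioned above is the way to fix the step length):

```python
import carla

# Hypothetical host/port; adjust to your setup.
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

settings = world.get_settings()
settings.synchronous_mode = True         # server waits for client ticks
# settings.fixed_delta_seconds = 1 / 15  # fixed step (CARLA >= 0.9.6)
world.apply_settings(settings)

# Each training step now advances the simulation exactly one frame:
world.tick()
```

With this enabled, the server blocks until the client calls `world.tick()`, which is why sensor reads must also be synchronized (e.g. via the queue pattern discussed earlier in this thread), or the client may step past its own data.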

Oroduin commented 3 years ago

@bitsauce Thank you for your reply. I found I had misunderstood the frame rate after going back to check the videos. The result is reasonable compared with the first example. I also trained an agent in asynchronous mode; the car drives more aggressively than in synchronous mode, but is not as shaky as yours (maybe just because I trained for a longer time).