Acmece / rl-collision-avoidance

Implementation of the paper "Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning"
https://arxiv.org/abs/1709.10082

How to train a new model from scratch to pass the test? Is it sufficient to train in stage 2 only? #3

Closed chesternimiz closed 4 years ago

chesternimiz commented 4 years ago

The provided model stage2.pth works well in the test, but I am confused about how to train such a model from scratch. Should I train only in stage 2, only in stage 1, or alternate randomly between stage 1 and stage 2 for some episodes (which I suspect may not converge)?

Currently I am trying to train in stage 2 only, since that world looks complex and diverse. I have been training for several days, but the robots still collide in the test. Is it correct to train in stage 2 only?

I am continuing with longer training by loading the newest model. I'd appreciate any advice.

Acmece commented 4 years ago

Hi @chesternimiz, you should start by training in Stage 1, and once it is well trained you can transfer to Stage 2 by initializing from the Stage 1 model; this is exactly what curriculum learning means. Training Stage 2 from scratch may converge to a lower performance or not converge at all. Please note that the motivation for training in Stage 2 is to generalize the model, so that it hopefully works well in real environments.
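The transfer step described above amounts to warm-starting the Stage 2 run from the Stage 1 checkpoint instead of random weights. A minimal PyTorch sketch of that idea, assuming a placeholder `Policy` class and file names (the repo's actual network class and checkpoint paths may differ):

```python
import torch

# Stand-in for the repo's actor-critic network; the real architecture differs.
class Policy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

# Pretend Stage 1 training has finished and we saved its weights.
stage1_policy = Policy()
torch.save(stage1_policy.state_dict(), "stage1.pth")

# Curriculum-learning transfer: build the same architecture for Stage 2
# and load the Stage 1 weights before continuing training in the new world.
stage2_policy = Policy()
stage2_policy.load_state_dict(torch.load("stage1.pth"))
# ...then resume PPO training in the Stage 2 environment from these weights.
```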

ding15963 commented 4 years ago

Hello @Acmece, I trained in Stage 1 for a week starting from the stage1_1.pth model, for about 60,000 episodes per agent. Starting from that model, I then trained for 5 days in Stage 2, about 35,000 episodes per agent. However, the test results still do not match the provided stage2.pth model; there is still a large gap.

I would like to ask how you trained this stage2.pth model. Did you need to adjust some hyperparameters? I would be very grateful!

In addition, due to a "segmentation fault", I added gc.collect() in each loop to free memory.
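For reference, the workaround mentioned above can be sketched as a forced garbage collection at the end of each training iteration; the loop body here is a placeholder, not the repo's actual training code:

```python
import gc

def train(num_updates):
    """Placeholder training loop that forces a GC pass every iteration."""
    collected_counts = []
    for update in range(num_updates):
        # Stand-in for rollout buffers and other per-iteration allocations.
        rollouts = [list(range(1000)) for _ in range(100)]
        del rollouts                      # drop references to the buffers
        collected_counts.append(gc.collect())  # force a full collection
    return collected_counts

counts = train(3)
```

Calling gc.collect() each iteration adds some overhead, but it keeps unreferenced buffers from accumulating between updates.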

BingHan0458 commented 3 years ago

Hello @ding15963, recently I have been training this code, and I see you say you trained for about a week. What results in the terminal or in the GUI would show that the policy has been trained well? And how can I obtain results such as success rate and extra time, as in Table II of the paper, from the code or by calculation? I hope you can give me some advice, thank you very much!
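As far as I can tell these metrics are not printed by the repo directly, but they can be computed from per-episode test logs. A hedged sketch, where the record fields (`reached_goal`, `travel_time`, `lower_bound_time`) are hypothetical names for illustration:

```python
def summarize(episodes):
    """Compute success rate and mean extra time from episode records."""
    successes = [e for e in episodes if e["reached_goal"]]
    success_rate = len(successes) / len(episodes)
    # "Extra time" = actual travel time minus a lower-bound travel time
    # (e.g. straight-line goal distance / max speed), averaged over
    # the successful episodes only.
    extras = [e["travel_time"] - e["lower_bound_time"] for e in successes]
    mean_extra = sum(extras) / len(extras) if extras else float("nan")
    return success_rate, mean_extra

# Toy example with three test episodes.
episodes = [
    {"reached_goal": True,  "travel_time": 12.0, "lower_bound_time": 10.0},
    {"reached_goal": True,  "travel_time": 11.0, "lower_bound_time": 10.0},
    {"reached_goal": False, "travel_time": 30.0, "lower_bound_time": 10.0},
]
rate, extra = summarize(episodes)  # rate = 2/3, mean extra time = 1.5
```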