Acmece / rl-collision-avoidance

Implementation of the paper "Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning"
https://arxiv.org/abs/1709.10082

Query on Training the Policy #15

Closed gwhan98 closed 3 years ago

gwhan98 commented 3 years ago

When training from scratch in Stage 1, I cannot get the policy to converge to the optimal policy, even after 10 hours of training. Is there a way to do some supervised learning first to give the policy a basic framework, so that the DRL algorithm does not have to fly blind during training?

Thank you very much for your time.

gwhan98 commented 3 years ago

I trained with no pre-trained policy at all. After 10 hours, the episodes are mostly timeouts with a few Reach Goal outcomes sprinkled in between, and the policy cannot converge to the optimal policy (as provided in the code).

Now that I think about it, do I have to use the Stage1_1.pth file when training from scratch? If so, can you explain what the file is? Is it some form of supervised learning? Your help is much appreciated.

Acmece commented 3 years ago

A Curriculum Learning paradigm is employed in this project; please refer to this issue.

You should start by training in Stage1, and once it is well trained you can transfer to Stage2 using the Stage1 model; this is exactly what Curriculum Learning means. Training Stage2 from scratch may converge to lower performance or not converge at all. Please note that the motivation for training in Stage2 is to generalize the model so that it hopefully works well in a real environment.
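
Roughly, the transfer amounts to loading the Stage1 checkpoint before launching Stage2 training. A minimal sketch is below; the import path, class name, constructor arguments, and checkpoint filename are assumptions and may differ from the actual code:

```python
# Rough sketch of the Stage1 -> Stage2 transfer: load the well-trained Stage1
# weights and warm-start Stage2 from them instead of a random initialization.
# Import path, class name, constructor args, and filename are assumptions.
import torch

from model.net import CNNPolicy  # assumed policy network class from this repo

policy = CNNPolicy(frames=3, action_space=2)    # assumed constructor signature
state_dict = torch.load('policy/Stage1_1.pth')  # well-trained Stage1 checkpoint
policy.load_state_dict(state_dict)              # warm-start Stage2 from Stage1

# ...then run the Stage2 training loop (e.g. ppo_stage2.py) with this policy...
```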

BingHan0458 commented 3 years ago

Hello @gwhan98, I have recently been training this code, and I see you mention training for about 10 hours. What results in the terminal or in the GUI indicate that the policy has been trained well? And how do you obtain the results in Table II of the paper, such as the success rate and extra time, from the code or by doing some calculations? I hope you can give me some advice, thank you very much!
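
For instance, would something along the following lines be the right way to tally those metrics? This is only a rough sketch, not code from this repository: it assumes each episode's outcome, travel time, and straight-line goal distance are logged, and that extra time means travel time minus the straight-line lower bound.

```python
# Rough sketch (assumed definitions, not from this repo): compute success rate
# and average extra time from per-episode logs.
def summarize(episodes, max_speed=1.0):
    successes = [ep for ep in episodes if ep['outcome'] == 'reach_goal']
    success_rate = len(successes) / len(episodes)
    # Extra time: actual travel time minus the lower bound (distance / max speed),
    # averaged over successful episodes only.
    extra_times = [ep['travel_time'] - ep['goal_distance'] / max_speed
                   for ep in successes]
    avg_extra_time = sum(extra_times) / len(extra_times) if extra_times else float('nan')
    return success_rate, avg_extra_time

# Example usage with hypothetical logged data:
episodes = [
    {'outcome': 'reach_goal', 'travel_time': 12.4, 'goal_distance': 10.0},
    {'outcome': 'timeout',    'travel_time': 60.0, 'goal_distance': 10.0},
]
print(summarize(episodes))
```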