IDSIA / hhmarl_2D

Heterogeneous Hierarchical Multi Agent Reinforcement Learning for Air Combat

Adjustment to accelerate the training process #8

Closed: arounderor closed this issue 4 months ago

arounderor commented 5 months ago

Hello, I'm trying to run the project, but the training process is too slow: Level 1 took about 5 days on my PC. I didn't make any changes and just followed the instructions. (Windows 11 Pro, CPU: Intel(R) Core(TM) i7-9750H @ 2.60 GHz, 6 cores, Memory: 16.0 GB)

  1. I want to know whether this training speed is normal.
  2. If I need to complete the whole training within 2 weeks, is there anything I can do? Can I just stop in the middle of training and use the intermediate checkpoint to continue training the next level? I appreciate your response.

ardian-selmonaj commented 4 months ago

Hi, I think the long training times are explained by Windows: as far as I know, Ray 2.4.0 is still in beta under Windows, which might affect performance. Under Linux, Level 1 takes less than 5 hours, but you can stop much earlier because Level 1 converges very fast. You can also stop training at any time and continue with the next level. Checkpoints are saved every 50 iterations; you can modify this in train_hetero.py line 287.
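
For orientation, a periodic-checkpoint loop in Ray 2.4 RLlib typically looks like the sketch below. This is a minimal illustration, not the exact contents of train_hetero.py; `algo`, `log_dir`, `num_iterations`, and `CHECKPOINT_EVERY` are assumed names.

```python
# Hedged sketch of a periodic-checkpoint training loop (Ray 2.4 RLlib style).
# `algo` is an already-built RLlib Algorithm; the names below are
# illustrative, not the repo's actual identifiers.
CHECKPOINT_EVERY = 50  # lower this to keep more frequent restore points

for it in range(1, num_iterations + 1):
    result = algo.train()                        # one training iteration
    if it % CHECKPOINT_EVERY == 0:
        path = algo.save(checkpoint_dir=log_dir)
        print(f"checkpoint at iteration {it}: {path}")

# To resume later (e.g. before starting the next level), restore from
# the saved path:
# algo.restore(path)
```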

arounderor commented 4 months ago

Thank you for your explanation!

arounderor commented 4 months ago

Well, I tried it on Ubuntu 22.04 in a virtual machine, and it is still as slow as on Windows. I maximized the VM resources as much as possible. It seems I have to continue with the simplified training plan mentioned before.

(screenshots of the training output attached)

ardian-selmonaj commented 4 months ago

I suggest modifying the argument num_workers. Try 0, 1, 2, 4, 8, 12 and have a look at the average episode time for the first 10 training iterations. This is the only solution I can provide since you are on Ubuntu. On my machine, I get an average time of 7 s per iteration in Level 1 with 4 workers (see the screenshot below).

(screenshot: Level 1 training iteration times)
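
A quick way to run that sweep is sketched below; `build_algo` is a hypothetical stand-in for however train_hetero.py constructs its Algorithm from the parsed arguments, so adapt it to the script's actual entry point.

```python
# Hedged sketch: time the first training iterations for each num_workers
# value. `build_algo` is a hypothetical factory, not the repo's API.
import time

def avg_iteration_time(num_workers: int, n_iters: int = 10) -> float:
    algo = build_algo(num_workers=num_workers)  # hypothetical factory
    start = time.time()
    for _ in range(n_iters):
        algo.train()
    algo.stop()
    return (time.time() - start) / n_iters

for w in (0, 1, 2, 4, 8, 12):
    print(f"num_workers={w}: {avg_iteration_time(w):.1f} s/iteration on average")
```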

arounderor commented 4 months ago

Thank you for your suggestion. I tried different num_workers values, but the best average time was still more than 40 s. When I set num_workers to 8, it kept warning me that the resource scheduling request failed; I think the reason is that my PC only has 4 CPU cores and they are all allocated to the actors. Finally, I wonder what OS (or other setups) researchers usually use for this kind of DRL project. I appreciate your patience and help.
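
For context, that warning appears when the requested resources exceed what Ray sees: with one CPU per rollout worker plus the driver, num_workers=8 needs roughly 9 CPUs. A quick check, sketched under the assumption of Ray 2.4:

```python
# Hedged sketch: inspect what Ray can actually schedule. With N rollout
# workers at 1 CPU each plus the driver, roughly N + 1 CPUs are needed;
# requesting more than what is shown here triggers the scheduling warning.
import ray

ray.init()
print(ray.available_resources())  # e.g. {'CPU': 4.0, 'memory': ..., ...}
ray.shutdown()
```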

arounderor commented 4 months ago

I want to evaluate the effect of different scenarios in your project, but I can't get complete training results in a short time and I have encountered some problems in the high-level training. I wonder if I could get the well-trained policy files from you for academic purposes. I would appreciate it! My personal email address: 630828553@qq.com

ardian-selmonaj commented 4 months ago

pre_trained.zip

Find attached the pre-trained low-level policies and the commander. The folder "checkpoint" includes the files for the commander. Place the folders/files according to the scripts and you can evaluate.
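
If it helps, restoring a saved RLlib policy for evaluation typically looks like the sketch below (assuming Ray 2.4); the path is a placeholder, so follow the repo's evaluation scripts for the actual layout.

```python
# Hedged sketch of loading a saved policy for evaluation (Ray 2.4 RLlib).
# The checkpoint path is a placeholder for wherever the unzipped folder
# is placed; the repo's evaluation scripts define the real layout.
from ray.rllib.policy.policy import Policy

restored = Policy.from_checkpoint("checkpoint")
# For multi-agent checkpoints this returns a dict keyed by policy id;
# for a single-policy checkpoint it returns the Policy itself.
```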