andy-yun / pytorch-0.4-yolov3

Yet Another Implementation of PyTorch 0.4.1 and YOLOv3 on Python 3
MIT License

How to use multi GPUs? #19

Closed: ehsanfathi77 closed this issue 6 years ago

ehsanfathi77 commented 6 years ago

I successfully started training, but it is using only one GPU. Is there anything besides the data file that I need to change?

andy-yun commented 6 years ago

You should set the gpus option in your .data file, for example gpus = 0,1,2,3

ehsanfathi77 commented 6 years ago

I have that in my data file, but it still uses only one GPU. Were you able to utilize 4 GPUs with this implementation?

andy-yun commented 6 years ago

For example, cfg/coco.data contains the following:

    train = coco_train.txt
    valid = coco_test.txt
    names = data/coco.names
    backup = backup
    gpus = 0,1,2,3

What is your training machine, Linux or Windows? On a Linux system, the above option is applied successfully and two GPUs are working.
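To make the role of the gpus line concrete, here is a minimal sketch of how a key = value file like cfg/coco.data could be read and turned into a list of GPU ids. The helper names parse_data_file and gpu_ids are hypothetical and are not taken from this repository; the actual parsing code may differ.

```python
def parse_data_file(text):
    """Parse 'key = value' lines into a dict (hypothetical helper)."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an '=' sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def gpu_ids(options):
    """Return the GPU ids listed under the 'gpus' key, or [] if absent."""
    raw = options.get("gpus", "")
    return [int(tok) for tok in raw.split(",") if tok.strip()]


example = """
train = coco_train.txt
valid = coco_test.txt
names = data/coco.names
backup = backup
gpus = 0,1,2,3
"""

options = parse_data_file(example)
print(gpu_ids(options))
```

Note that a parser written this way silently ignores a misspelled key (for example gpu instead of gpus), so it is worth checking the exact key name in the .data file.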

ehsanfathi77 commented 6 years ago

It is a linux machine with 4 GPUs.

I have the same settings, gpu = 0,1,2,3, but it is using only the third GPU, at ~65% utilization.

Moreover, it is using only one CPU for batch preparation. Does your implementation use multiple CPUs too?

andy-yun commented 6 years ago

@ehsanfathi77 I am sorry, I have not tested on 4 GPUs or with multiple CPUs. I have only run the model on 2 GPUs. Even when multiple GPUs are used, the load is not balanced, in my opinion. I don't think this is a problem in my code. When I get to use 3 or more GPUs, I will check this case. Until then, please bear with me. Sorry.

ehsanfathi77 commented 6 years ago

We need to change the batch size in yolo_v3.cfg. It is set to testing mode, so uncommenting the training lines and commenting out the test lines did the trick. It works pretty well on 4 GPUs and uses several CPUs, but as you mentioned, the load is not balanced across the GPUs.
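A rough illustration of why the batch size matters here, assuming the training script wraps the model in torch.nn.DataParallel (not confirmed in this thread): DataParallel splits each input batch into chunks along the first dimension and sends one chunk to each GPU. With a test-mode batch size of 1 there is only one chunk, so only one GPU receives work. The function split_batch below is a hypothetical stand-in that mimics this chunking with plain Python; the batch size of 64 is only an example value.

```python
import math


def split_batch(batch_size, num_gpus):
    """Per-GPU sample counts when a batch is chunked along dim 0.

    Mimics torch.chunk: every chunk has ceil(batch_size / num_gpus)
    samples except possibly the last, and fewer than num_gpus chunks
    are produced when the batch is too small.
    """
    if batch_size <= 0 or num_gpus <= 0:
        return []
    chunk = math.ceil(batch_size / num_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


# A batch of 1 on 4 GPUs: only one GPU gets any data.
print(split_batch(1, 4))
# A batch of 64 on 4 GPUs: every GPU gets an equal share.
print(split_batch(64, 4))
```

Even with an evenly split batch, some imbalance is expected with DataParallel, since outputs are gathered on a single device.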

jaelim commented 5 years ago

@andy-yun Strangely, even though I specified the GPUs to use in the .data file, my training run utilized all available GPU cards. Has anyone else experienced this?
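One possible explanation, not confirmed in this thread: if the GPU list is applied through the CUDA_VISIBLE_DEVICES environment variable, it only takes effect when it is set before CUDA is initialized in the process. A minimal sketch of that ordering follows; restrict_visible_gpus is a hypothetical helper, not a function from this repository.

```python
import os


def restrict_visible_gpus(gpus):
    """Expose only the listed GPUs to CUDA; return how many were listed.

    'gpus' is a comma-separated string such as "0,1". This must run
    before the first CUDA call in the process to have any effect.
    """
    ids = [tok.strip() for tok in gpus.split(",") if tok.strip()]
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(ids)
    return len(ids)


ngpus = restrict_visible_gpus("0,1")
print(ngpus, os.environ["CUDA_VISIBLE_DEVICES"])

# Only import torch and touch CUDA after the variable has been set, e.g.:
# import torch
# print(torch.cuda.device_count())
```

Setting the variable in the shell before launching the training script is an alternative that avoids ordering problems inside the script.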