AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Trade-off between memory and speed #2966

Open · xiaohai12 opened this issue 5 years ago

xiaohai12 commented 5 years ago

Hello, I am trying to find the cheapest way to use a GPU to train YOLOv3 on custom data. There are four GPU choices: an M60 (8 GB memory), a V100 (16 GB) and a K80 (24 GB), all three on AWS, plus a 1080 Ti (10 GB) in my own machine. In the cfg file I set batch=64 and subdivisions=32 (if I set subdivisions=16 it runs out of memory, even with 16 GB). I also set CUDNN_HALF=1 on the V100 server to compare them.

But the result is that the 1080 Ti took 2 hours to train 1000 iterations and save a weights file, the M60 took around 4 hours, and the V100 was a little slower than the 1080 Ti, which is not what I expected, since I thought the V100 should be faster than the 1080 Ti, especially when using Tensor Cores to speed things up.

I would prefer to use my 1080 Ti, but I have no idea why the server always breaks down when I run the YOLO code, so I have to choose a GPU on AWS, which costs a lot of money.

So I would like to ask whether I can adjust subdivisions, batch, height and width in the cfg file to speed up training without running out of memory, and also spend less money.
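For reference, the relevant part of my cfg looks roughly like this (the width/height values below are only placeholders, I did not write the exact resolution above):

```
# [net] section of the yolov3 cfg used for training
[net]
batch=64
subdivisions=32      # subdivisions=16 runs out of memory even on the 16 GB V100
width=608            # placeholder value, not necessarily the resolution I used
height=608           # placeholder value
```

and I start training with the usual command, something like `./darknet detector train data/obj.data cfg/yolov3-custom.cfg darknet53.conv.74` (the data/cfg file names here are just examples).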

AlexeyAB commented 5 years ago

@xiaohai12 Hi,

I would prefer to use my 1080 Ti, but I have no idea why the server always breaks down when I run the YOLO code, so I have to choose a GPU on AWS, which costs a lot of money.

What error do you get?


The V100 was a little slower than the 1080 Ti, which is not what I expected, since I thought the V100 should be faster than the 1080 Ti, especially when using Tensor Cores to speed things up.

[image attachment]

Also, Tensor Cores usually require lower subdivisions (i.e. a higher mini_batch = batch / subdivisions) to give a real acceleration.
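For example, something like this (only a sketch; whether it fits in memory depends on the GPU):

```
[net]
batch=64
subdivisions=8       # mini_batch = batch / subdivisions = 64 / 8 = 8
# A larger mini_batch lets the Tensor Cores (CUDNN_HALF=1) speed things up,
# but it needs much more GPU memory than subdivisions=32.
```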


So I would like to ask whether I can adjust subdivisions, batch, height and width in the cfg file to speed up training without running out of memory, and also spend less money.

Lower subdivisions accelerates training but can lead to out-of-memory errors. Lower width & height accelerates training and detection but leads to worse accuracy.
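For example (the values are only illustrative):

```
[net]
# Faster training: fewer subdivisions (needs more GPU memory)
batch=64
subdivisions=16

# Faster training and detection, but lower accuracy: smaller network resolution
# (width and height must remain multiples of 32)
width=416
height=416
```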

xiaohai12 commented 5 years ago

@xiaohai12 Hi,

I would prefer to use my 1080 Ti, but I have no idea why the server always breaks down when I run the YOLO code, so I have to choose a GPU on AWS, which costs a lot of money.

What error do you get?

Thanks for your reply. The ./darknet command sometimes breaks my server (inside the Docker container it breaks down). When that happens, running nvidia-smi outside the container also crashes and I can do nothing about it; in addition, when I use the "top" command, I see process 13426 (irq/129-nvidia) occupying 100% CPU. But since I am not the server administrator, I have to ask them to reboot the server.

Also, thanks for the suggestions about the cfg file.