Open xiaohai12 opened 5 years ago
@xiaohai12 Hi,
I should use my 1080 Ti GPU, but I have no idea why it always crashes when I run the YOLO code, so I have to choose one on AWS, which costs a lot of money.
What error do you get?
The V100 is a little slower than the 1080 Ti, which is not what I expected, since I think the V100 should be faster than the 1080 Ti, especially when using Tensor Cores to speed it up.
Check whether the training log shows Loaded > 0 seconds (a sign that data loading is a bottleneck). Compile with
GPU=1 CUDNN=1 CUDNN_HALF=1 OPENCV=1
and run training with the flag -dont_show. OpenCV is required to avoid a bottleneck on the CPU for data augmentation. Also, Tensor Cores usually require lower subdivisions (a higher mini_batch = batch/subdivisions) for acceleration.
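To make the mini-batch relation above concrete, here is a small sketch (my own illustration; the mini_batch helper is hypothetical, not part of darknet):

```python
# Sketch of darknet's mini-batch relation: each forward/backward pass on the
# GPU processes mini_batch = batch / subdivisions images at once.

def mini_batch(batch, subdivisions):
    """Images per GPU pass for given cfg settings."""
    assert batch % subdivisions == 0, "batch must be divisible by subdivisions"
    return batch // subdivisions

# Lower subdivisions -> larger mini_batch -> better Tensor Core utilization,
# but also higher GPU memory use.
print(mini_batch(64, 32))  # the asker's setting: 2 images per pass
print(mini_batch(64, 16))  # 4 images per pass
```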
So I would like to ask whether I can set subdivisions, batch, and height/width in the cfg file to speed up training without running out of memory, and so spend less money.
Lower subdivisions accelerates training but can lead to an Out-of-memory error. Lower width & height accelerates training and detection but leads to worse accuracy.
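As a rough sketch of the width/height trade-off (my own illustration, assuming per-image compute scales with width × height, and using darknet's rule that input dimensions must be multiples of 32):

```python
# Relative training compute vs. network input size (rough approximation:
# cost per image scales ~ width * height).

def relative_cost(width, height, base_width=608, base_height=608):
    """Compute cost of a width x height network relative to a base size."""
    assert width % 32 == 0 and height % 32 == 0, "dims must be multiples of 32"
    return (width * height) / (base_width * base_height)

for w, h in [(608, 608), (416, 416), (320, 320)]:
    print(f"{w}x{h}: {relative_cost(w, h):.2f}x the compute of 608x608")
```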
You can also set
random=0
and lower subdivisions; this can lead to slightly worse accuracy (~1%) but higher training speed, e.g.
batch=64 subdivisions=16
or
batch=60 subdivisions=20
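Put together, those suggested settings would look roughly like this in the .cfg file (a sketch: the width/height values are just an example, the rest of the file is omitted, and note that random=0 is set in each [yolo] layer rather than in [net]):

```
[net]
# mini_batch = batch / subdivisions; lower subdivisions = faster, more memory
batch=64
subdivisions=16
# lower width/height = faster training but worse accuracy
width=416
height=416

[yolo]
# disable multi-scale training (~1% worse accuracy, higher speed)
random=0
```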
Thanks for your reply. The ./darknet command sometimes brings down my server (inside Docker it crashes); when that happens, running nvidia-smi outside Docker hangs and I can do nothing about it. Also, with the top command I saw the process 13426 (irq/129-nvidia) occupying 100% CPU. Since I am not the server administrator, I have to ask them to reboot the server...
Also, thanks for the suggestions about the cfg file.
Hello, I am trying to find the cheapest way to use a GPU to train YOLOv3 on custom data. There are four GPU choices: M60 (8 GB memory), V100 (16 GB), and K80 (24 GB) (these three from AWS), plus the 1080 Ti (10 GB) in my own machine. In the cfg file I set
batch=64
and
subdivisions=32
(if I set
subdivisions=16
it runs out of memory even with 16 GB). And I set
CUDNN_HALF=1
on the V100 server to compare them. But the result is that the 1080 Ti took 2 hours to train 1000 iterations and save a weights file, the M60 took around 4 hours, and the V100 is a little slower than the 1080 Ti, which is not what I expected, since I think the V100 should be faster than the 1080 Ti, especially when using Tensor Cores to speed it up.
I should use my 1080 Ti GPU, but I have no idea why it always crashes when I run the YOLO code, so I have to choose one on AWS, which costs a lot of money.
So I would like to ask whether I can set
subdivisions
or
batch
and
height, width
in the cfg file to speed up training without running out of memory, and so spend less money.