AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

While I train a custom model, I have a CUDA error #7491

Open jjy0317 opened 3 years ago

jjy0317 commented 3 years ago

I'm training my custom model with YOLOv3 on Windows 10, but I've run into a problem. When I run darknet.exe detector train data\obj.data yolov3.cfg data\darknet53.conv.74 in cmd, I get the following error message.

CUDA-version: 10020 (10020), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV version: 3.4.10
yolov3
0 : compute_capability = 500, cudnn_half = 0, GPU: GeForce GTX 750
net.optimized_memory = 0
mini_batch = 1, batch = 64, time_steps = 1, train = 1
layer  filters  size/strd(dil)  input          output
 0 Create CUDA-stream - 0
 Create cudnn-handle 0
 0 conv   24  3x3/1  160x160x3  -> 160x160x24  0.033 BF - receptive field: 3x3
 1 conv   64  3x3/2  160x160x24 -> 80x80x64    0.177 BF - receptive field: 5x5
 2 conv   32  1x1/1  80x80x64   -> 80x80x32    0.026 BF - receptive field: 5x5
 3 conv   64  3x3/1  80x80x32   -> 80x80x64    0.236 BF - receptive field: 9x9
 4 Shortcut Layer: 1, wt = 0, wn = 0, outputs: 80x80x64    0.000 BF - receptive field: 9x9
 5 conv  128  3x3/2  80x80x64   -> 40x40x128   0.236 BF - receptive field: 13x13
 6 conv   64  1x1/1  40x40x128  -> 40x40x64    0.026 BF - receptive field: 13x13
 7 conv  128  3x3/1  40x40x64   -> 40x40x128   0.236 BF - receptive field: 21x21
 8 Shortcut Layer: 5, wt = 0, wn = 0, outputs: 40x40x128   0.000 BF - receptive field: 21x21
 9 conv   64  1x1/1  40x40x128  -> 40x40x64    0.026 BF - receptive field: 21x21
10 conv  128  3x3/1  40x40x64   -> 40x40x128   0.236 BF - receptive field: 29x29
11 Shortcut Layer: 8, wt = 0, wn = 0, outputs: 40x40x128   0.000 BF - receptive field: 29x29
12 conv  256  3x3/2  40x40x128  -> 20x20x256   0.236 BF - receptive field: 37x37
13 conv  128  1x1/1  20x20x256  -> 20x20x128   0.026 BF - receptive field: 37x37
14 conv  256  3x3/1  20x20x128  -> 20x20x256   0.236 BF - receptive field: 53x53
15 Shortcut Layer: 12, wt = 0, wn = 0, outputs: 20x20x256   0.000 BF - receptive field: 53x53
16 conv  128  1x1/1  20x20x256  -> 20x20x128   0.026 BF - receptive field: 53x53
17 conv  256  3x3/1  20x20x128  -> 20x20x256   0.236 BF - receptive field: 69x69
18 Shortcut Layer: 15, wt = 0, wn = 0, outputs: 20x20x256   0.000 BF - receptive field: 69x69
19 conv  128  1x1/1  20x20x256  -> 20x20x128   0.026 BF - receptive field: 69x69
20 conv  256  3x3/1  20x20x128  -> 20x20x256   0.236 BF - receptive field: 85x85
21 Shortcut Layer: 18, wt = 0, wn = 0, outputs: 20x20x256   0.000 BF - receptive field: 85x85
22 conv  128  1x1/1  20x20x256  -> 20x20x128   0.026 BF - receptive field: 85x85
23 conv  256  3x3/1  20x20x128  -> 20x20x256   0.236 BF - receptive field: 101x101
24 Shortcut Layer: 21, wt = 0, wn = 0, outputs: 20x20x256   0.000 BF - receptive field: 101x101
25 conv  128  1x1/1  20x20x256  -> 20x20x128   0.026 BF - receptive field: 101x101
26 conv  256  3x3/1  20x20x128  -> 20x20x256   0.236 BF - receptive field: 117x117
27 Shortcut Layer: 24, wt = 0, wn = 0, outputs: 20x20x256   0.000 BF - receptive field: 117x117
28 conv  128  1x1/1  20x20x256  -> 20x20x128   0.026 BF - receptive field: 117x117
29 conv  256  3x3/1  20x20x128  -> 20x20x256   0.236 BF - receptive field: 133x133
30 Shortcut Layer: 27, wt = 0, wn = 0, outputs: 20x20x256   0.000 BF - receptive field: 133x133
31 conv  128  1x1/1  20x20x256  -> 20x20x128   0.026 BF - receptive field: 133x133
32 conv  256  3x3/1  20x20x128  -> 20x20x256   0.236 BF - receptive field: 149x149
33 Shortcut Layer: 30, wt = 0, wn = 0, outputs: 20x20x256   0.000 BF - receptive field: 149x149
34 conv  128  1x1/1  20x20x256  -> 20x20x128   0.026 BF - receptive field: 149x149
35 conv  256  3x3/1  20x20x128  -> 20x20x256   0.236 BF - receptive field: 165x165
36 Shortcut Layer: 33, wt = 0, wn = 0, outputs: 20x20x256   0.000 BF - receptive field: 165x165
37 conv  512  3x3/2  20x20x256  -> 10x10x512   0.236 BF - receptive field: 181x181
38 conv  256  1x1/1  10x10x512  -> 10x10x256   0.026 BF - receptive field: 181x181
39 conv  512  3x3/1  10x10x256  -> 10x10x512   0.236 BF - receptive field: 213x213
40 Shortcut Layer: 37, wt = 0, wn = 0, outputs: 10x10x512   0.000 BF - receptive field: 213x213
41 conv  256  1x1/1  10x10x512  -> 10x10x256   0.026 BF - receptive field: 213x213
42 conv  512  3x3/1  10x10x256  -> 10x10x512   0.236 BF - receptive field: 245x245
43 Shortcut Layer: 40, wt = 0, wn = 0, outputs: 10x10x512   0.000 BF - receptive field: 245x245
44 conv  256  1x1/1  10x10x512  -> 10x10x256   0.026 BF - receptive field: 245x245
45 conv  512  3x3/1  10x10x256  -> 10x10x512   0.236 BF - receptive field: 277x277
46 Shortcut Layer: 43, wt = 0, wn = 0, outputs: 10x10x512   0.000 BF - receptive field: 277x277
47 conv  256  1x1/1  10x10x512  -> 10x10x256   0.026 BF - receptive field: 277x277
48 conv  512  3x3/1  10x10x256  -> 10x10x512   0.236 BF - receptive field: 309x309
49 Shortcut Layer: 46, wt = 0, wn = 0, outputs: 10x10x512   0.000 BF - receptive field: 309x309
50 conv  256  1x1/1  10x10x512  -> 10x10x256   0.026 BF - receptive field: 309x309
51 conv  512  3x3/1  10x10x256  -> 10x10x512   0.236 BF - receptive field: 341x341
52 Shortcut Layer: 49, wt = 0, wn = 0, outputs: 10x10x512   0.000 BF - receptive field: 341x341
53 conv  256  1x1/1  10x10x512  -> 10x10x256   0.026 BF - receptive field: 341x341
54 conv  512  3x3/1  10x10x256  -> 10x10x512   0.236 BF - receptive field: 373x373
55 Shortcut Layer: 52, wt = 0, wn = 0, outputs: 10x10x512   0.000 BF - receptive field: 373x373
56 conv  256  1x1/1  10x10x512  -> 10x10x256   0.026 BF - receptive field: 373x373
57 conv  512  3x3/1  10x10x256  -> 10x10x512   0.236 BF - receptive field: 405x405
58 Shortcut Layer: 55, wt = 0, wn = 0, outputs: 10x10x512   0.000 BF - receptive field: 405x405
59 conv  256  1x1/1  10x10x512  -> 10x10x256   0.026 BF - receptive field: 405x405
60 conv  512  3x3/1  10x10x256  -> 10x10x512   0.236 BF - receptive field: 437x437
61 Shortcut Layer: 58, wt = 0, wn = 0, outputs: 10x10x512   0.000 BF - receptive field: 437x437
62 conv 1024  3x3/2  10x10x512  -> 5x5x1024    0.236 BF - receptive field: 469x469
63 conv  512  1x1/1  5x5x1024   -> 5x5x512     0.026 BF - receptive field: 469x469
64 conv 1024  3x3/1  5x5x512    -> 5x5x1024    0.236 BF - receptive field: 533x533
65 Shortcut Layer: 62, wt = 0, wn = 0, outputs: 5x5x1024    0.000 BF - receptive field: 533x533
66 conv  512  1x1/1  5x5x1024   -> 5x5x512     0.026 BF - receptive field: 533x533
67 conv 1024  3x3/1  5x5x512    -> 5x5x1024    0.236 BF - receptive field: 597x597
68 Shortcut Layer: 65, wt = 0, wn = 0, outputs: 5x5x1024    0.000 BF - receptive field: 597x597
69 conv  512  1x1/1  5x5x1024   -> 5x5x512     0.026 BF - receptive field: 597x597
70 conv 1024  3x3/1  5x5x512    -> 5x5x1024    0.236 BF - receptive field: 661x661
71 Shortcut Layer: 68, wt = 0, wn = 0, outputs: 5x5x1024    0.000 BF - receptive field: 661x661
72 conv  512  1x1/1  5x5x1024   -> 5x5x512     0.026 BF - receptive field: 661x661
73 conv 1024  3x3/1  5x5x512    -> 5x5x1024    0.236 BF - receptive field: 725x725
74 Shortcut Layer: 71, wt = 0, wn = 0, outputs: 5x5x1024    0.000 BF - receptive field: 725x725
75 conv  512  1x1/1  5x5x1024   -> 5x5x512     0.026 BF - receptive field: 725x725
76 conv 1024  3x3/1  5x5x512    -> 5x5x1024    0.236 BF - receptive field: 789x789
77 conv  512  1x1/1  5x5x1024   -> 5x5x512     0.026 BF - receptive field: 789x789
78 Try to set subdivisions=64 in your cfg-file.
CUDA status Error: file: D:\darknet-master\src\dark_cuda.c : cuda_make_array() : line: 461 : build time: Mar 10 2021 - 17:16:32

CUDA Error: out of memory

I've seen many similar errors, and every solution on GitHub was "change subdivisions=64 in your cfg file". I have already set subdivisions=64 in my cfg file (the one in ...build\darknet\x64), but I still get the "Try to set subdivisions=64 in your cfg-file" error. I use a GTX 750 with CUDA 10.2, and I set width=160, height=160, batch=32 (I also tried 64). I don't know what the problem is. Are there any other .cfg files that I have to modify? [screenshot: cmd_image]
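For reference, all of those settings live in the [net] section at the top of the cfg file. A minimal sketch of the relevant lines with the values described above (the comment lines are just my annotations):

[net]
# batch = images per training iteration; subdivisions splits that iteration into
# batch/subdivisions mini-batches, so 64/64 loads only 1 image onto the GPU at a time
batch=64
subdivisions=64
# network input size; Darknet requires multiples of 32
width=160
height=160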

stephanecharette commented 3 years ago

See this question in the FAQ: https://www.ccoderun.ca/programming/darknet_faq/#cuda_out_of_memory

jjy0317 commented 3 years ago

See this question in the FAQ: https://www.ccoderun.ca/programming/darknet_faq/#cuda_out_of_memory

I already tried increasing subdivisions from 1 to 64 in the cfg file and it didn't work. The message 'Try to set subdivisions=64 in your cfg-file' keeps appearing.

stephanecharette commented 3 years ago

You didn't read the entire entry if that is the only thing you saw. It also says: "If subdivision=... matches the value in batch=... and you still get an out-of-memory error, then you'll need to decrease the network dimensions (the width=... and height=... in [net]) or select a less demanding configuration."

jjy0317 commented 3 years ago

You didn't read the entire entry if that is the only thing you saw. It also says: "If subdivision=... matches the value in batch=... and you still get an out-of-memory error, then you'll need to decrease the network dimensions (the width=... and height=... in [net]) or select a less demanding configuration."

(I resized width and height MANY times before writing this.) Sorry, but nothing changed. I tried width/height = 32, 64, 96, 128... I tried everything, but nothing changes. I set the MINIMUM configuration, but the 'Try to set subdivisions=64 in your cfg-file.' message won't disappear. Is there any fundamental solution to this? I really don't know why this message keeps showing up.

stephanecharette commented 3 years ago

Yes, it means you don't have enough video memory.

But it doesn't make sense to try with sizes as low as 32. Normally, the minimum people use is 416x416. So you're doing something wrong if it doesn't work with 32, 64, etc.
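(As a rough back-of-the-envelope illustration: activation memory in a fully convolutional network grows roughly with width x height, so a 160x160 network needs on the order of 160*160 / (416*416), about 15%, of the activation memory that the same network needs at 416x416. If even 32 or 64 fails, the dimensions are unlikely to be the real problem.)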

jjy0317 commented 3 years ago

I've tried adjusting width, height, batch, and subdivisions. Is there anything else I need to adjust?

I divided 265 images into 3 classes for YOLOv3 and labeled them with Yolo-mark. I am currently using GPU: GeForce GTX 750, CPU: Intel Core i5-9500, CUDA 10.2, Visual Studio 2019, OpenCV 3.4.1. Most of what I'm saying now was already in my original post.

I used the command darknet.exe detector train data\obj.data yolov3.cfg data\darknet53.conv.74. I also wrote this in the original post.

* I trained with 50 pictures last time. The accuracy may have suffered, but I don't think this kind of error occurs because of a lack of data.
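In case it helps to compare against the usual layout, a typical obj.data for a 3-class project looks like this (the file paths here are only placeholders, not my actual paths):

classes = 3
train = data/train.txt
valid = data/test.txt
names = data/obj.names
backup = backup/

and the cfg normally also needs classes=3 in each of the three [yolo] sections, with filters=(classes+5)*3 = 24 in the [convolutional] section right before each [yolo].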

stephanecharette commented 3 years ago

I'll say it again, because you keep ignoring me, and then I'll leave you alone: "... or select a less demanding configuration."

If you have a GTX 750, you only have 1 GiB of video RAM. You still haven't told us which configuration you are using. If you are using standard YOLOv4 at 608x608, it requires a minimum of 7 GiB, which is much more than what you have. This is what I told you above: "it means you don't have enough video memory".
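If you want to see the actual numbers, the NVIDIA driver ships with a small tool that reports total and used video memory per GPU. Running it from the same cmd window before training shows how much of that 1 GiB is already taken by Windows and the desktop (if it is not on the PATH, it is usually under C:\Program Files\NVIDIA Corporation\NVSMI):

nvidia-smi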

Select a less demanding configuration (such as tiny vs full) or get a new video card.
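For example, assuming a tiny cfg prepared the same way as your full one (the cfg and partial-weights filenames below follow this repo's usual naming and may need adjusting to your files; see the README's tiny-YOLO section for the exact pre-trained weights file):

darknet.exe detector train data\obj.data yolov3-tiny-obj.cfg yolov3-tiny.conv.11

The tiny network has far fewer and smaller layers, so it fits in a fraction of the memory that full YOLOv3 needs.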

I strongly suggest you read all the FAQ. But you may be particularly interested in the entry that gives a quick overview of how much video memory is required to train Darknet. https://www.ccoderun.ca/programming/darknet_faq/#memory_consuption
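One more cfg setting that often matters when memory is tight: random=1 in the [yolo] sections makes Darknet periodically rescale the network to larger resolutions during training, which raises peak memory use; setting it to 0 keeps the resolution fixed at width x height:

[yolo]
random=0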

jjy0317 commented 3 years ago

I couldn't fully understand what you were saying because my English is not good. Thank you for your advice. I'll try it.