AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

CUDA Error: Illegal Memory Access was encountered #4893

Open shapu opened 4 years ago

shapu commented 4 years ago

Hi,

I am training darknet on a custom dataset. The training goes well, but suddenly during the mAP calculation I get an Illegal Memory Access error. I suspect this is because the GPU runs out of memory when calculating mAP. How do I set the batch size used for the mAP calculation?

In the Makefile, I set NVCC=/usr/local/cuda-10.1/bin/nvcc to avoid the convolutional_kernel failure, and set GPU=1, OPENCV=1, CUDNN=1; everything else is unchanged.

Below is the cfg file configuration:

[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=180
saturation=1.5
exposure=1.5
hue=.1

learning_rate=0.001
burn_in=100
max_batches=14000
policy=steps
steps=5000,10000
scales=0.1,0.1

[convolutional] batch_normalize=1 filters=32 size=3 stride=1 pad=1 activation=leaky

# Downsample

[convolutional] batch_normalize=1 filters=64 size=3 stride=2 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=32 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

# Downsample

[convolutional] batch_normalize=1 filters=128 size=3 stride=2 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

# Downsample

[convolutional] batch_normalize=1 filters=256 size=3 stride=2 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

# Downsample

[convolutional] batch_normalize=1 filters=512 size=3 stride=2 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

# Downsample

[convolutional] batch_normalize=1 filters=1024 size=3 stride=2 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

######################

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky

[convolutional] size=1 stride=1 pad=1 filters=21 activation=linear

[yolo] mask = 6,7,8 anchors = 22, 31, 90, 17, 18,110, 50, 56, 51,127, 93, 72, 115,112, 83,203, 155,167 classes=2 num=9 jitter=.3 ignore_thresh = .7 truth_thresh = 1 random=1

[route] layers = -4

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[upsample] stride=2

[route] layers = -1, 61

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky

[convolutional] size=1 stride=1 pad=1 filters=21 activation=linear

[yolo] mask = 3,4,5 anchors = 22, 31, 90, 17, 18,110, 50, 56, 51,127, 93, 72, 115,112, 83,203, 155,167 classes=2 num=9 jitter=.3 ignore_thresh = .7 truth_thresh = 1 random=1

[route] layers = -4

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[upsample] stride=2

[route] layers = -1, 36

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky

[convolutional] size=1 stride=1 pad=1 filters=21 activation=linear

[yolo] mask = 0,1,2 anchors = 22, 31, 90, 17, 18,110, 50, 56, 51,127, 93, 72, 115,112, 83,203, 155,167 classes=2 num=9 jitter=.3 ignore_thresh = .7 truth_thresh = 1 random=1
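As a sanity check on the cfg above (this is not part of the thread, just the usual darknet convention): the filters=21 in the convolutional layer before each [yolo] block matches filters = (classes + 5) * masks, with classes=2 and 3 anchor masks per head.

```python
def pre_yolo_filters(classes: int, masks: int = 3) -> int:
    # Each anchor predicts 4 box coords + 1 objectness score + one score per class.
    return (classes + 5) * masks

print(pre_yolo_filters(2))   # 21  -> matches filters=21 in the cfg above
print(pre_yolo_filters(80))  # 255 -> the stock COCO yolov3 value
```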

AlexeyAB commented 4 years ago

Try to train with subdivisions=32 in cfg.

For mAP calculation, batch=1 subdivisions=1 is used automatically.
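A quick sketch of why raising subdivisions lowers GPU memory pressure during training (the function below is illustrative, not darknet's actual code): darknet splits each batch into `subdivisions` mini-batches, so only batch / subdivisions images are resident on the GPU at once.

```python
def images_per_gpu_pass(batch: int, subdivisions: int) -> int:
    # darknet processes batch // subdivisions images per forward/backward pass.
    return batch // subdivisions

print(images_per_gpu_pass(64, 16))  # 4 images per pass (cfg in this thread)
print(images_per_gpu_pass(64, 32))  # 2 images per pass (suggested setting)
```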

Show the output of these commands:

nvcc --version
nvidia-smi

I get Illegal Memory Access error.

Show screenshot.

In the Makefile, I set nvcc=/usr/local/cuda-10.1/bin/nvcc to avoid convolutional_kernel fail

What is the convolutional_kernel fail?

shapu commented 4 years ago

nvcc --version output

Screen Shot 2020-02-20 at 8 07 57 AM

nvidia-smi output

Screen Shot 2020-02-20 at 8 11 39 AM

Illegal Memory access error screenshot

Screen Shot 2020-02-20 at 8 14 54 AM

Convolutional_kernel fail? It is one of the modules that does not compile properly from the darknet source files during the make installation. The darknet target fails to compile under make whenever the nvcc path does not point to the CUDA installation. The cuDNN version is 7.6.4 with this CUDA installation. Another point to note: I am using the latest commit of darknet.

AlexeyAB commented 4 years ago

In the Makefile, I set nvcc=/usr/local/cuda-10.1/bin/nvcc to avoid convolutional_kernel fail

nvcc --version output

Screen Shot 2020-02-20 at 8 07 57 AM

It seems that you have 2 different CUDA versions: 7.5 and 10.1. You compile Darknet with CUDA 10.1, but you link it against the cudart.so... library from the old CUDA 7.5.

Can you re-install CUDA 10.1? During installation, press YES to create symbolic links. Then follow https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions and recompile Darknet.
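One way to confirm which CUDA runtime the binary actually links to (a sketch; the `line` below is a sample of what a correct CUDA 10.1 link looks like, and paths will differ per system):

```shell
# On the affected machine you would run:
#   ldd ./darknet | grep libcudart   # which libcudart the binary resolves to
#   ls -l /usr/local/cuda            # the symlink should point at cuda-10.1
# Parsing the runtime version out of a typical (sample) ldd line:
line='libcudart.so.10.1 => /usr/local/cuda-10.1/lib64/libcudart.so.10.1'
version="${line%% *}"                # -> libcudart.so.10.1
version="${version#libcudart.so.}"   # -> 10.1
echo "$version"
```

If the version printed there is 7.5 rather than 10.1, the stale runtime is still being linked and recompiling after fixing the symlink/paths should resolve it.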

If that doesn't help, try removing CUDA 7.5.

shapu commented 4 years ago

I reinstalled CUDA 10.1, did the post-installation steps, recompiled darknet, and even removed CUDA 7.5. It still doesn't help. In the Makefile I used the same options as before (GPU=1, CUDNN=1, OPENCV=1), but nvcc is just nvcc (not pointed at the cuda-10.1 nvcc). I even changed subdivisions to 32 and trained, but the mAP calculation still does not work.

Below is the nvcc --version output: Screenshot from 2020-02-20 15-02-08

New error message: Screenshot from 2020-02-20 17-46-24

OpenCV/cudNN versions: Screenshot from 2020-02-20 17-47-59

AlexeyAB commented 4 years ago

@shapu

Do you get this error if you compile Darknet with any of these option sets?

  1. GPU=1 CUDNN=1 CUDNN_HALF=0 OPENCV=1 DEBUG=0

  2. GPU=1 CUDNN=0 CUDNN_HALF=0 OPENCV=1 DEBUG=0

  3. GPU=1 CUDNN=1 CUDNN_HALF=0 OPENCV=0 DEBUG=1


Can you attach zip-archive with: cfg-file, obj.data, obj.names, train.txt, valid.txt files? Or with whole dataset if it is small.

jubaer-ad commented 4 years ago

I successfully compiled darknet with OpenCV and CUDA support on Windows. I can test with yolov3.weights on images; up to that point everything works fine. But when I try to detect objects in a video, it loads up, and after 2-3 seconds it shows the CUDA error. The output of detecting on the video is: Microsoft Windows [Version 10.0.18362.657] (c) 2019 Microsoft Corporation. All rights reserved.

C:\darknet_AlexeyAB\build\darknet\x64>darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights -ext_output Aao_Raja.mp4 CUDA-version: 10000 (10020), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1 OpenCV version: 4.1.2 Demo compute_capability = 750, cudnn_half = 1 net.optimized_memory = 0 batch = 1, time_steps = 1, train = 0 layer filters size/strd(dil) input output 0 conv 32 3 x 3/ 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BF 1 conv 64 3 x 3/ 2 416 x 416 x 32 -> 208 x 208 x 64 1.595 BF 2 conv 32 1 x 1/ 1 208 x 208 x 64 -> 208 x 208 x 32 0.177 BF 3 conv 64 3 x 3/ 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BF 4 Shortcut Layer: 1, wt = 0, wn = 0, outputs: 208 x 208 x 64 0.003 BF 5 conv 128 3 x 3/ 2 208 x 208 x 64 -> 104 x 104 x 128 1.595 BF 6 conv 64 1 x 1/ 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF 7 conv 128 3 x 3/ 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF 8 Shortcut Layer: 5, wt = 0, wn = 0, outputs: 104 x 104 x 128 0.001 BF 9 conv 64 1 x 1/ 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF 10 conv 128 3 x 3/ 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF 11 Shortcut Layer: 8, wt = 0, wn = 0, outputs: 104 x 104 x 128 0.001 BF 12 conv 256 3 x 3/ 2 104 x 104 x 128 -> 52 x 52 x 256 1.595 BF 13 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 14 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 15 Shortcut Layer: 12, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 16 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 17 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 18 Shortcut Layer: 15, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 19 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 20 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 21 Shortcut Layer: 18, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 22 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 23 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 24 Shortcut Layer: 21, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 25 conv 128 1 x 1/ 
1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 26 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 27 Shortcut Layer: 24, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 28 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 29 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 30 Shortcut Layer: 27, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 31 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 32 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 33 Shortcut Layer: 30, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 34 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 35 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 36 Shortcut Layer: 33, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 37 conv 512 3 x 3/ 2 52 x 52 x 256 -> 26 x 26 x 512 1.595 BF 38 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 39 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 40 Shortcut Layer: 37, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 41 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 42 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 43 Shortcut Layer: 40, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 44 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 45 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 46 Shortcut Layer: 43, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 47 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 48 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 49 Shortcut Layer: 46, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 50 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 51 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 52 Shortcut Layer: 49, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 53 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 54 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 55 Shortcut Layer: 52, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 56 conv 256 1 x 1/ 1 26 x 
26 x 512 -> 26 x 26 x 256 0.177 BF 57 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 58 Shortcut Layer: 55, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 59 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 60 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 61 Shortcut Layer: 58, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 62 conv 1024 3 x 3/ 2 26 x 26 x 512 -> 13 x 13 x1024 1.595 BF 63 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 64 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 65 Shortcut Layer: 62, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF 66 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 67 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 68 Shortcut Layer: 65, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF 69 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 70 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 71 Shortcut Layer: 68, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF 72 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 73 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 74 Shortcut Layer: 71, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF 75 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 76 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 77 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 78 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 79 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 80 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 81 conv 255 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 255 0.088 BF 82 yolo [yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00 83 route 79 -> 13 x 13 x 512 84 conv 256 1 x 1/ 1 13 x 13 x 512 -> 13 x 13 x 256 0.044 BF 85 upsample 2x 13 x 13 x 256 -> 26 x 26 x 256 86 route 85 61 -> 26 x 26 x 768 87 conv 256 1 x 1/ 1 26 x 26 x 768 -> 26 x 26 x 256 0.266 BF 88 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 
1.595 BF 89 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 90 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 91 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 92 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 93 conv 255 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 255 0.177 BF 94 yolo [yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00 95 route 91 -> 26 x 26 x 256 96 conv 128 1 x 1/ 1 26 x 26 x 256 -> 26 x 26 x 128 0.044 BF 97 upsample 2x 26 x 26 x 128 -> 52 x 52 x 128 98 route 97 36 -> 52 x 52 x 384 99 conv 128 1 x 1/ 1 52 x 52 x 384 -> 52 x 52 x 128 0.266 BF 100 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 101 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 102 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 103 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 104 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 105 conv 255 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 255 0.353 BF 106 yolo [yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00 Total BFLOPS 65.879 avg_outputs = 532444 Allocate additional workspace_size = 52.43 MB Loading weights from yolov3.weights... seen 64, trained: 32013 K-images (500 Kilo-batches_64) Done! Loaded 107 layers from weights-file video file: Aao_Raja.mp4 Video stream: 1920 x 1080 Objects:

FPS:0.0 AVG_FPS:0.0 Objects:

person: 35% (left_x: 528 top_y: 567 width: 34 height: 94) person: 32% (left_x: 702 top_y: 569 width: 21 height: 42) person: 31% (left_x: 544 top_y: 282 width: 796 height: 514) person: 28% (left_x: 553 top_y: 535 width: 40 height: 111) person: 42% (left_x: 1142 top_y: 389 width: 25 height: 64) person: 36% (left_x: 344 top_y: 528 width: 24 height: 43)

FPS:0.8 AVG_FPS:0.0 Objects:

person: 25% (left_x: 345 top_y: 530 width: 24 height: 42) person: 34% (left_x: 702 top_y: 569 width: 20 height: 42) person: 32% (left_x: 530 top_y: 569 width: 33 height: 94) person: 31% (left_x: 1145 top_y: 387 width: 25 height: 67) person: 29% (left_x: 1191 top_y: 543 width: 24 height: 52)

FPS:1.6 AVG_FPS:0.0 Objects:

person: 30% (left_x: 1145 top_y: 388 width: 24 height: 66) person: 30% (left_x: 562 top_y: 351 width: 40 height: 112) person: 30% (left_x: 552 top_y: 532 width: 39 height: 118) person: 29% (left_x: 702 top_y: 569 width: 21 height: 44) person: 27% (left_x: 788 top_y: 418 width: 20 height: 46) person: 27% (left_x: 851 top_y: 441 width: 20 height: 51) person: 27% (left_x: 533 top_y: 287 width: 822 height: 512) person: 26% (left_x: 751 top_y: 465 width: 23 height: 47) person: 25% (left_x: 807 top_y: 406 width: 19 height: 46)

FPS:2.3 AVG_FPS:0.0 Objects:

person: 38% (left_x: 705 top_y: 572 width: 20 height: 40) person: 33% (left_x: 542 top_y: 278 width: 815 height: 531) person: 31% (left_x: 1146 top_y: 388 width: 25 height: 66) person: 30% (left_x: 847 top_y: 437 width: 22 height: 56) person: 26% (left_x: 547 top_y: 531 width: 42 height: 120) person: 44% (left_x: 1196 top_y: 544 width: 23 height: 56)

FPS:2.9 AVG_FPS:0.0 Objects:

person: 27% (left_x: 593 top_y: 515 width: 36 height: 110) person: 33% (left_x: 1148 top_y: 387 width: 25 height: 67) person: 28% (left_x: 564 top_y: 287 width: 776 height: 517) person: 28% (left_x: 1196 top_y: 544 width: 25 height: 55) person: 28% (left_x: 847 top_y: 438 width: 21 height: 56)

FPS:3.5 AVG_FPS:0.0 Objects:

person: 26% (left_x: 749 top_y: 466 width: 21 height: 41) person: 37% (left_x: 1294 top_y: 553 width: 45 height: 78) person: 37% (left_x: 1197 top_y: 545 width: 24 height: 52) person: 32% (left_x: 552 top_y: 293 width: 808 height: 502)

FPS:4.0 AVG_FPS:0.0 Objects:

person: 26% (left_x: 812 top_y: 408 width: 18 height: 42) person: 33% (left_x: 1149 top_y: 389 width: 25 height: 65) person: 28% (left_x: 560 top_y: 283 width: 790 height: 519) person: 27% (left_x: 1392 top_y: 609 width: 48 height: 83) person: 26% (left_x: 1295 top_y: 552 width: 44 height: 79)

FPS:4.5 AVG_FPS:0.0 Objects:

person: 39% (left_x: 1150 top_y: 388 width: 25 height: 67) person: 30% (left_x: 557 top_y: 282 width: 785 height: 519) person: 28% (left_x: 620 top_y: 377 width: 27 height: 61)

FPS:4.9 AVG_FPS:0.0 Objects:

person: 45% (left_x: 541 top_y: 281 width: 811 height: 522) person: 41% (left_x: 621 top_y: 377 width: 27 height: 63) person: 40% (left_x: 1150 top_y: 390 width: 25 height: 66)

FPS:5.3 AVG_FPS:0.0 Objects:

person: 32% (left_x: 1152 top_y: 390 width: 26 height: 66) person: 25% (left_x: 621 top_y: 372 width: 27 height: 67) person: 38% (left_x: 551 top_y: 278 width: 801 height: 532)

FPS:5.6 AVG_FPS:0.0 Objects:

person: 41% (left_x: 551 top_y: 279 width: 806 height: 537) person: 36% (left_x: 1152 top_y: 390 width: 27 height: 66) person: 28% (left_x: 620 top_y: 371 width: 27 height: 69)

FPS:6.0 AVG_FPS:0.0 Objects:

person: 47% (left_x: 573 top_y: 357 width: 46 height: 127) person: 38% (left_x: 622 top_y: 371 width: 28 height: 70) person: 28% (left_x: 1154 top_y: 393 width: 26 height: 62)

FPS:6.2 AVG_FPS:0.0 Objects:

person: 57% (left_x: 624 top_y: 373 width: 27 height: 68) person: 35% (left_x: 575 top_y: 360 width: 44 height: 123) person: 29% (left_x: 1211 top_y: 543 width: 26 height: 51) person: 26% (left_x: 1157 top_y: 395 width: 26 height: 59)

FPS:6.5 AVG_FPS:0.0 Objects:

person: 28% (left_x: 1213 top_y: 542 width: 26 height: 55) person: 25% (left_x: 541 top_y: 278 width: 837 height: 535) person: 57% (left_x: 625 top_y: 371 width: 26 height: 67)

FPS:6.7 AVG_FPS:0.0 Objects:

person: 25% (left_x: 1159 top_y: 397 width: 27 height: 59) person: 58% (left_x: 625 top_y: 371 width: 25 height: 66) person: 37% (left_x: 1214 top_y: 543 width: 26 height: 57) person: 32% (left_x: 545 top_y: 284 width: 830 height: 518)

FPS:6.9 AVG_FPS:0.0 Objects:

person: 28% (left_x: 1215 top_y: 543 width: 25 height: 57) person: 61% (left_x: 625 top_y: 373 width: 26 height: 68) person: 40% (left_x: 578 top_y: 364 width: 41 height: 117) person: 37% (left_x: 523 top_y: 291 width: 862 height: 507)

FPS:7.1 AVG_FPS:0.0 Objects:

person: 62% (left_x: 625 top_y: 373 width: 27 height: 69) person: 43% (left_x: 550 top_y: 278 width: 822 height: 532) person: 33% (left_x: 577 top_y: 362 width: 41 height: 120) person: 28% (left_x: 1214 top_y: 542 width: 28 height: 58)

FPS:7.3 AVG_FPS:0.0 Objects:

person: 55% (left_x: 627 top_y: 371 width: 27 height: 68) person: 39% (left_x: 537 top_y: 285 width: 838 height: 523) person: 29% (left_x: 1161 top_y: 400 width: 27 height: 57) person: 27% (left_x: 578 top_y: 359 width: 40 height: 125)

FPS:7.4 AVG_FPS:0.0 Objects:

person: 42% (left_x: 579 top_y: 356 width: 40 height: 129) person: 34% (left_x: 553 top_y: 283 width: 809 height: 528) person: 30% (left_x: 1161 top_y: 397 width: 29 height: 59) person: 27% (left_x: 1270 top_y: 501 width: 26 height: 58) person: 60% (left_x: 627 top_y: 370 width: 27 height: 71)

FPS:7.6 AVG_FPS:0.0 Objects:

person: 53% (left_x: 628 top_y: 372 width: 26 height: 71) person: 51% (left_x: 580 top_y: 360 width: 39 height: 123) person: 38% (left_x: 537 top_y: 287 width: 849 height: 521)

FPS:7.7 AVG_FPS:0.0 Objects:

person: 61% (left_x: 629 top_y: 372 width: 27 height: 72) person: 48% (left_x: 580 top_y: 360 width: 40 height: 124) person: 42% (left_x: 548 top_y: 269 width: 819 height: 554)

FPS:7.8 AVG_FPS:0.0 Objects:

person: 57% (left_x: 629 top_y: 372 width: 26 height: 68) person: 45% (left_x: 531 top_y: 271 width: 865 height: 555)

FPS:7.9 AVG_FPS:0.0 Objects:

person: 37% (left_x: 539 top_y: 271 width: 849 height: 552) person: 33% (left_x: 1430 top_y: 610 width: 48 height: 86) person: 54% (left_x: 631 top_y: 372 width: 25 height: 66)

FPS:8.0 AVG_FPS:0.0 Objects:

person: 31% (left_x: 1429 top_y: 612 width: 49 height: 84) person: 30% (left_x: 564 top_y: 567 width: 35 height: 99) person: 53% (left_x: 631 top_y: 373 width: 24 height: 67) person: 38% (left_x: 543 top_y: 273 width: 835 height: 548)

FPS:8.1 AVG_FPS:0.0 Objects:

person: 55% (left_x: 632 top_y: 373 width: 25 height: 69) person: 51% (left_x: 542 top_y: 260 width: 834 height: 577)

FPS:8.2 AVG_FPS:0.0 Objects:

person: 55% (left_x: 632 top_y: 373 width: 25 height: 69) CUDA status Error: file: C:\darknet_AlexeyAB\src\dark_cuda.c : cuda_push_array() : line: 457 : build time: Mar 3 2020 - 17:59:27 person: 51% CUDA Error: an illegal memory access was encountered (left_x: 542 top_y: 260 width: 834 height: 577)

FPS:10.3 AVG_FPS:0.0

It's not always the illegal memory access; sometimes it's an unspecified launch failure too. Any help?

AlexeyAB commented 4 years ago

@jubaer-ad

What GPU do you use? Do you get this error if you compile Darknet with CUDNN_HALF=0? Do you get this error if you compile Darknet with CUDNN=0?

jubaer-ad commented 4 years ago

@jubaer-ad

What GPU do you use? Do you get this error if you compile Darknet with CUDNN_HALF=0? Do you get this error if you compile Darknet with CUDNN=0?

I have a GeForce GTX 1660 VENTUS XS 6G OC (MSI). I compiled with CUDNN_HALF=0, but the problem is the same. I haven't compiled with CUDNN=0. Will it still support GPU-accelerated performance with CUDNN=0? @AlexeyAB

jubaer-ad commented 4 years ago

Btw, I used CMake. In the CUDNN section, there is no option to uncheck it. Should I delete the path to CUDNN?

AlexeyAB commented 4 years ago

@jubaer-ad Set the Grouped and Advanced checkboxes at the top of CMake-GUI. Then go to ENABLE: image
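For reference, the same toggle can be done from the command line instead of CMake-GUI (a sketch; the ENABLE_* option names match those shown in the CMake-GUI ENABLE group of AlexeyAB/darknet, run from a build directory inside your checkout):

```shell
# Turn cuDNN and half-precision off while keeping CUDA enabled, then rebuild
cmake -DENABLE_CUDA=ON -DENABLE_CUDNN=OFF -DENABLE_CUDNN_HALF=OFF ..
cmake --build . --config Release
```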

jubaer-ad commented 4 years ago

Compiled without CUDNN and CUDNN_HALF enabled, but the result is the same. Microsoft Windows [Version 10.0.18362.657] (c) 2019 Microsoft Corporation. All rights reserved.

C:\darknet_AlexeyAB\build\darknet\x64>darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights -ext_output test.mp4 CUDA-version: 10000 (10020), cuDNN: 7.6.5, GPU count: 1 OpenCV version: 4.1.2 Demo net.optimized_memory = 0 batch = 1, time_steps = 1, train = 0 layer filters size/strd(dil) input output 0 conv 32 3 x 3/ 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BF 1 conv 64 3 x 3/ 2 416 x 416 x 32 -> 208 x 208 x 64 1.595 BF 2 conv 32 1 x 1/ 1 208 x 208 x 64 -> 208 x 208 x 32 0.177 BF 3 conv 64 3 x 3/ 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BF 4 Shortcut Layer: 1, wt = 0, wn = 0, outputs: 208 x 208 x 64 0.003 BF 5 conv 128 3 x 3/ 2 208 x 208 x 64 -> 104 x 104 x 128 1.595 BF 6 conv 64 1 x 1/ 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF 7 conv 128 3 x 3/ 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF 8 Shortcut Layer: 5, wt = 0, wn = 0, outputs: 104 x 104 x 128 0.001 BF 9 conv 64 1 x 1/ 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF 10 conv 128 3 x 3/ 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF 11 Shortcut Layer: 8, wt = 0, wn = 0, outputs: 104 x 104 x 128 0.001 BF 12 conv 256 3 x 3/ 2 104 x 104 x 128 -> 52 x 52 x 256 1.595 BF 13 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 14 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 15 Shortcut Layer: 12, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 16 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 17 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 18 Shortcut Layer: 15, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 19 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 20 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 21 Shortcut Layer: 18, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 22 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 23 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 24 Shortcut Layer: 21, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 25 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 26 conv 256 3 x 
3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 27 Shortcut Layer: 24, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 28 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 29 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 30 Shortcut Layer: 27, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 31 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 32 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 33 Shortcut Layer: 30, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 34 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 35 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 36 Shortcut Layer: 33, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 37 conv 512 3 x 3/ 2 52 x 52 x 256 -> 26 x 26 x 512 1.595 BF 38 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 39 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 40 Shortcut Layer: 37, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 41 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 42 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 43 Shortcut Layer: 40, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 44 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 45 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 46 Shortcut Layer: 43, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 47 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 48 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 49 Shortcut Layer: 46, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 50 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 51 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 52 Shortcut Layer: 49, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 53 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 54 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 55 Shortcut Layer: 52, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 56 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 57 conv 512 3 x 3/ 1 26 
x 26 x 256 -> 26 x 26 x 512 1.595 BF 58 Shortcut Layer: 55, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 59 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 60 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 61 Shortcut Layer: 58, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 62 conv 1024 3 x 3/ 2 26 x 26 x 512 -> 13 x 13 x1024 1.595 BF 63 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 64 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 65 Shortcut Layer: 62, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF 66 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 67 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 68 Shortcut Layer: 65, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF 69 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 70 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 71 Shortcut Layer: 68, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF 72 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 73 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 74 Shortcut Layer: 71, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF 75 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 76 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 77 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 78 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 79 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 80 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 81 conv 255 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 255 0.088 BF 82 yolo [yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00 83 route 79 -> 13 x 13 x 512 84 conv 256 1 x 1/ 1 13 x 13 x 512 -> 13 x 13 x 256 0.044 BF 85 upsample 2x 13 x 13 x 256 -> 26 x 26 x 256 86 route 85 61 -> 26 x 26 x 768 87 conv 256 1 x 1/ 1 26 x 26 x 768 -> 26 x 26 x 256 0.266 BF 88 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 89 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 
256 0.177 BF 90 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 91 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 92 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 93 conv 255 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 255 0.177 BF 94 yolo [yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00 95 route 91 -> 26 x 26 x 256 96 conv 128 1 x 1/ 1 26 x 26 x 256 -> 26 x 26 x 128 0.044 BF 97 upsample 2x 26 x 26 x 128 -> 52 x 52 x 128 98 route 97 36 -> 52 x 52 x 384 99 conv 128 1 x 1/ 1 52 x 52 x 384 -> 52 x 52 x 128 0.266 BF 100 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 101 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 102 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 103 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 104 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 105 conv 255 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 255 0.353 BF 106 yolo [yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00 Total BFLOPS 65.879 avg_outputs = 532444 Allocate additional workspace_size = 52.43 MB Loading weights from yolov3.weights... seen 64, trained: 32013 K-images (500 Kilo-batches_64) Done! Loaded 107 layers from weights-file video file: test.mp4 Video stream: 1920 x 1080 Objects:

FPS:0.0 AVG_FPS:0.0 Objects:

person: 35% (left_x: 528 top_y: 567 width: 34 height: 94) person: 32% (left_x: 702 top_y: 569 width: 21 height: 42) person: 31% (left_x: 544 top_y: 282 width: 796 height: 514) person: 28% (left_x: 553 top_y: 535 width: 40 height: 111) person: 42% (left_x: 1142 top_y: 389 width: 25 height: 64) person: 36% (left_x: 344 top_y: 528 width: 24 height: 43)

FPS:2.7 AVG_FPS:0.0 Objects:

person: 25% (left_x: 345 top_y: 530 width: 24 height: 42) person: 34% (left_x: 702 top_y: 569 width: 20 height: 42) person: 32% (left_x: 530 top_y: 569 width: 33 height: 94) person: 31% (left_x: 1145 top_y: 387 width: 25 height: 67) person: 30% (left_x: 1191 top_y: 543 width: 24 height: 52)

FPS:5.0 AVG_FPS:0.0 Objects:

person: 30% (left_x: 1145 top_y: 388 width: 24 height: 66) person: 30% (left_x: 562 top_y: 351 width: 40 height: 112) person: 30% (left_x: 552 top_y: 532 width: 39 height: 118) person: 29% (left_x: 702 top_y: 569 width: 21 height: 44) person: 27% (left_x: 788 top_y: 418 width: 20 height: 46) person: 27% (left_x: 851 top_y: 441 width: 20 height: 51) person: 27% (left_x: 533 top_y: 287 width: 822 height: 512) person: 26% (left_x: 751 top_y: 465 width: 23 height: 47) person: 25% (left_x: 807 top_y: 406 width: 19 height: 46)

FPS:7.1 AVG_FPS:0.0 Objects:

person: 38% (left_x: 705 top_y: 572 width: 20 height: 40) person: 33% (left_x: 542 top_y: 278 width: 815 height: 531) person: 31% (left_x: 1146 top_y: 388 width: 25 height: 66) person: 30% (left_x: 847 top_y: 437 width: 22 height: 56) person: 26% (left_x: 547 top_y: 531 width: 42 height: 120) person: 44% (left_x: 1196 top_y: 544 width: 23 height: 56)

FPS:8.9 AVG_FPS:0.0 Objects:

person: 27% (left_x: 593 top_y: 515 width: 36 height: 110) person: 33% (left_x: 1148 top_y: 387 width: 25 height: 67) person: 28% (left_x: 564 top_y: 287 width: 776 height: 517) person: 28% (left_x: 1196 top_y: 544 width: 25 height: 55) person: 28% (left_x: 847 top_y: 438 width: 21 height: 56)

FPS:10.6 AVG_FPS:0.0 Objects:

person: 26% (left_x: 749 top_y: 466 width: 21 height: 41) person: 37% (left_x: 1294 top_y: 553 width: 45 height: 78) person: 37% (left_x: 1197 top_y: 545 width: 24 height: 52) person: 32% (left_x: 552 top_y: 293 width: 808 height: 502)

FPS:12.2 AVG_FPS:0.0 Objects:

person: 26% (left_x: 812 top_y: 408 width: 18 height: 42) person: 33% (left_x: 1149 top_y: 389 width: 25 height: 65) person: 28% (left_x: 560 top_y: 283 width: 790 height: 519) person: 27% (left_x: 1392 top_y: 609 width: 48 height: 83) person: 26% (left_x: 1295 top_y: 552 width: 44 height: 79)

FPS:13.8 AVG_FPS:0.0 Objects:

person: 39% (left_x: 1150 top_y: 388 width: 25 height: 67) person: 30% (left_x: 557 top_y: 282 width: 785 height: 519) person: 28% (left_x: 620 top_y: 377 width: 27 height: 61)

FPS:15.3 AVG_FPS:0.0 Objects:

person: 45% (left_x: 541 top_y: 281 width: 811 height: 523) person: 41% (left_x: 621 top_y: 377 width: 27 height: 63) person: 40% (left_x: 1150 top_y: 390 width: 25 height: 66)

FPS:16.7 AVG_FPS:0.0 Objects:

person: 32% (left_x: 1152 top_y: 390 width: 26 height: 66) person: 25% (left_x: 621 top_y: 372 width: 27 height: 67) person: 38% (left_x: 551 top_y: 278 width: 801 height: 532)

FPS:18.0 AVG_FPS:0.0 Objects:

person: 41% (left_x: 551 top_y: 279 width: 806 height: 537) person: 36% (left_x: 1152 top_y: 390 width: 27 height: 66) person: 28% (left_x: 620 top_y: 371 width: 27 height: 69)

FPS:19.1 AVG_FPS:0.0 Objects:

person: 41% (left_x: 551 top_y: 279 width: 806 height: 537) CUDA status Error: file: C:\darknet_AlexeyAB\src\dark_cuda.c : cuda_push_array() : line: 457 : build time: Mar 4 2020 - 19:03:23 person: 36% CUDA Error: unspecified launch failure (left_x: 1152 top_y: 390 width: 27 height: 66) person: 28% (left_x: 620 top_y: 371 width: 27 height: 69)

FPS:19.9 AVG_FPS:0.0

AlexeyAB commented 4 years ago

@jubaer-ad Try to run detection with -benchmark_layers at the end of the detection command and show the error output.

jubaer-ad commented 4 years ago

@AlexeyAB The last part of output is like: Sorted by time (forward): 0 - fw-sort-layer 62 - type: 0 - avg_time 1.416391 ms 1 - fw-sort-layer 0 - type: 0 - avg_time 1.281484 ms 2 - fw-sort-layer 76 - type: 0 - avg_time 1.021855 ms 3 - fw-sort-layer 67 - type: 0 - avg_time 1.017897 ms 4 - fw-sort-layer 64 - type: 0 - avg_time 1.007820 ms 5 - fw-sort-layer 70 - type: 0 - avg_time 0.983453 ms 6 - fw-sort-layer 80 - type: 0 - avg_time 0.982425 ms 7 - fw-sort-layer 78 - type: 0 - avg_time 0.977486 ms 8 - fw-sort-layer 73 - type: 0 - avg_time 0.962228 ms 9 - fw-sort-layer 1 - type: 0 - avg_time 0.953530 ms 10 - fw-sort-layer 37 - type: 0 - avg_time 0.890899 ms 11 - fw-sort-layer 5 - type: 0 - avg_time 0.881827 ms 12 - fw-sort-layer 3 - type: 0 - avg_time 0.881560 ms 13 - fw-sort-layer 12 - type: 0 - avg_time 0.708670 ms 14 - fw-sort-layer 90 - type: 0 - avg_time 0.697868 ms 15 - fw-sort-layer 39 - type: 0 - avg_time 0.677364 ms 16 - fw-sort-layer 42 - type: 0 - avg_time 0.676012 ms 17 - fw-sort-layer 45 - type: 0 - avg_time 0.671427 ms 18 - fw-sort-layer 48 - type: 0 - avg_time 0.665705 ms 19 - fw-sort-layer 54 - type: 0 - avg_time 0.660772 ms 20 - fw-sort-layer 10 - type: 0 - avg_time 0.650735 ms 21 - fw-sort-layer 51 - type: 0 - avg_time 0.650715 ms 22 - fw-sort-layer 20 - type: 0 - avg_time 0.650690 ms 23 - fw-sort-layer 57 - type: 0 - avg_time 0.639786 ms 24 - fw-sort-layer 88 - type: 0 - avg_time 0.638265 ms 25 - fw-sort-layer 92 - type: 0 - avg_time 0.631676 ms 26 - fw-sort-layer 60 - type: 0 - avg_time 0.626345 ms 27 - fw-sort-layer 7 - type: 0 - avg_time 0.622629 ms 28 - fw-sort-layer 23 - type: 0 - avg_time 0.619349 ms 29 - fw-sort-layer 26 - type: 0 - avg_time 0.615634 ms 30 - fw-sort-layer 17 - type: 0 - avg_time 0.612639 ms 31 - fw-sort-layer 29 - type: 0 - avg_time 0.606422 ms 32 - fw-sort-layer 104 - type: 0 - avg_time 0.585558 ms 33 - fw-sort-layer 35 - type: 0 - avg_time 0.584046 ms 34 - fw-sort-layer 32 - type: 0 - avg_time 0.582044 ms 35 - fw-sort-layer 
14 - type: 0 - avg_time 0.578634 ms 36 - fw-sort-layer 100 - type: 0 - avg_time 0.551685 ms 37 - fw-sort-layer 102 - type: 0 - avg_time 0.543777 ms 38 - fw-sort-layer 2 - type: 0 - avg_time 0.498485 ms 39 - fw-sort-layer 77 - type: 0 - avg_time 0.459195 ms 40 - fw-sort-layer 66 - type: 0 - avg_time 0.428734 ms 41 - fw-sort-layer 79 - type: 0 - avg_time 0.410092 ms 42 - fw-sort-layer 106 - type: 27 - avg_time 0.406384 ms 43 - fw-sort-layer 44 - type: 0 - avg_time 0.389605 ms 44 - fw-sort-layer 75 - type: 0 - avg_time 0.369129 ms 45 - fw-sort-layer 38 - type: 0 - avg_time 0.367781 ms 46 - fw-sort-layer 69 - type: 0 - avg_time 0.367208 ms 47 - fw-sort-layer 72 - type: 0 - avg_time 0.366015 ms 48 - fw-sort-layer 63 - type: 0 - avg_time 0.361490 ms 49 - fw-sort-layer 81 - type: 0 - avg_time 0.346167 ms 50 - fw-sort-layer 4 - type: 14 - avg_time 0.345134 ms 51 - fw-sort-layer 99 - type: 0 - avg_time 0.341736 ms 52 - fw-sort-layer 6 - type: 0 - avg_time 0.338690 ms 53 - fw-sort-layer 9 - type: 0 - avg_time 0.330784 ms 54 - fw-sort-layer 19 - type: 0 - avg_time 0.324594 ms 55 - fw-sort-layer 50 - type: 0 - avg_time 0.324054 ms 56 - fw-sort-layer 41 - type: 0 - avg_time 0.315546 ms 57 - fw-sort-layer 47 - type: 0 - avg_time 0.315293 ms 58 - fw-sort-layer 56 - type: 0 - avg_time 0.313170 ms 59 - fw-sort-layer 84 - type: 0 - avg_time 0.313025 ms 60 - fw-sort-layer 36 - type: 14 - avg_time 0.309531 ms 61 - fw-sort-layer 25 - type: 0 - avg_time 0.308219 ms 62 - fw-sort-layer 22 - type: 0 - avg_time 0.307038 ms 63 - fw-sort-layer 87 - type: 0 - avg_time 0.305605 ms 64 - fw-sort-layer 93 - type: 0 - avg_time 0.303816 ms 65 - fw-sort-layer 53 - type: 0 - avg_time 0.302254 ms 66 - fw-sort-layer 16 - type: 0 - avg_time 0.298365 ms 67 - fw-sort-layer 59 - type: 0 - avg_time 0.293113 ms 68 - fw-sort-layer 89 - type: 0 - avg_time 0.289184 ms 69 - fw-sort-layer 91 - type: 0 - avg_time 0.286977 ms 70 - fw-sort-layer 8 - type: 14 - avg_time 0.284977 ms 71 - fw-sort-layer 28 - type: 0 - 
avg_time 0.277581 ms 72 - fw-sort-layer 21 - type: 14 - avg_time 0.275957 ms 73 - fw-sort-layer 34 - type: 0 - avg_time 0.265510 ms 74 - fw-sort-layer 96 - type: 0 - avg_time 0.265437 ms 75 - fw-sort-layer 11 - type: 14 - avg_time 0.261504 ms 76 - fw-sort-layer 24 - type: 14 - avg_time 0.258428 ms 77 - fw-sort-layer 27 - type: 14 - avg_time 0.253498 ms 78 - fw-sort-layer 31 - type: 0 - avg_time 0.248769 ms 79 - fw-sort-layer 105 - type: 0 - avg_time 0.248354 ms 80 - fw-sort-layer 18 - type: 14 - avg_time 0.245829 ms 81 - fw-sort-layer 30 - type: 14 - avg_time 0.235177 ms 82 - fw-sort-layer 103 - type: 0 - avg_time 0.230286 ms 83 - fw-sort-layer 13 - type: 0 - avg_time 0.225427 ms 84 - fw-sort-layer 15 - type: 14 - avg_time 0.224127 ms 85 - fw-sort-layer 33 - type: 14 - avg_time 0.218803 ms 86 - fw-sort-layer 40 - type: 14 - avg_time 0.209556 ms 87 - fw-sort-layer 101 - type: 0 - avg_time 0.199082 ms 88 - fw-sort-layer 43 - type: 14 - avg_time 0.195723 ms 89 - fw-sort-layer 71 - type: 14 - avg_time 0.193048 ms 90 - fw-sort-layer 94 - type: 27 - avg_time 0.190541 ms 91 - fw-sort-layer 58 - type: 14 - avg_time 0.175323 ms 92 - fw-sort-layer 46 - type: 14 - avg_time 0.175033 ms 93 - fw-sort-layer 83 - type: 9 - avg_time 0.173526 ms 94 - fw-sort-layer 49 - type: 14 - avg_time 0.172208 ms 95 - fw-sort-layer 98 - type: 9 - avg_time 0.169994 ms 96 - fw-sort-layer 74 - type: 14 - avg_time 0.168186 ms 97 - fw-sort-layer 52 - type: 14 - avg_time 0.167782 ms 98 - fw-sort-layer 55 - type: 14 - avg_time 0.159863 ms 99 - fw-sort-layer 65 - type: 14 - avg_time 0.159021 ms 100 - fw-sort-layer 85 - type: 32 - avg_time 0.156629 ms 101 - fw-sort-layer 86 - type: 9 - avg_time 0.149874 ms 102 - fw-sort-layer 97 - type: 32 - avg_time 0.149298 ms 103 - fw-sort-layer 61 - type: 14 - avg_time 0.147601 ms 104 - fw-sort-layer 68 - type: 14 - avg_time 0.146559 ms 105 - fw-sort-layer 95 - type: 9 - avg_time 0.142302 ms 106 - fw-sort-layer 82 - type: 27 - avg_time 0.141730 ms Objects:

person: 35% (left_x: 61 top_y: 140 width: 1449 height: 796)

FPS:13.0 AVG_FPS:12.8 CUDA status Error: file: C:/darknet_AlexeyAB/src/network_kernels.cu : forward_network_gpu() : line: 90 : build time: Mar 4 2020 - 19:03:18

CUDA Error: an illegal memory access was encountered
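For readers of the dump above: the Sorted by time (forward) listing is just the per-layer average forward times ranked slowest first. A small Python sketch of that ranking, using a few values copied from the log (the dict and variable names are mine, not darknet code):

```python
# Hypothetical excerpt of the -benchmark_layers data:
# layer index -> average forward time in milliseconds.
avg_time_ms = {0: 1.281484, 62: 1.416391, 67: 1.017897, 76: 1.021855}

# The printout ranks layers slowest first; reproduce that ordering.
ranked = sorted(avg_time_ms.items(), key=lambda kv: kv[1], reverse=True)
for rank, (layer, ms) in enumerate(ranked):
    print(f"{rank} - fw-sort-layer {layer} - avg_time {ms:.6f} ms")
```

This reproduces the order seen above (layer 62 first, then 0, 76, 67), which is why the big 3x3 convolutions dominate the list.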

AlexeyAB commented 4 years ago

@jubaer-ad Thanks!

Can you change these two lines: https://github.com/AlexeyAB/darknet/blob/b2fc7b624c29b44a92b3ff4980b0685f062a5f19/Makefile#L100-L101 to these two lines:

COMMON+= -DCUDA_DEBUG -DGPU -I/usr/local/cuda/include/
CFLAGS+= -DCUDA_DEBUG -DGPU

Then recompile:

make clean
make

and run detection again without flag -benchmark_layers

And show output.

jubaer-ad commented 4 years ago

@AlexeyAB I am on Windows. I didn't use the make command to build as on Linux. Does changing the Makefile and recompiling with CMake make any sense?

AlexeyAB commented 4 years ago

@jubaer-ad

open darknet.sln in MSVS -> (right click on project) -> properties -> C/C++ -> Preprocessor -> Preprocessor Definitions

Add CUDA_DEBUG; at the beginning of the line, as shown, then recompile: image

jubaer-ad commented 4 years ago

@AlexeyAB I built without CUDNN_HALF and with CUDA_DEBUG; in the preprocessor definitions. The problem remains. Output: FPS:12.6 AVG_FPS:0.0 Objects:

person: 57% (left_x: 629 top_y: 372 width: 26 height: 68) person: 45% (left_x: 531 top_y: 271 width: 865 height: 555)

FPS:12.8 AVG_FPS:0.0

cuDNN status = cudaDeviceSynchronize() Error in: file: C:/darknet_AlexeyAB/src/convolutional_kernels.cu : forward_convolutional_layer_gpu() : line: 544 : build time: Mar 5 2020 - 23:17:59

cuDNN Error: CUDNN_UNKNOWN_STATUS

AlexeyAB commented 4 years ago

@jubaer-ad

Ok, so at least we know that the error is in this function: https://github.com/AlexeyAB/darknet/blob/b2fc7b624c29b44a92b3ff4980b0685f062a5f19/src/convolutional_kernels.cu#L532-L544

Can you build without CUDNN and CUDNN_HALF, but with CUDA_DEBUG;, and show the error?

jubaer-ad commented 4 years ago

@AlexeyAB I am adding CUDA_DEBUG; to darknet in MSVS: e1 I have no entries in Preprocessor Definitions like OPENCV or CUDNN as you have in your previous picture. Instead, I have: e2 And the resulting error is: FPS:14.3 AVG_FPS:14.6 Objects:

bowl: 28% (left_x: 7 top_y: 857 width: 224 height: 95) person: 49% (left_x: 456 top_y: 281 width: 969 height: 653)

FPS:14.4 AVG_FPS:14.6

cuDNN status = cudaDeviceSynchronize() Error in: file: C:/darknet_AlexeyAB/src/convolutional_kernels.cu : forward_convolutional_layer_gpu() : line: 544 : build time: Mar 6 2020 - 00:54:32

cuDNN Error: CUDNN_UNKNOWN_STATUS

AlexeyAB commented 4 years ago

@jubaer-ad Try to remove CUDNN; from the preprocessor definitions. image

jubaer-ad commented 4 years ago

@AlexeyAB Sorry for the late reply. After removing CUDNN, I get an error when building the solution file. Output: e2 Error list: e1

AlexeyAB commented 4 years ago

@jubaer-ad

Ok. Un-check CUDNN_HALF and CUDNN in CMake-GUI (see the image), press Generate -> Open Project -> add CUDA_DEBUG; as in https://github.com/AlexeyAB/darknet/issues/4893#issuecomment-595223624, and Rebuild, then run detection. image

jubaer-ad commented 4 years ago

@AlexeyAB I already unchecked CUDNN_HALF and CUDNN in CMake, reconfigured, and generated. Then I added CUDA_DEBUG; to Preprocessor Definitions and tried to build, and got that error.

AlexeyAB commented 4 years ago

Try to compile the latest version of Darknet as usual, and run Darknet with the flag -cuda_debug_sync

KimalIsaev commented 4 years ago

Hi @AlexeyAB, if I run with the flag -cuda_debug_sync: Enter Image Path: CUDA status = cudaDeviceSynchronize() Error: file: C:/project/darknet-master/src/blas_kernels.cu : add_bias_gpu() : line: 103 : build time: Mar 18 2020 - 05:42:29

CUDA Error: an illegal memory access was encountered CUDA Error: an illegal memory access was encountered: No error

2020-03-18_06-04-35

I use: Windows 10, GTX 1080Ti, latest version of darknet.

AlexeyAB commented 4 years ago

@KimalIsaev

  1. Try to run with the flags -cuda_debug_sync -benchmark_layers and show the error.

  2. Also attach your cfg-file as a zip archive or a txt file.

  3. Rename this file to nvidia-smi.cmd, run it, and show the output: nvidia-smi .cmd.txt

  4. Show a screenshot like this: image

KimalIsaev commented 4 years ago

@AlexeyAB By the way, this error started occurring only after two months of using darknet. Before that it worked like a charm.

Cfg-file, KmalIsaevErrorCfg.zip

2020-03-19_01-16-55

2020-03-19_02-26-53

AlexeyAB commented 4 years ago

@KimalIsaev Do you get an error when you train with the flags -cuda_debug_sync -benchmark_layers?

By the way, this error started occurring only after two months of using darknet. Before that it worked like a charm.

Do you get the error only with the new version of Darknet?


[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=18
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=1
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

Should be

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=1
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
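The correction above matters because only the final 1x1 conv feeding each [yolo] layer is tied to the formula filters = masks * (classes + 5), which for classes=1 and 3 masks gives 18; the preceding 3x3 conv is an ordinary feature layer (256 in the stock cfg), not an output layer. A quick sanity check of the formula (the function name is mine):

```python
def yolo_output_filters(classes: int, masks_per_head: int = 3) -> int:
    """Filters required in the 1x1 conv before a [yolo] layer:
    each mask predicts 4 box coords + 1 objectness + `classes` scores."""
    return masks_per_head * (classes + 5)

print(yolo_output_filters(1))   # 18, matching filters=18 in the cfg above
print(yolo_output_filters(80))  # 255, the COCO value seen in the layer printout
```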

KimalIsaev commented 4 years ago

@AlexeyAB

0)

Do you get an error when you train with the flags -cuda_debug_sync -benchmark_layers?

No error with the detector test command on 6000 images.

1)

Do you get an error only with new version of Darknet?

No, I get the error with the old one too; I have checked that, in particular with version ecf1aeb7e70092a1b3603c9cf2a09fc7d8277d69

2)

Should be

Is it for efficiency reasons or for correctness (no-bugs) reasons? I ask because I already have a model trained with this cfg file.

KimalIsaev commented 4 years ago

@AlexeyAB

Do you get an error when you train with the flags -cuda_debug_sync -benchmark_layers?

No errors, but it is really slow.

AlexeyAB commented 4 years ago

@KimalIsaev Try to run with only one flag, -cuda_debug_sync, instead of -cuda_debug_sync -benchmark_layers

duanxingjian commented 4 years ago

I reinstalled CUDA 10.1, did the post-installation steps, recompiled darknet, and even removed CUDA 7.5. It still doesn't help. In the Makefile I used the same options as before, GPU=1, CUDNN=1 and OPENCV=1, but nvcc is the default nvcc (not pointed to the cuda-10.1 nvcc). I even changed subdivisions to 32 and trained, but mAP calculation still does not work.
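As background on the subdivisions change mentioned above: darknet holds batch/subdivisions images on the GPU per forward pass, while gradients are still accumulated over the full batch before each weight update, so raising subdivisions is the usual way to shrink the memory footprint without changing the effective batch size. A small illustration (the function name is mine):

```python
def mini_batch(batch: int, subdivisions: int) -> int:
    """Images processed on the GPU in one forward pass; the weight
    update still uses gradients accumulated over the full `batch`."""
    assert batch % subdivisions == 0, "batch must be divisible by subdivisions"
    return batch // subdivisions

print(mini_batch(64, 16))  # 4 images per forward pass (the cfg above)
print(mini_batch(64, 32))  # 2 images per forward pass (the retry here)
```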

Below is the nvcc --version output: Screenshot from 2020-02-20 15-02-08

New error message: Screenshot from 2020-02-20 17-46-24

OpenCV/cudNN versions: Screenshot from 2020-02-20 17-47-59

@shapu Hi there, have you found the root cause of your CUDA "illegal memory access" error during mAP calculation? I'm having the exact same problem here. I'm using CUDA 10.0 and cuDNN 7.6.5 with OPENCV=1. Thanks! image

AlexeyAB commented 4 years ago

Try to use cuDNN 7.4.2. What GPU do you use? Do you train yolov4-custom.cfg?

duanxingjian commented 4 years ago

Try to use cuDNN 7.4.2. What GPU do you use? Do you train yolov4-custom.cfg?

@AlexeyAB Thanks! I will try cuDNN 7.4.2. Meanwhile, I'm training yolov3-tiny; the training process itself was flawless. The GPU I'm using is a Tesla K80 (from an Azure instance). my_yolov3_tiny.cfg.txt image

nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2018 NVIDIA Corporation Built on Sat_Aug_25_21:08:01_CDT_2018 Cuda compilation tools, release 10.0, V10.0.130

AlexeyAB commented 4 years ago

Almost all problems with mAP calculation during training are related to the old K80. It is still unknown how to solve this.

duanxingjian commented 4 years ago

Almost all problems with mAP calculation during training are related to the old K80. It is still unknown how to solve this.

Thanks for letting me know! Yeah, mAP is no big deal; it's just very nice to have it plotted on the loss chart.