AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)

status Error dark_cuda.c : cuda_push_array() : Line 458 #4657

Open cym0301 opened 4 years ago

cym0301 commented 4 years ago

Hi everyone,

I am a beginner in object detection and am currently trying out csresnext50-panet-spp. I started training with the command "darknet.exe detector train data/ cfg/innoiris.cfg csresnext50-panet-spp.conv.112 -map" and the attached configuration file. During training, the error shown in the screenshot occurred. Is it caused by a wrong configuration or by a hardware issue? I am training on one GTX 1080 with OpenCV 4.2, CUDA 10.2 and cuDNN. I am not using the latest version of Darknet but commit 6878ecc instead.



AlexeyAB commented 4 years ago


You can also try disabling CUDNN and CUDNN_HALF in CMake, then press Generate -> Open Project, recompile, and train. What error do you get?
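For anyone building from a terminal instead of the CMake GUI, the steps above can be sketched as follows. This assumes Darknet's CMakeLists.txt exposes the ENABLE_CUDNN and ENABLE_CUDNN_HALF options that back the GUI checkboxes, and it uses a hypothetical build directory name; adjust both to your checkout.

```shell
# Sketch: reconfigure Darknet with cuDNN and cuDNN half precision disabled.
# Run from the darknet source root; "build_nocudnn" is a hypothetical
# build directory name, not something Darknet requires.
cmake -S . -B build_nocudnn -DENABLE_CUDNN=OFF -DENABLE_CUDNN_HALF=OFF

# Build the Release configuration (needed for multi-config generators
# such as Visual Studio 2019).
cmake --build build_nocudnn --config Release

# Linux Makefile builds: override the Makefile variables instead, e.g.
#   make GPU=1 CUDNN=0 CUDNN_HALF=0
```

Using a separate build directory keeps the original cuDNN-enabled build intact, so the two binaries can be compared on the same training run.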


KimalIsaev commented 4 years ago


Try downloading the latest Darknet and doing the same: train without CUDA_DEBUG and with the -benchmark_layers flag for more than 1000 iterations, then post a screenshot of the error.

After 10000 iterations no error.

mwindowshz commented 4 years ago

Hi, I have the same problem: a cuda_push_array() error at line 457.

I tried using -show_imgs and got these images:


Is this normal? Also, using -show_imgs requires manual keyboard input.

I am using MS COCO with csresnext50-panet-spp-original-optimal.cfg.

Command line: darknet detector train data/ csresnext50-panet-spp-original-optimal.cfg darknet53.conv.74 -show_imgs

Running on Windows 10 with a GeForce GTX 1080 Ti.



Compiled the latest Darknet version using CMake and VS2019.

Please help clarify if you can. Thanks.

I would also like to ask: for MS COCO, max_batches is 500200. Does it really need so many iterations when starting from darknet53.conv.74? This can take a very long time.

mwindowshz commented 4 years ago

Hi, training works fine with yolov3.cfg on Pascal VOC, but not with csresnext50-panet-spp.cfg. Could you share your successful training experience with csresnext50-panet-spp.cfg, including hardware info, environment configuration, training process and so on, or the last commit with which you successfully trained csresnext50-panet-spp.cfg? Looking forward to your reply.

Hi @MrCuiHao, did you solve the problem with csresnext50-panet-spp.cfg?

KimalIsaev commented 4 years ago

@mwindowshz Did you try training with the -benchmark_layers flag?

mwindowshz commented 4 years ago

@KimalIsaev Yes, I did try, but training seemed slower, and GPU load was 20-60% instead of up in the 80-90% range. So it looked like training would take much more time, and there are 500200 iterations set in the cfg file. But OK, I am trying again now: it got to 70 iterations with no crash. I will update.

Also, I am training yolov3.cfg and it works, but my chart looks like this. Is this OK? (attached: chart_yolov3)


KimalIsaev commented 4 years ago

@mwindowshz It's slower, but for now it's the only solution. Or find the bug and change the source code.

mwindowshz commented 4 years ago

OK. I wonder which version of Darknet was used to train csresnext50-panet-spp.cfg before this bug appeared.

Using -benchmark_layers is very slow: it has been running for almost 20 hours and is only at 3500 iterations on a 1080 Ti.

And about the yolov3.cfg training chart I posted above: is this normal? Does anyone have experience with this?


AlexeyAB commented 4 years ago


Try disabling CUDNN and CUDNN_HALF in CMake, then press Generate -> Open Project, recompile, and train without -benchmark_layers. What error do you get?


And about the yolov3.cfg training chart I posted above: is this normal? Does anyone have experience with this?

This is normal.

mwindowshz commented 4 years ago


Still getting the error:

CUDA status Error: file: C:\Users\protrack\source\repos\darknet\src\dark_cuda.c : cuda_push_array() : line: 457 : build time: Mar 12 2020 - 11:45:30
CUDA Error: unspecified launch failure

Also, each yolo layer prints this message:

 137 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
Unused field: 'uc_normalizer = 0.07'
Unused field: 'beta1 = 0.6'

I don't have the Enable Zed Camera option set. Do I need to download something, or is this just another available option?

These are the settings I used:


AlexeyAB commented 4 years ago


Still getting the error:

CUDA status Error: file: C:\Users\protrack\source\repos\darknet\src\dark_cuda.c : cuda_push_array() : line: 457 : build time: Mar 12 2020 - 11:45:30
CUDA Error: unspecified launch failure

Do you get this error with the -benchmark_layers flag?

What error do you get with the -benchmark_layers flag?

Also, try downloading the latest Darknet version and running:

and show all the errors.

mwindowshz commented 4 years ago

Hi, running regular training gives an error:

CUDA status Error: file: C:\Users\...\source\repos\darknet\src\dark_cuda.c : cuda_push_array() : line: 469 : build time: Mar 18 2020 - 13:03:04

 CUDA Error: unspecified launch failure

CUDA Error: unspecified launch failure: No error
Assertion failed: 0, file C:\Users\...\source\repos\darknet\src\utils.c, line 325

Using -benchmark_layers, there is no error, but training is very slow.

Using -cuda_debug_sync, there was no error either! But it seems slower.

Also, when loading the cfg file, the yolo layers report that some fields are unused. Why? Is it because this version of Darknet does not support them?

[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
Unused field: 'uc_normalizer = 0.07'
Unused field: 'beta1 = 0.6'

AlexeyAB commented 4 years ago


CUDA status Error: file: C:\Users...\source\repos\darknet\src\dark_cuda.c : cuda_push_array() : line: 469 : build time: Mar 18 2020 - 13:03:04

CUDA Error: unspecified launch failure

CUDA Error: unspecified launch failure: No error
Assertion failed: 0, file C:\Users...\source\repos\darknet\src\utils.c, line 325

Unused field: 'uc_normalizer = 0.07'

This is only for Gaussian-yolo.

Unused field: 'beta1 = 0.6'

This just isn't required at all.

mwindowshz commented 4 years ago

Hi, I have CUDA 10.2:

D:\Training\mscoco>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:32:27_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.2, V10.2.89

cuDNN 7.6.5

Visual Studio 2019

cuDNN 7.4.2 is not compatible with this version.

I compiled with CUDNN and CUDNN_HALF.

I tried compiling without both CUDNN_HALF and CUDNN: it does not crash, but the loss is NaN:

 (next mAP calculation at 7329 iterations)
 10: nan, nan avg loss, 0.000000 rate, 31.047000 seconds, 640 images
Resizing, random_coef = 1.40

Example of one line: v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 137 Avg (IOU: nan, GIOU: nan), Class: nan, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 12, class_loss = 12.000000, iou_loss = 0.000000, total_loss = 12.000000

Removing only CUDNN_HALF resulted in this error:

CUDA status Error: file: C:\Users\....\source\repos\darknet\src\dark_cuda.c : cuda_push_array() : line: 469 : build time: Mar 23 2020 - 11:46:32

 CUDA Error: unspecified launch failure

CUDA Error: unspecified launch failure: No error
Assertion failed: 0, file C:\Users\...\source\repos\darknet\src\utils.c, line 325

AlexeyAB commented 4 years ago

Removing only CUDNN_HALF resulted in this error:

CUDA status Error: file: C:\Users....\source\repos\darknet\src\dark_cuda.c : cuda_push_array() : line: 469 : build time: Mar 23 2020 - 11:46:32

CUDA Error: unspecified launch failure

CUDA Error: unspecified launch failure: No error
Assertion failed: 0, file C:\Users...\source\repos\darknet\src\utils.c, line 325

Can you show the error together with the preceding message? Did you train with the -benchmark_layers -cuda_debug_sync flags?

mwindowshz commented 4 years ago

Hi, I did not understand. Using a normal build with CUDNN and CUDNN_HALF, and using the -benchmark_layers and -cuda_debug_sync flags separately, worked: there was no crash, but training is very slow, so I did not complete it. Should the flags be used together?

AlexeyAB commented 4 years ago

Should the flags be used together?

Only for debugging, to find where the error occurs. I can't reproduce your error.
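A debug-only run with both flags might look like the sketch below. The .data and .cfg paths are hypothetical placeholders (the thread's own commands have the .data path cut off), so substitute your own files. The idea, hedged: CUDA kernel launches are asynchronous, so an "unspecified launch failure" is often reported by a later runtime call such as the copy inside cuda_push_array() rather than by the kernel that actually failed; -cuda_debug_sync is presumably meant to force synchronization so the failure is reported closer to where it happens.

```shell
# Debug-only sketch, not for normal training (expect it to be much slower).
# data/your.data and cfg/your.cfg are hypothetical placeholders.
darknet.exe detector train data/your.data cfg/your.cfg csresnext50-panet-spp.conv.112 -benchmark_layers -cuda_debug_sync
```

Once the failing layer is identified, drop both flags again for the actual training run.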

mwindowshz commented 4 years ago

Hi, I uploaded a dump file of the crash (dump). Can this help? Thanks.

VisionEp1 commented 4 years ago

Hi Alexey, I have the same issue. Is there any update on this?