AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

"cuDNN Error: CUDNN_STATUS_BAD_PARAM: Success" while training yolov4-tiny #8766

Open lordwildbeast opened 1 year ago

lordwildbeast commented 1 year ago

I am hitting this error while training yolov4-tiny. I use the yolov4-tiny pre-trained weights file yolov4-tiny.conv.29.

My environment: Google Colab, NVIDIA-SMI 525.85.12, Driver Version: 525.85.12, CUDA Version: 12.0, Tesla T4.

Tensor Cores are disabled until the first 3000 iterations are reached.
 (next mAP calculation at 1000 iterations) 
 1000: 0.225544, 1.077304 avg loss, 0.001000 rate, 5.281188 seconds, 64000 images, 6.803829 hours left
Resizing to initial size: 416 x 416  try to allocate additional workspace_size = 150.99 MB 
 CUDA allocate done! 

 calculation mAP (mean average precision)...
 Detection layer: 139 - type = 28 
 Detection layer: 150 - type = 28 
 Detection layer: 161 - type = 28 
4
 cuDNN status Error in: file: ./src/convolutional_kernels.cu : () : line: 543 : build time: Apr  4 2023 - 15:54:26 

 cuDNN Error: CUDNN_STATUS_BAD_PARAM
Darknet error location: ./src/dark_cuda.c, cudnn_check_error, line #204
cuDNN Error: CUDNN_STATUS_BAD_PARAM: Success

Has anyone else run into the same problem, or does anyone have an idea how to fix it? Many thanks.

stephanecharette commented 1 year ago

At iteration 1000? See: https://github.com/AlexeyAB/darknet/issues/8669

lordwildbeast commented 1 year ago

How do I downgrade cuDNN in Colab?

I'm running this command in Colab:

!pip install libcudnn8-dev==8.4.1.50-1+cuda11.6 libcudnn8==8.4.1.50-1+cuda11.6

and I get an error:

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
ERROR: Could not find a version that satisfies the requirement libcudnn8-dev==8.4.1.50-1+cuda11.6 (from versions: none)
ERROR: No matching distribution found for libcudnn8-dev==8.4.1.50-1+cuda11.6
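
Before downgrading, it may help to confirm which cuDNN version the runtime currently has. The sketch below reads the `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` macros from the cuDNN version header. The header paths are assumptions about where the Debian packages place the file and may differ on a given Colab image; the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed header locations; adjust them if the runtime installs cuDNN elsewhere.
HEADER_CANDIDATES = [
    Path("/usr/include/cudnn_version.h"),
    Path("/usr/include/x86_64-linux-gnu/cudnn_version_v8.h"),
]


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from header text, or None if a macro is missing."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^\s*#define\s+{name}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def installed_cudnn_version():
    """Return the version from the first header that exists, or None if none is found."""
    for path in HEADER_CANDIDATES:
        if path.is_file():
            return parse_cudnn_version(path.read_text())
    return None


if __name__ == "__main__":
    print(installed_cudnn_version())
```

A result of `None` only means that no header was found at the assumed paths, not that cuDNN is absent.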

SergeyLev commented 1 year ago

!sudo apt-get install libcudnn8-dev=8.4.1.50-1+cuda11.6 libcudnn8=8.4.1.50-1+cuda11.6

pip has nothing to do with it!
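
To illustrate why pip cannot find these packages: `8.4.1.50-1+cuda11.6` is a Debian package version, and libcudnn8 is a system package handled by apt, whereas pip only searches the Python package indexes listed in its output. The sketch below splits such a version string and builds the apt command with apt's single `=` pin (pip uses `==`). The helper names are hypothetical, and reading the `+cuda11.6` suffix as the CUDA release the package targets is an assumption based on the naming.

```python
import re


def split_cudnn_package_version(version):
    """Split e.g. '8.4.1.50-1+cuda11.6' into (cudnn_version, revision, cuda_tag)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)-(\d+)\+cuda(\d+\.\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized package version: {version!r}")
    return match.group(1), match.group(2), match.group(3)


def apt_install_command(packages, version):
    """Build the argument list for apt-get, pinning each package as name=version."""
    pinned = [f"{name}={version}" for name in packages]
    return ["sudo", "apt-get", "install", *pinned]


if __name__ == "__main__":
    version = "8.4.1.50-1+cuda11.6"
    print(split_cudnn_package_version(version))
    print(" ".join(apt_install_command(["libcudnn8-dev", "libcudnn8"], version)))
```

The CUDA tag in the package name (11.6) differs from the "CUDA Version: 12.0" that nvidia-smi reports in the original post, so it may be worth checking that the two are compatible before pinning this version.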

heemalsic commented 7 months ago

When I run this apt-get command on Colab, I get the following error:

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package libcudnn8 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

Package libcudnn8-dev is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Version '8.4.1.50-1+cuda11.6' for 'libcudnn8-dev' was not found
E: Version '8.4.1.50-1+cuda11.6' for 'libcudnn8' was not found
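
The "Version ... was not found" messages suggest that the repositories configured on this runtime do not offer that exact version. A minimal sketch for listing what apt does offer, assuming `apt-cache madison` prints one `package | version | source` line per candidate; the function names are hypothetical.

```python
import shutil
import subprocess


def parse_madison(output):
    """Extract unique version strings from `apt-cache madison <package>` output."""
    versions = []
    for line in output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Keep the middle column (the version) of well-formed lines only.
        if len(fields) >= 3 and fields[1] and fields[1] not in versions:
            versions.append(fields[1])
    return versions


def available_versions(package):
    """Return the versions apt knows for `package`, or [] if apt-cache is missing."""
    if shutil.which("apt-cache") is None:
        return []
    result = subprocess.run(
        ["apt-cache", "madison", package],
        capture_output=True, text=True, check=False,
    )
    return parse_madison(result.stdout)


if __name__ == "__main__":
    for name in ("libcudnn8", "libcudnn8-dev"):
        print(name, available_versions(name))
```

An empty list means apt has no install candidate for the package. That can happen if the package lists are stale (refreshing them with `apt-get update` may help) or if no configured repository carries the package at all.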