AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

darknet stuck at beginning of execution #8676

Open joarezpj opened 1 year ago

joarezpj commented 1 year ago

Hi,

I'm using darknet with YOLOv4 for custom object detection in a Google Colab project. Until last week the code was working fine, but now it hangs at the very start of both training and testing.

!./darknet detector train data/obj.data /content/drive/MyDrive/IA/Sinterização/yolov4-obj.cfg yolov4.conv.137 -dont_show -map

obj.data file:

classes = 1
train = data/train.txt
valid = data/test.txt
names = data/obj.names
backup = backup/

Execution simply freezes at the lines shown below:

 CUDA-version: 11010 (11020), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1  
 CUDNN_HALF=1 
 OpenCV version: 3.2.0
 Prepare additional network for mAP calculation...
 0 : compute_capability = 800, cudnn_half = 1, GPU: A100-SXM4-40GB 
net.optimized_memory = 0 
mini_batch = 1, batch = 16, time_steps = 1, train = 0 
   layer   filters  size/strd(dil)      input                output
   0 Create CUDA-stream - 0 

Is there something I should do differently? I haven't changed the code since last week, when it was working fine.

Thanks in advance!

vsaw commented 12 months ago

@joarezpj Did you manage to fix it? I'm stuck with the same problem when trying to run Darknet in Docker (only in Docker; it works when running natively).

The device I'm using is a Jetson Xavier NX Devkit (if it makes a difference 🤷‍♂️).

vsaw commented 12 months ago

This seems to be an issue with the cuDNN init. See https://github.com/NVIDIA/nvidia-container-toolkit/issues/124
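
Since the hang points at cuDNN initialization, the versions printed in the startup banner are a natural first thing to compare between a working and a hanging environment (Colab last week vs. now, or native vs. Docker). Below is a minimal Python sketch that pulls those fields out of the banner text. The function name `parse_darknet_banner` and the dictionary keys are hypothetical, and the regular expressions assume the banner looks like the log quoted in the first post; the thread does not say what the two CUDA numbers mean, so they are kept as printed.

```python
import re

# Banner lines copied from the log in the first post (trailing spaces removed).
BANNER = """ CUDA-version: 11010 (11020), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1
 0 : compute_capability = 800, cudnn_half = 1, GPU: A100-SXM4-40GB
"""


def parse_darknet_banner(text):
    """Extract version fields from darknet's startup banner.

    Hypothetical helper: the patterns only assume the layout shown
    in the log above. Fields that are not found are simply omitted.
    """
    info = {}

    # Two integers are printed: one bare, one in parentheses.
    m = re.search(r"CUDA-version:\s*(\d+)\s*\((\d+)\)", text)
    if m:
        info["cuda_version"] = int(m.group(1))
        info["cuda_version_parenthesized"] = int(m.group(2))

    m = re.search(r"cuDNN:\s*([\d.]+)", text)
    if m:
        info["cudnn"] = m.group(1)

    m = re.search(r"compute_capability\s*=\s*(\d+)", text)
    if m:
        info["compute_capability"] = int(m.group(1))

    # "GPU count:" does not match here because the colon must follow "GPU".
    m = re.search(r"GPU:\s*(\S+)", text)
    if m:
        info["gpu"] = m.group(1)

    return info


if __name__ == "__main__":
    for key, value in parse_darknet_banner(BANNER).items():
        print(f"{key}: {value}")
```

Running this on the banner from each environment and diffing the output shows at a glance whether the CUDA, cuDNN, or GPU details differ between the setup that works and the one that hangs.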