AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Error while training. #8867

Open yonisoft opened 10 months ago

yonisoft commented 10 months ago

I'm trying to train YOLOv4-tiny on an RTX 4090 under Windows with CUDA 12.1; Darknet was installed with vcpkg. Training on Colab worked, but on my PC it fails with the error below. This is the command:

darknet detector train data/obj.data cfg/yolov4-tiny-custom.cfg yolov4-tiny.conv.29 -dont_show -map

YOLOv4-tiny config:

[net]
# filters=(classes+5)x3
# Testing
batch=1
subdivisions=1
# Training
batch=64
subdivisions=16
width=640
height=640
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.00261
burn_in=1000
max_batches = 6000
policy=steps
steps=4800,5400
scales=.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

##################################

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=24
activation=linear

[yolo]
mask = 3,4,5
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=3
num=6
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
ignore_thresh = .7
truth_thresh = 1
random=0
resize=1.5
nms_kind=greedynms
beta_nms=0.6

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 23

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=24
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=3
num=6
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
ignore_thresh = .7
truth_thresh = 1
random=0
resize=1.5
nms_kind=greedynms
beta_nms=0.6
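The `filters=(classes+5)x3` note at the top of the config is the usual sizing rule for the 1x1 [convolutional] layer directly before each [yolo] layer, where 3 is the number of anchor indices listed in that layer's `mask`. With classes=3 this gives (3+5)x3 = 24, which matches both `filters=24` lines. A minimal Python sketch that checks this rule, assuming simple `key=value` lines and that the section directly preceding each [yolo] is the one to check; `parse_cfg` and `check_yolo_filters` are hypothetical helpers, not part of Darknet.

```python
# Minimal sketch: checks that the [convolutional] layer feeding each [yolo]
# layer has filters = (classes + 5) * (number of anchors in mask).
# parse_cfg and check_yolo_filters are hypothetical helpers, not Darknet code.

def parse_cfg(text):
    """Return a list of (section_name, options) pairs from Darknet-style cfg text."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip(), {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            # keep the first value seen for a key within a section
            sections[-1][1].setdefault(key.strip(), value.strip())
    return sections


def check_yolo_filters(text):
    """Return (yolo_index, actual_filters, expected_filters) for every mismatch."""
    sections = parse_cfg(text)
    problems = []
    yolo_index = 0
    for i, (name, opts) in enumerate(sections):
        if name != "yolo":
            continue
        classes = int(opts["classes"])
        anchors_in_mask = len(opts["mask"].split(","))
        expected = (classes + 5) * anchors_in_mask
        prev_name, prev_opts = sections[i - 1] if i > 0 else ("", {})
        actual = int(prev_opts.get("filters", -1))
        if prev_name != "convolutional" or actual != expected:
            problems.append((yolo_index, actual, expected))
        yolo_index += 1
    return problems


# The last [convolutional] + first [yolo] block from the config, trimmed to the
# options the check needs.
example = """
[convolutional]
size=1
stride=1
pad=1
filters=24
activation=linear

[yolo]
mask = 3,4,5
classes=3
num=6
"""

print(check_yolo_filters(example))  # [] because (3 + 5) * 3 = 24
```

An empty list means every [yolo] head is sized consistently; a mismatch such as filters=255 with classes=3 would be reported as `(0, 255, 24)`.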

The error is:

(next mAP calculation at 1000 iterations)
1000: 0.092072, 0.090814 avg loss, 0.002610 rate, 0.218000 seconds, 64000 images, 0.307626 hours left

calculation mAP (mean average precision)...
Detection layer: 30 - type = 28
Detection layer: 37 - type = 28
4
cuDNN status Error in: file: C:\Users\yoni1\Desktop\vcpkg\buildtrees\darknet\src\e778426c57-96aa9384e0.clean\src\convolutional_kernels.cu : forward_convolutional_layer_gpu() : line: 555 : build time: Nov 7 2023 - 01:45:54

cuDNN Error: CUDNN_STATUS_BAD_PARAM
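The log shows that training itself ran normally up to iteration 1000 and the crash happened at the first mAP calculation, which matches "next mAP calculation at 1000 iterations". A small Python sketch for pulling the numbers out of the per-iteration summary line, assuming the exact layout shown above and that the `images` column is the cumulative count (iteration x batch on a single GPU); `parse_iteration_line` is a hypothetical helper, and the regex does not handle values such as `-nan`.

```python
import re

# Hypothetical helper: matches the per-iteration summary line in the layout above, e.g.
# "1000: 0.092072, 0.090814 avg loss, 0.002610 rate, 0.218000 seconds, 64000 images, ..."
LINE_RE = re.compile(
    r"^\s*(\d+):\s*([\d.]+),\s*([\d.]+) avg loss,\s*([\d.]+) rate,"
    r"\s*([\d.]+) seconds,\s*(\d+) images"
)


def parse_iteration_line(line):
    """Return a dict of the summary fields, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    it, loss, avg, rate, secs, images = m.groups()
    return {
        "iteration": int(it),
        "loss": float(loss),
        "avg_loss": float(avg),
        "rate": float(rate),
        "seconds": float(secs),
        "images": int(images),
    }


line = ("1000: 0.092072, 0.090814 avg loss, 0.002610 rate, "
        "0.218000 seconds, 64000 images, 0.307626 hours left")
info = parse_iteration_line(line)
# Images per iteration; under the single-GPU assumption this equals batch.
print(info["images"] // info["iteration"])  # 64
```

The result, 64 images per iteration, is consistent with `batch=64` in the [net] section.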

yonisoft commented 10 months ago

Solved by changing subdivisions from 16 to 64, or simply by removing the -map flag.
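In Darknet, `subdivisions` splits each `batch` into smaller chunks that are pushed through the GPU one at a time, so the number of images per pass is batch / subdivisions. A minimal Python sketch of that arithmetic for the change reported above; `mini_batch_size` is a hypothetical helper, and its divisibility check is an addition for this sketch. The thread does not explain why the smaller mini-batch avoids CUDNN_STATUS_BAD_PARAM, so this illustrates what the setting changes, not the root cause.

```python
def mini_batch_size(batch, subdivisions):
    """Images per forward/backward pass when `batch` is split into `subdivisions` chunks."""
    # The divisibility check is an addition for this sketch, not a documented Darknet rule.
    if subdivisions <= 0 or batch % subdivisions != 0:
        raise ValueError(
            f"batch={batch} is not evenly divisible by subdivisions={subdivisions}"
        )
    return batch // subdivisions


for subdivisions in (16, 64):
    print(f"batch=64, subdivisions={subdivisions}: "
          f"{mini_batch_size(64, subdivisions)} image(s) per pass")
# batch=64, subdivisions=16: 4 image(s) per pass
# batch=64, subdivisions=64: 1 image(s) per pass
```

Raising `subdivisions` keeps the effective batch at 64 per iteration while lowering the per-pass GPU load from 4 images to 1.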

ramanrewati commented 3 months ago

@yonisoft, could you please share the Colab notebook with me?

stephanecharette commented 3 months ago

@yonisoft and @ramanrewati:

This error is fixed in the new Darknet/YOLO repo: https://github.com/hank-ai/darknet#table-of-contents

ramanrewati commented 3 months ago

Thanks, I'm trying this one right now. I hope I don't run into errors again 🙂

ramanrewati commented 3 months ago

@stephanecharette, could I get the link to a Colab notebook? I don't know how the new repo will work.

stephanecharette commented 3 months ago

What colab notebook?

stephanecharette commented 3 months ago

https://discord.com/channels/741676058666860635/1184564987511963829

ramanrewati commented 3 months ago

> What colab notebook?

The one I can train YOLOv4 with. Checking Discord now.