AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

crash when training yolov4-tiny #7096

Open tigerdhl opened 3 years ago

tigerdhl commented 3 years ago

I have the same problem as the memory-leak issues raised by others: training stops at a random iteration and my computer crashes. I am using yolov4-tiny. I cloned the project 3 days ago, so I think it is a recent version. My device (screenshot: Dingtalk_20201209111316): GTX 2080Ti, built with GPU=1 CUDNN=1 OPENCV=1

My training command is:

./darknet detector train data/voc_my.data cfg/yolov4-tiny-mydata.cfg weight/yolov4-tiny.conv.29 -dont_show

I get the same problem in these other setups:
1. without "dont_show"
2. with OPENCV=0

My cfg file is modified from "yolov4-tiny-custom.cfg":

batch=64
subdivisions=1
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.00261
burn_in=1000
max_batches = 14000
policy=steps
steps=11200,12600
scales=.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

##################################

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=36
activation=linear

[yolo]
mask = 3,4,5
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=7
num=6
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
ignore_thresh = .7
truth_thresh = 1
random=0
resize=1.5
nms_kind=greedynms
beta_nms=0.6

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 23

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=36
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=7
num=6
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
ignore_thresh = .7
truth_thresh = 1
random=0
resize=1.5
nms_kind=greedynms
beta_nms=0.6

I have tested for memory leaks using valgrind, but I can't understand the output.
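
As a quick sanity check on the cfg values posted above, the two custom-training rules of thumb from the darknet README can be verified in a few lines: the convolutional layer before each [yolo] layer should have filters = (classes + 5) * number of masks, and steps are usually set to 80% and 90% of max_batches. The sketch below is only an illustration; the function names `expected_filters` and `suggested_steps` are hypothetical helpers, not part of darknet. With the posted values both checks pass, which suggests the crash is probably not caused by a mismatch in these fields.

```python
def expected_filters(classes, mask):
    """Filters for the conv layer feeding a [yolo] layer: (classes + 5) * len(mask)."""
    return (classes + 5) * len(mask)


def suggested_steps(max_batches):
    """Rule-of-thumb steps at 80% and 90% of max_batches (integer arithmetic)."""
    return (max_batches * 8 // 10, max_batches * 9 // 10)


# Values copied from the cfg in this post.
classes = 7
masks = [[3, 4, 5], [0, 1, 2]]
posted_filters = 36
max_batches = 14000
posted_steps = (11200, 12600)

for mask in masks:
    ok = expected_filters(classes, mask) == posted_filters
    print(f"mask={mask}: filters={posted_filters} consistent: {ok}")

print(f"steps consistent: {suggested_steps(max_batches) == posted_steps}")
```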

aparico commented 3 years ago

Try increasing your subdivisions to 16, 32, or 64
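
To see why this suggestion can help: darknet splits each batch into batch / subdivisions images that are sent to the GPU at once, so with batch=64 and subdivisions=1 all 64 images are processed in a single pass. The sketch below only does that arithmetic for the values mentioned in this thread; `mini_batch` is a hypothetical helper name, and the divisibility check is an assumption added here for clarity rather than darknet's own behavior.

```python
def mini_batch(batch, subdivisions):
    """Number of images processed on the GPU at once: batch / subdivisions."""
    if subdivisions <= 0 or batch % subdivisions != 0:
        # Assumption for this sketch: require an exact split.
        raise ValueError("batch should be a positive multiple of subdivisions")
    return batch // subdivisions


batch = 64  # from the posted cfg
for subdivisions in (1, 16, 32, 64):
    images = mini_batch(batch, subdivisions)
    print(f"subdivisions={subdivisions:2d} -> {images:2d} image(s) per GPU pass")
```

In the cfg this corresponds to changing only the subdivisions line (for example subdivisions=16) and leaving batch=64 as it is, which lowers the peak memory needed per pass while keeping the same effective batch size.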