AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

mAP is not plotted and training hangs (not responding) at the mAP calculation iteration for tiny YOLOv3 when training on my laptop in cmd with OpenCV and CUDA 10.0 #4544

Open bhavyabpk opened 4 years ago

bhavyabpk commented 4 years ago

This is the plot I am getting:

[Image: chart - Copy]

As you can see, in spite of the -map flag, only the loss is plotted.

Initially the log says that mAP will be calculated at iteration 400, but after iteration 399 the process hangs, and Task Manager shows "Not responding" next to the darknet task. I waited for hours but it stayed hung, so I terminated the process by pressing Ctrl+C. This was the output:

`4^ZCUDA Error Prev: unspecified launch failure: No error Assertion failed: 0, file c:\users\bhavya\python\darknet-master\src\utils.c, line 297`

This is how I am training:

`darknet.exe detector train tinyyolov3\obj.data tinyyolov3\yolov3-tiny_obj.cfg tinyyolov3\yolov3-tiny.conv.15 > tinyyolov3\train.log -map`

This is the cfg file:

`[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=400
max_batches = 7200
policy=steps
steps=4800,5400
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=24
activation=linear

[yolo]
mask = 3,4,5
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=3
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=24
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=3
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
`

AlexeyAB commented 4 years ago

It seems something is wrong with your validation dataset.

- Show your obj.data file.
- Check the bad.list and bad_label.list files.
- Are there any other error messages?
- What params did you set in the Makefile?
- Set valid=train.txt in the obj.data file.
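Part of the dataset check above can be automated. The following is a minimal Python sketch, not part of Darknet, and all function names in it are hypothetical. It assumes that the `valid` entry in obj.data points to a text file with one image path per line, and that each image has a label file of the same name with a .txt extension next to it, holding one `class x_center y_center width height` line per object with coordinates normalized to [0, 1].

```python
from pathlib import Path


def parse_data_file(path):
    """Parse a Darknet-style .data file (one 'key = value' per line) into a dict."""
    options = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def check_label_line(line, num_classes):
    """Return a description of the problem in one label line, or None if it looks fine."""
    parts = line.split()
    if len(parts) != 5:
        return f"expected 5 fields, got {len(parts)}"
    try:
        class_id = int(parts[0])
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return "non-numeric field"
    if not 0 <= class_id < num_classes:
        return f"class id {class_id} out of range"
    # 'not (0 <= c <= 1)' also rejects NaN values.
    if any(not (0.0 <= c <= 1.0) for c in coords):
        return "coordinate outside [0, 1]"
    return None


def check_image_list(list_path, num_classes):
    """Yield (image_path, problem) for every suspicious entry in an image list file."""
    for entry in Path(list_path).read_text().splitlines():
        entry = entry.strip()
        if not entry:
            continue
        image = Path(entry)
        if not image.is_file():
            yield entry, "image file not found"
            continue
        # Assumption: the label file sits next to the image, same name, .txt extension.
        label = image.with_suffix(".txt")
        if not label.is_file():
            yield entry, "label file not found"
            continue
        for number, line in enumerate(label.read_text().splitlines(), start=1):
            if not line.strip():
                continue
            problem = check_label_line(line, num_classes)
            if problem:
                yield entry, f"label line {number}: {problem}"


# Example use (path taken from the training command; 3 matches classes=3 in the cfg):
# options = parse_data_file(r"tinyyolov3\obj.data")
# for image, problem in check_image_list(options["valid"], 3):
#     print(image, problem)
```

Relative paths in the list are resolved against the current working directory, so the script would need to be run from the same directory as darknet.exe. It only checks that files exist and that label lines are well formed. It does not check whether the image data itself is readable.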

wangzizhe commented 4 years ago


@bhavyabpk Hi, did you solve this problem? I'm also facing this issue now...

bhavyabpk commented 4 years ago


Yes. Some of the images were not in a proper format. If you make sure all images are in one common format, converting them where needed, this issue may be resolved. An image's extension may not match its actual format: for example, a file that is really a PNG but has a .jpeg extension. Such files can cause this error, so try fixing them.
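One way to find such files is to compare each file's extension with the leading "magic" bytes of its contents. Below is a minimal Python sketch under that assumption. The function names are hypothetical, and it covers only JPEG, PNG, GIF and BMP. It only reports mismatches and does not convert anything.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of a few common image formats.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "gif": b"GIF8",
    "bmp": b"BM",
}

# File extensions mapped to the format they claim to be.
EXTENSIONS = {
    ".jpg": "jpeg",
    ".jpeg": "jpeg",
    ".png": "png",
    ".gif": "gif",
    ".bmp": "bmp",
}


def sniff_format(header):
    """Return the format name matching the leading bytes, or None if unknown."""
    for name, magic in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def find_mismatches(folder):
    """Return (path, expected, actual) for images whose content does not match the extension."""
    mismatches = []
    for path in sorted(Path(folder).rglob("*")):
        expected = EXTENSIONS.get(path.suffix.lower())
        if expected is None or not path.is_file():
            continue
        with path.open("rb") as handle:
            actual = sniff_format(handle.read(16))
        if actual != expected:
            mismatches.append((path, expected, actual))
    return mismatches


# Example use (the folder name is a placeholder):
# for path, expected, actual in find_mismatches("path/to/images"):
#     print(f"{path}: extension says {expected}, content looks like {actual}")
```

Files reported with `actual` equal to `None` start with bytes that match none of the listed signatures. They may be truncated, corrupted, or in another format, and are worth opening by hand.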