AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Error when training my custom object detector! #309

Open zhao-haha opened 6 years ago

zhao-haha commented 6 years ago

I want to train a hand detector using YOLOv2, so I cloned this repo and compiled it successfully on my PC (Win10 64-bit, GTX 1050, CUDA 8.0, cuDNN 6.0, Visual Studio 2015). Then I prepared a labelled hand dataset (2000 images at 1280x720) and split it into train (80%) and test (20%). I followed the instructions in the README and used darknet.exe to train the model. However, it stopped immediately after starting the training, with no errors! Does that mean the training succeeded?

Output (screenshot). What does "seen 64" mean?

Input: I put all inputs under darknet/build/darknet/x64/data, as below (screenshot):

The [obj] dir is the training set and the [test] dir is the test set; it looks like this (screenshot):

The labels in each .txt file were converted to YOLO format, for example:

0 0.576953125 0.6590277777777778 0.13359375 0.14305555555555555
0 0.457421875 0.6291666666666667 0.09296875 0.16111111111111112
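For reference, each label line is `class x_center y_center width height`, with all coordinates normalized by the image dimensions. A minimal conversion sketch (the function name and the pixel-box input convention are my own, not from darknet):

```python
def to_yolo_line(class_id, x_min, y_min, box_w, box_h, img_w, img_h):
    """Convert a pixel-space box (top-left corner + size) into a YOLO
    label line: class x_center y_center width height, all in [0, 1]."""
    x_center = (x_min + box_w / 2) / img_w
    y_center = (y_min + box_h / 2) / img_h
    return f"{class_id} {x_center} {y_center} {box_w / img_w} {box_h / img_h}"
```

For example, a box covering the top-left quarter of a 1280x720 image becomes `0 0.25 0.25 0.5 0.5`.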

I have only ONE class: Hand. So my obj.names has only one line (screenshot).

And my obj.data:

```
classes = 1
train = data/train.txt
valid = data/test.txt
names = data/obj.names
backup = backup/
```

My train.txt and test.txt look like this (screenshot):

Models: I want to perform real-time hand detection on my PC, so I chose tiny-yolo. I copied /cfg/tiny-yolo.cfg to /data/tiny-yolo-hand.cfg and set classes=1 and filters=30. tiny-yolo.weights was downloaded from http://pjreddie.com/media/files/tiny-yolo.weights
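As background on where filters=30 comes from: for a YOLOv2 region layer, the last convolutional layer needs filters = num * (classes + coords + 1), so with num=5 anchors, classes=1, and coords=4 that is 5 * (1 + 4 + 1) = 30. A tiny sanity-check helper (the function name is mine, not a darknet API):

```python
def region_filters(classes, num_anchors=5, coords=4):
    # Each anchor predicts: coords box values + 1 objectness score
    # + one score per class.
    return num_anchors * (coords + 1 + classes)

print(region_filters(1))   # 1-class hand detector -> 30
print(region_filters(20))  # e.g. Pascal VOC's 20 classes -> 125
```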

```
[net]
batch=64
subdivisions=8
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
max_batches = 120000
policy=steps
steps=-1,100,80000,100000
scales=.1,10,.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=30
activation=linear

[region]
anchors = 0.738768,0.874946, 2.42204,2.65704, 4.30971,7.04493, 10.246,4.59428, 12.6868,11.8741
bias_match=1
classes=1
coords=4
num=5
softmax=1
jitter=.2
rescore=1

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=1
```
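For context on the cfg above: five of the maxpool layers have stride 2 and one has stride 1, so a 416x416 input is downsampled by a total factor of 32, giving a 13x13 grid at the region layer. A quick sketch of that arithmetic (illustrative helper, not darknet code):

```python
def output_grid(net_size=416, pool_strides=(2, 2, 2, 2, 2, 1)):
    """Each stride-2 maxpool halves the spatial resolution;
    a stride-1 pool leaves it unchanged."""
    size = net_size
    for stride in pool_strides:
        size //= stride
    return size

print(output_grid())     # 416 / 32 = 13
print(output_grid(608))  # 608 / 32 = 19
```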

zhao-haha commented 6 years ago

Besides, I tried to use the 'trained' model to detect hands; unsurprisingly, no hands were detected.

AlexeyAB commented 6 years ago

Hi, you should use `darknet.exe detector train data/obj.data tiny-yolo-hand.cfg darknet19_448.conv.23` instead of `darknet.exe detector train data/obj.data tiny-yolo-hand.cfg tiny-yolo.weights`

zhao-haha commented 6 years ago

Thanks for your reply! I tried training yolo2.0.cfg from darknet19_448.conv.23 and this problem was solved; however, the output of the training seems wrong (screenshot):

zhao-haha commented 6 years ago

Here is my custom yolo2.0.cfg (with classes set to 1 and filters set to 30 at the end):

```
[net]
batch=1
subdivisions=1
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
max_batches = 120000
policy=steps
steps=-1,100,80000,100000
scales=.1,10,.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

#######

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[route]
layers=-9

[reorg]
stride=2

[route]
layers=-1,-3

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=30
activation=linear

[region]
anchors = 0.738768,0.874946, 2.42204,2.65704, 4.30971,7.04493, 10.246,4.59428, 12.6868,11.8741
bias_match=1
classes=1
coords=4
num=5
softmax=1
jitter=.2
rescore=1

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=0
```

AlexeyAB commented 6 years ago

@ZhaoWangFu The training output seems all right. You should train for about 2000 iterations or more.

zhao-haha commented 6 years ago

OK, thanks, I will try it.

zhao-haha commented 6 years ago

One more question: should I change the width and height in the cfg because my input images are 1280x720? Should all images be the same size? Or do I have to resize the 1280x720 images to a smaller size for faster speed? Thanks a lot!

AlexeyAB commented 6 years ago

@ZhaoWangFu No, you should not change anything.

zhao-haha commented 6 years ago

Sorry, I pasted the wrong image. Why are the values all NaN? (screenshot)

zhao-haha commented 6 years ago

Until I stopped the process, the values were all NaN, except for the first few lines (screenshot):

zhao-haha commented 6 years ago

@AlexeyAB Did the NaN values occur just because of insufficient training, or something else?

zhao-haha commented 6 years ago

I removed the samples that had no annotations from my dataset, and things went better. However, there are still a few NaNs with count == 0, like this:

```
Region Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.010644, Avg Recall: -nan(ind), count: 0
9693: 0.002723, 0.436156 avg, 0.001000 rate, 0.240000 seconds, 9693 images
```
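Lines with `count: 0` are printed for batches that contain no ground-truth boxes, so weeding out images without labels is the usual fix. A sketch of a filter script for that scenario, assuming each YOLO label .txt sits next to its image with the same basename (file names and layout are illustrative):

```python
from pathlib import Path

def filter_train_list(list_path, out_path):
    """Rewrite a darknet image-list file, keeping only images whose
    label file exists and contains at least one annotation."""
    kept = []
    for line in Path(list_path).read_text().splitlines():
        image = line.strip()
        if not image:
            continue
        label = Path(image).with_suffix(".txt")
        if label.exists() and label.read_text().strip():
            kept.append(image)
    Path(out_path).write_text("\n".join(kept) + "\n")
```

For example, `filter_train_list("data/train.txt", "data/train_clean.txt")` would drop every entry whose label file is missing or empty.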

zhao-haha commented 6 years ago

Is it possible to see which image darknet is using during training? Then I could remove it from my dataset.

EthanPen commented 6 years ago

@ZhaoWangFu Hi, nice to meet you. I've got a few problems that I suppose you know the solutions to. Would you mind sharing your contact information, or could you contact me by QQ (398912742) or email (ethan.penx@gmail.com)? Thanks a million.

AlexeyAB commented 6 years ago

> I removed the samples that had no annotations from my dataset, and things went better. However, there are still a few NaNs with count == 0.

zhao-haha commented 6 years ago

@AlexeyAB Yes, the weights from 3000 iterations work fine, and the average loss is about 0.05.

YMelon commented 6 years ago

@ZhaoWangFu Hello, I am training a YOLO model using the same dataset (EgoHands) as you. I am just a beginner, so could I ask you for some help? "Weights for 3000 iterations works fine" -- is the custom yolo2.0.cfg you posted above the one you used for training?

zhao-haha commented 6 years ago

@YMelon Yes, exactly the same. Before training, I picked some 'good' samples from the EgoHands dataset, because the quality of the training samples is important.

YMelon commented 6 years ago

@ZhaoWangFu Thanks so much for your reply! And sorry for not replying sooner; I have been picking images from EgoHands as you suggested until now. Does 'good' samples mean high resolution? For example, should the following images be picked or not?


YMelon commented 6 years ago

(images: frame_1370, frame_1470)

zhao-haha commented 6 years ago

The meaning of 'good' depends on your target. If you want to train a hand detector like I did, then I think it's important to have clear and complete hands in the sample images.

zhao-haha commented 6 years ago

But I am not sure whether samples that contain only a small part of a hand will affect the training.

YMelon commented 6 years ago

@ZhaoWangFu Yes, my target is the same as yours! So you mean I should pick images containing complete and clear hands, and drop those that contain only a small part of a hand or unclear hands?

zhao-haha commented 6 years ago

Yes, exactly.

YMelon commented 6 years ago

Thanks a lot, I'll try it.

YMelon commented 6 years ago

@ZhaoWangFu Hi, I'm back again! I'm sorry, things don't seem good, so I have to bother you again. I picked 1000 images for training (did I drop too many?), the cfg is the same as yours, and the CPU has run for 4-5 days, but the model does not converge even after 20000 iterations; the loss stays around 230 and doesn't decrease anymore. When I use the trained weights from 2000, ..., 20000 iterations, there are too many output boxes. Results as follows:

YMelon commented 6 years ago

```
Region Avg IOU: 0.093700, Class: 1.000000, Obj: 0.208604, No Obj: 0.436096, Avg Recall: 0.000000, count: 2
22400: 234.039932, 236.237473 avg, 0.001000 rate, 25.045385 seconds, 22400 images
Loaded: 0.000080 seconds
Region Avg IOU: 0.240916, Class: 1.000000, Obj: 0.512090, No Obj: 0.438591, Avg Recall: 0.000000, count: 2
22401: 221.970322, 234.810760 avg, 0.001000 rate, 25.234085 seconds, 22401 images
```

(image: predictions)

YMelon commented 6 years ago

@ZhaoWangFu @AlexeyAB Hi, the large-loss problem does not occur when I use the Windows darknet version (https://github.com/AlexeyAB/darknet). But there is another problem: while training, the output Obj value is very small, and when I use the trained weights from 2000 iterations for detection, no box is output. I used 1400 hand images for training, and the cfg file is the same as @ZhaoWangFu's. Is there anything wrong, or should I just continue to train for more iterations? The output is as follows. When starting:

```
1: 12.637802, 12.637802 avg, 0.000100 rate, 40.974274 seconds, 1 images
Loaded: 0.000056 seconds
Region Avg IOU: 0.255450, Class: 1.000000, Obj: 0.312328, No Obj: 0.401508, Avg Recall: 0.000000, count: 2
2: 11.193381, 12.493361 avg, 0.000100 rate, 23.499844 seconds, 2 images
Loaded: 0.000061 seconds
Region Avg IOU: 0.547629, Class: 1.000000, Obj: 0.360418, No Obj: 0.314297, Avg Recall: 0.666667, count: 3
3: 6.136428, 11.857667 avg, 0.000100 rate, 23.538784 seconds, 3 images
Loaded: 0.000059 seconds
Region Avg IOU: 0.387025, Class: 1.000000, Obj: 0.228083, No Obj: 0.216744, Avg Recall: 0.000000, count: 3
5: 3.318567, 10.275932 avg, 0.000100 rate, 40.721462 seconds, 5 images
Loaded: 0.000054 seconds
Region Avg IOU: 0.371768, Class: 1.000000, Obj: 0.171249, No Obj: 0.092966, Avg Recall: 0.500000, count: 2
6: 2.750375, 9.523376 avg, 0.000100 rate, 23.559980 seconds, 6 images
Loaded: 0.000072 seconds
Region Avg IOU: 0.321536, Class: 1.000000, Obj: 0.049954, No Obj: 0.058721, Avg Recall: 0.250000, count: 4
7: 5.134910, 9.084530 avg, 0.000100 rate, 23.591419 seconds, 7 images
Loaded: 0.000058 seconds
Region Avg IOU: 0.460021, Class: 1.000000, Obj: 0.034833, No Obj: 0.039374, Avg Recall: 0.500000, count: 2
......
```

After 2000 iterations:

```
2971: 0.691351, 0.843138 avg, 0.001000 rate, 23.632906 seconds, 2971 images
Loaded: 0.000057 seconds
Region Avg IOU: 0.404327, Class: 1.000000, Obj: 0.030576, No Obj: 0.008422, Avg Recall: 0.500000, count: 2
2972: 1.040303, 0.862855 avg, 0.001000 rate, 23.536030 seconds, 2972 images
Loaded: 0.000055 seconds
Region Avg IOU: 0.696566, Class: 1.000000, Obj: 0.023101, No Obj: 0.009011, Avg Recall: 1.000000, count: 3
2973: 0.378374, 0.814407 avg, 0.001000 rate, 23.603308 seconds, 2973 images
Loaded: 0.000065 seconds
Region Avg IOU: 0.419630, Class: 1.000000, Obj: 0.034383, No Obj: 0.008450, Avg Recall: 0.500000, count: 2
2974: 0.762052, 0.809171 avg, 0.001000 rate, 23.911331 seconds, 2974 images
Loaded: 0.000057 seconds
Region Avg IOU: 0.487199, Class: 1.000000, Obj: 0.036471, No Obj: 0.008485, Avg Recall: 0.666667, count: 3
```

AlexeyAB commented 6 years ago

@YMelon