detect wrong using training dataset

willfu commented 6 years ago

I am using the latest code (0504, I think) with YOLOv3, OpenCV 2.4.11, VS 2015, CUDA 9.1 and cuDNN. I set classes, filters and anchors, and, to improve quality, I also set jitter=0.4 and random=1 (the default). The labeling tool is Yolo_mark, and there are only 3 classes. After 10100 iterations I used the weights to run detection on the training set, but the detections are wrong: the whole picture is detected as one class. Any clue? Thanks.
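For the filters value, the rule given in the darknet README for YOLOv3 is filters=(classes + 5)*3 in the [convolutional] layer just before each [yolo] layer: each of the 3 anchors (masks) per [yolo] layer predicts 4 box coordinates, 1 objectness score and one score per class. A minimal C sketch of that arithmetic; the function name is hypothetical:

```c
#include <assert.h>

/* Hypothetical helper: filters= value for the [convolutional] layer that
 * feeds a YOLOv3 [yolo] layer. Each of the masks_per_layer anchors predicts
 * 4 box coordinates + 1 objectness score + one score per class. */
int yolo_filters(int classes, int masks_per_layer)
{
    assert(classes > 0 && masks_per_layer > 0);
    return (classes + 5) * masks_per_layer;
}
```

With classes=3 and the default 3 masks per [yolo] layer, yolo_filters(3, 3) returns 24, so all three such layers would use filters=24 (for the 80 COCO classes it gives the stock 255).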

AlexeyAB commented 6 years ago

Can you show an example of the wrong detection?
What mAP do you get?

willfu commented 6 years ago

[Image: predictions]

mAP:
detections_count = 426, unique_truth_count = 296
class_id = 0, name = 602-LouGu, ap = 79.36 %
class_id = 1, name = 7-HeiRongJinGui, ap = 99.17 %
class_id = 2, name = 107-DaMing, ap = 31.89 %
for thresh = 0.25, precision = 0.91, recall = 0.67, F1-score = 0.77
for thresh = 0.25, TP = 198, FP = 19, FN = 98, average IoU = 75.56 %

mean average precision (mAP) = 0.701406, or 70.14 %
Total Detection Time: 10.000000 Seconds
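The summary numbers in this log follow the standard definitions. A small C sketch (function and type names are hypothetical) recomputes precision, recall and F1 from TP = 198, FP = 19, FN = 98, and the mAP as the mean of the three per-class APs; the per-class AP values themselves come from darknet's precision-recall curves and are taken as given here:

```c
#include <assert.h>

/* Detection metrics recomputed from TP/FP/FN counts. */
typedef struct {
    double precision; /* TP / (TP + FP) */
    double recall;    /* TP / (TP + FN); TP + FN = unique_truth_count */
    double f1;        /* harmonic mean of precision and recall */
} det_metrics;

det_metrics compute_metrics(int tp, int fp, int fn)
{
    det_metrics m;
    assert(tp >= 0 && fp >= 0 && fn >= 0 && tp + fp > 0 && tp + fn > 0);
    m.precision = (double)tp / (tp + fp);
    m.recall = (double)tp / (tp + fn);
    m.f1 = (m.precision + m.recall > 0.0)
               ? 2.0 * m.precision * m.recall / (m.precision + m.recall)
               : 0.0;
    return m;
}

/* mAP as the mean of the per-class APs (in percent here). */
double mean_ap(const double *ap, int n)
{
    double sum = 0.0;
    int i;
    assert(ap != 0 && n > 0);
    for (i = 0; i < n; ++i)
        sum += ap[i];
    return sum / n;
}
```

compute_metrics(198, 19, 98) gives precision of about 0.912, recall of about 0.669 and F1 of about 0.772, and mean_ap over 79.36, 99.17 and 31.89 gives about 70.14, matching the printed values. The same functions reproduce the later 832x832 run (TP = 189, FP = 69, FN = 107).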

@AlexeyAB

AlexeyAB commented 6 years ago

@willfu 70% is a good mAP. Do you get bad (full-screen) detections only for a few images, or for most images? And what width= height= do you use for detection? Try setting width=832 height=832 in the cfg file.

willfu commented 6 years ago

@AlexeyAB Only for a few images: most images contain only one class, but the multi-class images in the training dataset are not detected correctly. Width and height are 416 (the default). I will try 832 and let you know the result.

AlexeyAB commented 6 years ago

@willfu Also, if you can, send me your cfg file, weights file, and the images with bad detections along with their labels.

willfu commented 6 years ago

@AlexeyAB The files are too big to attach. How would you prefer that I send them?

AlexeyAB commented 6 years ago

@willfu You can use https://www.google.com/drive/ or something similar.

willfu commented 6 years ago

@AlexeyAB Please let me know your email.

AlexeyAB commented 6 years ago

@willfu Send to this email: alexbently84@gmail.com

willfu commented 6 years ago

@AlexeyAB Already sent ;)

willfu commented 6 years ago

@AlexeyAB Here is the mAP with width=832 height=832. With these weights there is not even a single detection on the first image of the training dataset. Any clue? Thanks. :(

./darknet.exe detector map data/obj.data yolo-obj.cfg backup/yolo-obj_9400.weights
detections_count = 694, unique_truth_count = 296
class_id = 0, name = 602-LouGu, ap = 70.50 %
class_id = 1, name = 7-HeiRongJinGui, ap = 77.30 %
class_id = 2, name = 107-DaMing, ap = 33.39 %
for thresh = 0.25, precision = 0.73, recall = 0.64, F1-score = 0.68
for thresh = 0.25, TP = 189, FP = 69, FN = 107, average IoU = 56.61 %

mean average precision (mAP) = 0.603973, or 60.40 %
Total Detection Time: 27.000000 Seconds

AlexeyAB commented 6 years ago

I think this is the problem: on images with one object, the relative size of the object is much bigger than the relative size of the objects on images with many objects.
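Relative size here is what the Yolo_mark label files store: each line is <class_id> <x_center> <y_center> <width> <height>, with the box values normalized to the image width and height. A minimal C sketch (the function name and the sample label line are hypothetical) that reads one label line and returns the fraction of the image area the box covers:

```c
#include <assert.h>
#include <stdio.h>

/* Parses "<class_id> <x_center> <y_center> <width> <height>" (box values
 * normalized to 0..1) and stores the box's relative area in *area.
 * Returns 1 on success, 0 if the line is not a valid label line. */
int relative_area(const char *line, int *class_id, double *area)
{
    double x, y, w, h;
    assert(line != 0 && class_id != 0 && area != 0);
    if (sscanf(line, "%d %lf %lf %lf %lf", class_id, &x, &y, &w, &h) != 5)
        return 0;
    *area = w * h;
    return 1;
}
```

For the invented line "0 0.5 0.5 0.8 0.6" the area is 0.48, i.e. one object covering about half the picture; each of many small objects on a multi-object image gives a much smaller value. Comparing these numbers across label files is a quick way to see the imbalance described above.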

- Because you have many more images with one object (large relative size), set jitter=0.2
- Use random=1 in all 3 [yolo] layers
- Train with width=320 height=320, and then detect with width=416 height=416
- You have only 5 images/labels with many objects (small relative size): 1,3,4,5,6.jpg/txt. Try making 10 duplicates of each of these jpg/txt files and adding them to train.txt (just run Yolo_mark; it will include all jpg images in train.txt). Then you will have ~50 images/labels with many objects. Then train again.
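The effect of the duplicates can be estimated with simple arithmetic, assuming darknet draws training images uniformly at random from train.txt. The thread does not say how many single-object images there are, so the 95 in the usage below is purely hypothetical, as is the function name:

```c
#include <assert.h>

/* Fraction of train.txt entries that are multi-object images when each of
 * the multi_images image/label pairs is present `copies` times in total.
 * Assumes training images are drawn uniformly at random from train.txt. */
double multi_object_fraction(int single_images, int multi_images, int copies)
{
    int multi_total = multi_images * copies;
    assert(single_images >= 0 && multi_images >= 0 && copies >= 1);
    assert(single_images + multi_total > 0);
    return (double)multi_total / (single_images + multi_total);
}
```

With a hypothetical 95 single-object images, multi_object_fraction(95, 5, 1) is 0.05 (one training image in 20 has many small objects), while multi_object_fraction(95, 5, 10) is about 0.34.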

As you can see, there is a detection, but with very low probability, because there are very few such training samples.

willfu commented 6 years ago

[Image]

@AlexeyAB The results of the tests you suggested are above. The detection seems better, but some objects are still missed. And the result on the sub-image seems bad; is it related to jitter=0.2?

AlexeyAB commented 6 years ago

@willfu

And for the sub image the result seems bad, is it related with jitter 0.2?

It is related to jitter=0.2 and to training with width=320 height=320 and then detecting with width=416 height=416.

Do you have this sub-image in the training dataset? If not, and you want to detect objects on such sub-images, try adding them to your dataset.

The basic rule: the training dataset should contain objects with the same relative sizes as the objects you want to detect.
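One way to put numbers on this: assuming each image is stretched to the network size (no letterboxing), an object's size in network-input pixels is its relative label size times the network width, so training at width=320 and detecting at width=416 makes every object 416/320 = 1.3 times larger at detection time than during training. A C sketch with a hypothetical helper name:

```c
#include <assert.h>

/* Width in network-input pixels of an object whose normalized label width
 * is rel_w, assuming the image is stretched to the network width net_w
 * (no letterboxing). */
double object_pixels(double rel_w, int net_w)
{
    assert(rel_w > 0.0 && rel_w <= 1.0 && net_w > 0);
    return rel_w * net_w;
}
```

For an object a quarter of the image wide, object_pixels(0.25, 320) is 80 pixels and object_pixels(0.25, 416) is 104 pixels.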

the detection seems better, but still some missed.

Also, do you use the default values for learning_rate, burn_in, policy, scales and steps in your cfg file? If yes, just try training up to 15 000 iterations.
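For context, a C sketch of a steps learning-rate policy with burn_in, modeled on how darknet's steps policy works: during the first burn_in iterations the rate ramps up as learning_rate * (iter/burn_in)^power, and afterwards it is multiplied by scales[i] each time the iteration count reaches steps[i]. The values in the usage (learning_rate=0.001, burn_in=1000, power=4, steps=400000,450000, scales=.1,.1) are assumed to be the stock yolov3.cfg defaults; check them against your own cfg file. The function name is hypothetical:

```c
#include <assert.h>

/* Learning rate at iteration iter for a burn_in + "steps" schedule. */
double current_rate(int iter, double learning_rate, int burn_in, int power,
                    const int *steps, const double *scales, int num_steps)
{
    double rate = learning_rate;
    int i;
    assert(iter >= 0 && burn_in >= 0 && power >= 0 && num_steps >= 0);
    if (iter < burn_in) {
        /* Ramp-up: learning_rate * (iter / burn_in)^power. */
        double ratio = (double)iter / burn_in;
        double ramp = 1.0;
        for (i = 0; i < power; ++i)
            ramp *= ratio;
        return learning_rate * ramp;
    }
    /* After burn_in: apply every scale whose step has been reached. */
    for (i = 0; i < num_steps; ++i) {
        if (steps[i] > iter)
            break;
        rate *= scales[i];
    }
    return rate;
}
```

With those values the rate at iteration 15 000 is still 0.001, so training to 15 000 iterations runs entirely at the initial rate after burn-in; the first decay would only happen at 400 000.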

willfu commented 6 years ago

@AlexeyAB Since it is a sub-image of the first image in the training dataset, I assume it is already covered by the dataset, right? I am not sure whether YOLOv3 handles different scales well; do I still need to add the image to the dataset?

AlexeyAB commented 6 years ago

@willfu Not quite. This is true only if you set jitter=0.4 and train for more than ~75 000 iterations instead of 7 500.

I think that to get such a sub-image by cropping we should use jitter=0.4, but then the scale will vary much more (as if there were ~10x more images), so we should train ~10x more iterations. So:

- Add a few sub-images at the scales you want to detect, or
- Set jitter=0.4 and train 10x more iterations to cover many more scales. The bigger the jitter, the more the image is scaled:
  - For jitter=0.2 the image is zoomed from 1/(1+2*0.2) = 0.714x to 1/(1-2*0.2) = 1.7x
  - For jitter=0.3 the image is zoomed from 1/(1+2*0.3) = 0.625x to 1/(1-2*0.3) = 2.5x
  - For jitter=0.4 the image is zoomed from 1/(1+2*0.4) = 0.556x to 1/(1-2*0.4) = 5x

5 / 1.7 = ~3, but note that width and height are scaled independently (the aspect ratio changes), so the diversity is 3x3 = ~10 times greater, which requires ~10x more iterations.
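The zoom ranges and the ~10x estimate can be reproduced with a short C sketch. It assumes, as the formulas above do, that the crop side varies between (1 - 2*jitter) and (1 + 2*jitter) of the original side; the helper names are hypothetical:

```c
#include <assert.h>

/* Zoom range for a given jitter: the crop side varies between
 * (1 - 2*jitter) and (1 + 2*jitter) of the original, so objects are
 * zoomed between 1/(1 + 2*jitter) and 1/(1 - 2*jitter). */
void jitter_zoom_range(double jitter, double *min_zoom, double *max_zoom)
{
    assert(jitter >= 0.0 && jitter < 0.5);
    *min_zoom = 1.0 / (1.0 + 2.0 * jitter);
    *max_zoom = 1.0 / (1.0 - 2.0 * jitter);
}

/* Rough diversity ratio as estimated above: ratio of the maximum zooms,
 * squared because width and height are jittered independently. */
double diversity_ratio(double jitter_big, double jitter_small)
{
    double lo, hi_big, hi_small, r;
    jitter_zoom_range(jitter_big, &lo, &hi_big);
    jitter_zoom_range(jitter_small, &lo, &hi_small);
    r = hi_big / hi_small;
    return r * r;
}
```

jitter_zoom_range(0.3, ...) gives 0.625 and 2.5, and diversity_ratio(0.4, 0.2) gives about 9, which is the "~10x" figure after rounding.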

Also: https://github.com/AlexeyAB/darknet/issues/747#issuecomment-386869981

No modern convolutional neural network is scale-invariant: if a network is trained only on objects of 50x50 pixels, it can't detect objects whose size differs by more than ~30%.
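The ~30% figure is a rule of thumb rather than a hard limit. A C sketch that applies it, interpreting ~30% as a size ratio of at most 1.3 in either direction (that interpretation and the function name are assumptions):

```c
#include <assert.h>

/* Returns 1 if target_px is within the given relative tolerance of the
 * object size the network was trained on (train_px), in either direction. */
int within_scale_tolerance(double train_px, double target_px, double tolerance)
{
    double ratio;
    assert(train_px > 0.0 && target_px > 0.0 && tolerance >= 0.0);
    ratio = target_px > train_px ? target_px / train_px : train_px / target_px;
    return ratio <= 1.0 + tolerance;
}
```

For a network trained on 50x50-pixel objects, within_scale_tolerance(50, 60, 0.3) returns 1 (ratio 1.2) and within_scale_tolerance(50, 100, 0.3) returns 0 (ratio 2).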

willfu commented 6 years ago

@AlexeyAB After training for 85 000 iterations with jitter=0.4, the result is not as good as at 11 000 iterations with jitter=0.2: YOLO cannot detect the whole pattern in figure 1 of the training dataset. Do I still need to train for more iterations?

AlexeyAB commented 6 years ago

@willfu

- It can't detect the whole pattern in figure 1, but can it detect the sub-images you showed?
- Did you train with random=1 in both cases, and what width= height= did you use?
- What mAP do you get in both cases?
- Try training for about 200 000 iterations with jitter=0.4.
- If that doesn't help, update your code from GitHub and change this line: https://github.com/AlexeyAB/darknet/blob/f9ecf6fd3f0df305d7103ecc3f15b23bba260baf/src/detector.c#L137 to float random_val = rand_scale(2.0); then try training with jitter=0.2 random=1.
- Try adding the sub-images you want to detect to your dataset, and make duplicates of the images on which objects are poorly detected.
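A sketch of what the rand_scale(2.0) change does, assuming rand_scale(s) returns a random factor between 1/s and s (multiply or divide by a value in [1, s]) and that the scaled network width and height are rounded to a multiple of 32. The helper names are hypothetical and this is an illustration, not darknet's source:

```c
#include <assert.h>
#include <stdlib.h>

/* Uniform random float in [min, max]. */
static float rand_uniform_f(float min, float max)
{
    return min + (max - min) * ((float)rand() / (float)RAND_MAX);
}

/* Random scale factor in [1/s, s]: a value in [1, s] that is either used
 * as is or inverted, each with probability 1/2. */
float rand_scale_sketch(float s)
{
    float scale = rand_uniform_f(1.0f, s);
    assert(s >= 1.0f);
    return (rand() % 2) ? scale : 1.0f / scale;
}

/* New network side after scaling, rounded to the nearest multiple of 32. */
int resized_dim(int init_dim, float factor)
{
    float cells = factor * init_dim / 32.0f;
    assert(init_dim > 0 && factor > 0.0f);
    return (int)(cells + 0.5f) * 32;
}
```

Under these assumptions, with a 416x416 cfg and s = 2.0 the network size used during random=1 training ranges from resized_dim(416, 0.5f) = 224 to resized_dim(416, 2.0f) = 832 pixels, so the network sees objects at a much wider range of scales.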

AlexeyAB / darknet, issue #755