willfu opened this issue 6 years ago
mAP:

```
detections_count = 426, unique_truth_count = 296
class_id = 0, name = 602-LouGu, ap = 79.36 %
class_id = 1, name = 7-HeiRongJinGui, ap = 99.17 %
class_id = 2, name = 107-DaMing, ap = 31.89 %

for thresh = 0.25, precision = 0.91, recall = 0.67, F1-score = 0.77
for thresh = 0.25, TP = 198, FP = 19, FN = 98, average IoU = 75.56 %

mean average precision (mAP) = 0.701406, or 70.14 %
Total Detection Time: 10.000000 Seconds
```
@AlexeyAB
@willfu This is a good mAP (70%). Do you get bad detection (full screen) only for a few images, or for most images? And what width= height= do you use for detection? Try to set width=832 height=832 in the cfg-file.
@AlexeyAB Only for a few images, since the others have only one class per image, while images from the multi-class training dataset do not get the right detections. width and height are 416 by default. I will try 832 and let you know the result.
@willfu Also if you can - send me your cfg-file, weights-file and images with bad detections and labels for them.
@AlexeyAB since the files are too big, is there any way you prefer to send to you?
@willfu You can use: https://www.google.com/drive/ Or something like this.
@AlexeyAB please let me know your email.
@willfu Send to this email: alexbently84@gmail.com
@AlexeyAB already sent ;)
@AlexeyAB The mAP with width and height 832 is below, yet these weights could not even produce any detection on the first image of the training dataset. Any clue? Thanks. :(
```
./darknet.exe detector map data/obj.data yolo-obj.cfg backup/yolo-obj_9400.weights

detections_count = 694, unique_truth_count = 296
class_id = 0, name = 602-LouGu, ap = 70.50 %
class_id = 1, name = 7-HeiRongJinGui, ap = 77.30 %
class_id = 2, name = 107-DaMing, ap = 33.39 %

for thresh = 0.25, precision = 0.73, recall = 0.64, F1-score = 0.68
for thresh = 0.25, TP = 189, FP = 69, FN = 107, average IoU = 56.61 %

mean average precision (mAP) = 0.603973, or 60.40 %
Total Detection Time: 27.000000 Seconds
```
I think that is the problem: on images with 1 object, the relative size of the object is much bigger than the relative size of objects on images with many objects.

Try to set jitter=0.2 and random=1 in all of the 3 [yolo] layers, train with width=320 height=320, and then detect with width=416 height=416.
As you can see, there is a detection, but with very low probability, because there are very few such training samples:
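For reference, the settings suggested above gathered into one sketch of the relevant cfg lines (values taken from this thread; all other [net] and [yolo] settings stay unchanged):

```ini
# [net] section, for training:
width=320
height=320

# in each of the 3 [yolo] sections:
jitter=0.2
random=1
```

For detection, switch width and height back to 416 in the [net] section (or pass a different network resolution at detection time).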
@AlexeyAB The results of the tests you requested are above. Detection seems better, but some objects are still missed. And for the sub-image the result seems bad, is it related to jitter=0.2?
@willfu
> And for the sub-image the result seems bad, is it related to jitter=0.2?
It is related to jitter=0.2 and to training with width=320 height=320 while detecting with width=416 height=416.
Do you have this sub-image in the training dataset? If not, try to include such sub-images in your dataset if you want to detect objects on them.

The basic rule is: the training dataset should contain objects with the same relative sizes as the objects you want to detect.
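To check how the relative object sizes in a dataset are distributed, a small helper can scan the label files. This is a hypothetical script, not part of darknet; it assumes Yolo_mark-style label files where each line is `class x_center y_center width height` with values normalized to 0..1:

```python
import glob

def relative_sizes(label_dir):
    """Collect the normalized (width, height) of every labeled object."""
    sizes = []
    for path in glob.glob(f"{label_dir}/*.txt"):
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) == 5:
                    # first field is the class id; the last two are w and h
                    _, _, _, w, h = map(float, parts)
                    sizes.append((w, h))
    return sizes

# Example usage (path is an assumption):
# sizes = relative_sizes("data/obj")
# ws = [w for w, _ in sizes]
# print(min(ws), max(ws))   # spread of relative object widths
```

If the smallest and largest relative sizes differ by much more than the jitter/multi-scale augmentation can cover, some objects will be poorly detected.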
the detection seems better, but still some missed.
Also, do you use the default values for learning_rate, burn_in, policy, scales and steps in your cfg-file? If yes, then just try to train up to 15 000 iterations.
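For comparison, the defaults in the stock yolov3.cfg at the time looked roughly like the fragment below (verify against the cfg you actually started from, since releases differ):

```ini
[net]
learning_rate=0.001
burn_in=1000
max_batches=500200
policy=steps
steps=400000,450000
scales=.1,.1
```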
@AlexeyAB Since it is a sub-image of the first image in the training dataset, I assume it is already in the dataset, right? I am not sure whether YOLOv3 can handle different scales well; do I still need to add the image to the dataset?
@willfu Not quite. That is true only if you set jitter=0.4 and train more than ~75 000 iterations instead of 7 500 iterations.

I think that to crop such a sub-image we should use jitter=0.4, but in this case the scale will vary much more (as if there were ~10x more images), so we should train ~10x more iterations.
So:
Or you can set jitter=0.4 and train 10x more iterations, to cover many more scales. The bigger the jitter, the more the image will be scaled:

- jitter=0.2: image will be zoomed from 1/(1+2*0.2) = 0.714x to 1/(1-2*0.2) = 1.7x
- jitter=0.3: image will be zoomed from 1/(1+2*0.3) = 0.625x to 1/(1-2*0.3) = 2.5x
- jitter=0.4: image will be zoomed from 1/(1+2*0.4) = 0.556x to 1/(1-2*0.4) = 5x

5 / 1.7 = ~3, but we should remember that width and height are scaled independently (changing the aspect ratio), so the diversity will be 3x3 = ~10 times more, and it will require ~10x more iterations.
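The zoom bounds above can be sketched as a small calculation (using the 1/(1 ± 2*jitter) formulas quoted in this comment):

```python
def zoom_range(jitter):
    """Return (min_zoom, max_zoom) for a given jitter value."""
    return 1.0 / (1.0 + 2.0 * jitter), 1.0 / (1.0 - 2.0 * jitter)

for j in (0.2, 0.3, 0.4):
    lo, hi = zoom_range(j)
    print(f"jitter={j}: zoomed from {lo:.3f}x to {hi:.3f}x")
```

Note how quickly the upper bound grows: going from jitter=0.2 to jitter=0.4 roughly triples the maximum zoom per axis.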
Also: https://github.com/AlexeyAB/darknet/issues/747#issuecomment-386869981
> Any modern convolutional neural network isn't scale-invariant, i.e. if a neural network is trained only on objects with size 50x50 pixels, then it can't detect objects whose sizes differ by more than ~30%.
@AlexeyAB After training for 85 000 iterations with jitter=0.4, the result is not as good as at 11 000 iterations with jitter=0.2: YOLO could not detect the whole pattern in figure 1 of the training dataset. Do I still need to train for more iterations?
@willfu
It can't detect the whole pattern in figure 1, but can it detect the sub-images that you showed earlier?

Did you train with random=1 in both cases, and what width= height= did you use? What mAP do you get in both cases?
Try to train about 200 000 iterations with jitter=0.4.

If it doesn't help, update your code from GitHub and try to change this line: https://github.com/AlexeyAB/darknet/blob/f9ecf6fd3f0df305d7103ecc3f15b23bba260baf/src/detector.c#L137 to float random_val = rand_scale(2.0); then try to train with jitter=0.2 random=1.
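For readers unfamiliar with rand_scale: a Python sketch of its behavior (an assumption about darknet's semantics, where rand_scale(s) draws a random scale in [1/s, s]), showing why passing 2.0 widens the random training resolution range:

```python
import random

def rand_scale(s):
    """Sketch of darknet-style rand_scale: draw a scale in [1, s],
    then invert it half the time, giving a value in [1/s, s]."""
    scale = random.uniform(1.0, s)
    if random.random() < 0.5:
        return 1.0 / scale
    return scale

# With rand_scale(2.0), the random=1 multi-scale resizing can pick
# network sizes from roughly half to double the base resolution.
```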
Try to add the sub-images that you want to detect to your dataset, and make duplicates of the images on which objects are poorly detected.
I just used the latest code (the 05/04 commit, it seems) with YOLOv3, OpenCV 2.4.11, VS 2015, CUDA 9.1 and cuDNN. I set classes, filters and anchors, and to improve quality I also set jitter=0.4 and random=1 (default). The labeling tool is Yolo_mark; there are only 3 classes. After the weights reached 10100 iterations, I used them to detect on the training set, but got wrong detections: it detects the whole picture as one class... Any clue? Thanks.