AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

False negatives are very high - yolov3 #1739

Open kmsravindra opened 6 years ago

kmsravindra commented 6 years ago

@AlexeyAB, I have 1920x1080 images that I am using for training a YOLOv3 model. I have kept my training network resolution at 608x608 and trained for around 7000 iterations. The object sizes typically vary between 40wx40h and 300wx300h (approximate, just to give you a feel for the object sizes). This is single-class detection with ~6000 positive images (with the object present) and ~6000 background images (with no object), so there is a total of 12,000+ images that I am training on. All of them are 1920x1080 images.

When I test the model on the held-out test set (which is also at 1920x1080), I am seeing that the number of false negatives is very high (almost 90%) using a 608x608 detection network size. When I increase the detection network size to 1280x1280, the false negatives reduce, but they are still around 80%. The numbers of false positives and true positives are both very low.

These false-negative numbers stay high no matter how I train the model with different network sizes - I tried training network sizes of 832x832, 832x480, and 608x608, and detection network sizes of 608x608, 832x832, 832x480, and 1280x1280, in each combination with the training network sizes above.

The max mAP I got is around 50% (and that is with these high false negatives included).

Could you please suggest -

  1. Are the training network sizes and detection network sizes good for the input image size that I have?
  2. Should I do any config changes ( I am already using random=1 and the typical learning rate params)?
  3. Do you think I need to increase my positive image data to include more variety of objects?
  4. Any other changes / suggestions would be very valuable, as I have spent a lot of time varying several parameters without much success.

Please suggest what else could be done to reduce the false negatives.

thanks!

AlexeyAB commented 6 years ago

@kmsravindra Hi,

  1. How did you get your Test-dataset? Did you just randomly split the initial dataset into 2 datasets: Training (80%) and Test (20%)?

  2. Can you show screenshots of the windows that will be generated for the Training dataset (train=train.txt in obj.data) and the Test dataset (train=test.txt in obj.data)?

```
./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 608 -height 608 -show
```
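For reference, a minimal obj.data sketch for a single-class setup like this one (the file paths are placeholders; adjust them to your layout):

```
classes = 1
train = data/train.txt
# point valid at train.txt instead to measure mAP on the Training set
valid = data/test.txt
names = data/obj.names
backup = backup/
```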

AlexeyAB commented 6 years ago

@kmsravindra Hi,

  1. Can you show a screenshot of the cloud of points generated by calc_anchors with the -show flag, for the Training and Test sets?

  2. What mAP can you get on the Training dataset? (Set valid=train.txt in the obj.data file.)

  3. Most likely the images of objects in the Training-set are completely different from the images in the Test-set - as if you were training for the single class “Transport”, but there were only “Cars” in the Training-set, and you tried to detect “Airplanes” in the Test-set.

  4. Also try to find several false-negative examples from the Test-set, then find the most similar objects in the Training-set, and try to describe the main differences between these objects: color, size, aspect ratio, shape, details, ...

  5. I recommend increasing your Training dataset, following this rule:

> desirable that your training dataset include images with objects at different: scales, rotations, lightings, from different sides, on different backgrounds

  6. I recommend training for at least 12,000 - 15,000 iterations if you have 12,000 images.

  7. You can also try setting saturation=2.5, exposure=2.5, hue=0.3 in the [net] section and jitter=0.4, random=1 in each of the 3 [yolo] layers (a sketch follows below), then train for about 120,000 iterations. This applies more aggressive data augmentation, which helps if you can't collect more varied Training images.
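A sketch of those cfg changes, assuming an otherwise standard yolov3.cfg (only the added or changed keys are shown):

```
[net]
# more aggressive color augmentation
saturation = 2.5
exposure = 2.5
hue = 0.3

# repeat in each of the 3 [yolo] layers:
[yolo]
# stronger placement jitter plus multi-scale training
jitter = 0.4
random = 1
```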

AlexeyAB commented 6 years ago

@kmsravindra

> mAP on the training dataset is quite good at just 2700 iterations, as below:
>
> ```
> detections_count = 16372, unique_truth_count = 5542
> class_id = 0, name = polyp, ap = 90.23 %
> for thresh = 0.25, precision = 0.87, recall = 0.94, F1-score = 0.91
> for thresh = 0.25, TP = 5221, FP = 756, FN = 321, average IoU = 67.74 %
>
> mean average precision (mAP) = 0.902264, or 90.23 %
> ```

So the model is good and training goes well. The issue is in the Training dataset.
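(As a sanity check, those figures are self-consistent: precision = TP/(TP+FP) = 5221/5977 ≈ 0.87, recall = TP/(TP+FN) = 5221/5542 ≈ 0.94, and F1 = 2·P·R/(P+R) ≈ 0.91. The very high Training-set recall shows the misses occur only on the Test-set.)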


> Also, just as a note, since there are several negative images (where there are no lumps) from each video, I am doing a random sampling of negative images from hundreds of videos to gather an equal number of negative and positive images - as per your suggestion, which I read somewhere, to have an equal number of positive and negative images. So the dataset of 12,000+ images here includes 6000+ positive images and 6000+ randomly sampled negative images.
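A minimal sketch of that balanced sampling step, assuming plain text files with one image path per line (positives.txt, all_negatives.txt, and train.txt are hypothetical names):

```python
import random

# load the full lists of positive and negative image paths
with open("positives.txt") as f:
    positives = f.read().splitlines()
with open("all_negatives.txt") as f:
    negatives = f.read().splitlines()

# draw exactly as many negatives as there are positives, uniformly at random
sampled_negatives = random.sample(negatives, len(positives))

# write the combined, balanced training list
with open("train.txt", "w") as f:
    f.write("\n".join(positives + sampled_negatives) + "\n")
```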

This is just a general recommendation. But in your case, with difficult-to-detect objects and very high False-Negatives (90%), you should add many more Positive images.

Just keep the same data augmentation parameters (jitter=0.3 and random=1), collect as many Positive images as possible (keeping your old Negative samples), and train again with the number of iterations ~= the number of images.


> Sure, I could do that... but I already saw overfitting happening around 4000 to 5000 iterations. For example, the mAP above is at just 2700 iterations, and the loss was showing as 0.13+, which I believe is quite low.

Did you get the highest mAP for the Training-set or for the Test-set at 2700 iterations? Usually the highest mAP on the Training-set is achieved much earlier than on the Test-set.


> Also wanted to add (not sure if this is significant to mention): the testing set is also a set of frames extracted from a new video. And as you know, some of the frames have motion blur where the lumps are not very clear... But we made sure that our training dataset has several motion-blur images that contain lumps...
>
> Given all this context, do you have any suggestions on how we should prepare the data? Or is YOLOv3 good enough when objects blend slightly into the background (to give an analogy: suspecting a light-grey car in the fog - we can visually have a hunch that such a car is present at a distance, even though it is not clearly visible)? Or should we do something else? Your suggestions would be very helpful!

So the Training-set should have similar objects - with motion blur where the lumps are not very clear. There is no image blurring in the yolo data augmentation, so you should add as many images with blurred objects as possible to your Training-set (one offline way to generate them is sketched below). Yolo v3 is good enough to detect such objects if they are represented in the Training-set.
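Since darknet itself does not blur images at training time, one workaround is to pre-generate blurred copies of training images offline. A minimal sketch using OpenCV (the function and file names are hypothetical; blurring does not move objects, so the YOLO label .txt files can be copied unchanged):

```python
import cv2
import numpy as np

def motion_blur(img, ksize=9):
    # convolve with a horizontal line kernel to mimic linear motion blur;
    # rotate or transpose the kernel for other blur directions
    kernel = np.zeros((ksize, ksize), dtype=np.float32)
    kernel[ksize // 2, :] = 1.0 / ksize
    return cv2.filter2D(img, -1, kernel)

img = cv2.imread("frame_0001.jpg")
cv2.imwrite("frame_0001_blur.jpg", motion_blur(img))
# copy frame_0001.txt to frame_0001_blur.txt - the boxes are unchanged
```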

kmsravindra commented 6 years ago

@AlexeyAB, Thanks for your recommendations. I will try to collect more positive images.