AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Question about generalization ability of YOLO #3634

Open hadign20 opened 5 years ago

hadign20 commented 5 years ago

Hi, I have a question about the generalization ability of YOLO, or of deep learning methods in general:

I am working on vehicle detection and classification in highway traffic surveillance videos. As you know, these videos can be captured by different cameras with various settings (resolution, angle, quality, etc.). My question is whether a deep learning model like YOLO can be trained to detect vehicles in all of these scenarios. For example, in Pascal VOC 2012 there are only 327 training samples of airplanes, and they vary widely in size, angle, shape, etc.

If I gather many traffic videos and label all of them to build a dataset of various vehicles (e.g. 500k vehicles) with different colors, angles, sizes, etc., would it be possible to rely on the trained model in real-world applications?

If yes, I would appreciate your suggestions about these issues:

  1. How many instances of each vehicle (at a specific angle) are enough for the training set?
  2. How many background images are necessary?
  3. Will the model overfit if there are too many training images, or is there no limit?

Thank you in advance!

AlexeyAB commented 5 years ago

@hadi-ghnd Hi,

> If I gather many traffic videos and label all of them to build a dataset of various vehicles (e.g. 500k vehicles) with different colors, angles, sizes, etc., would it be possible to rely on the trained model in real-world applications?

Yes.

> How many instances of each vehicle (at a specific angle) are enough for the training set?

For each object you want to detect, the training dataset must contain at least one similar object with roughly the same shape, visible side, relative size, angle of rotation, tilt, and illumination. So it is desirable that your training dataset includes images with objects at different scales, rotations, and lighting conditions, seen from different sides and on different backgrounds.
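One rough way to check the "relative size" part of this advice is to bucket the labeled boxes by size and look for empty buckets per class. The sketch below assumes Darknet-style label lines (`<class_id> <x_center> <y_center> <width> <height>`, normalized to the image size). The bucket thresholds and the function names are illustrative assumptions, not anything defined by Darknet.

```python
from collections import Counter

# Upper bounds on the larger normalized box side. These thresholds are
# illustrative assumptions, not values taken from Darknet.
SIZE_BUCKETS = [(0.05, "tiny"), (0.15, "small"), (0.40, "medium"), (1.01, "large")]


def size_bucket(width, height):
    """Map a normalized box (values in 0..1) to a coarse relative-size bucket."""
    side = max(width, height)
    for upper, name in SIZE_BUCKETS:
        if side < upper:
            return name
    return "large"


def scale_coverage(label_lines):
    """Count boxes per (class_id, size bucket) from YOLO-format label lines."""
    counts = Counter()
    for line in label_lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        class_id = int(parts[0])
        width, height = float(parts[3]), float(parts[4])
        counts[(class_id, size_bucket(width, height))] += 1
    return counts


def missing_buckets(counts, class_ids):
    """List (class_id, bucket) pairs that have no training example at all."""
    names = [name for _, name in SIZE_BUCKETS]
    return [(c, n) for c in class_ids for n in names if counts[(c, n)] == 0]


# Hypothetical label lines for two classes (0 = car, 1 = truck).
labels = [
    "0 0.50 0.50 0.02 0.03",
    "0 0.30 0.40 0.10 0.08",
    "1 0.60 0.60 0.50 0.30",
]
counts = scale_coverage(labels)
print(missing_buckets(counts, class_ids=[0, 1]))
```

Any pair reported by `missing_buckets` is a class and size combination the detector has never seen during training. The same idea could be extended to angle or illumination if that metadata is recorded per image.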

> How many background images are necessary?

The training set should include every background that will appear in your production environment.
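To see how many background images a dataset currently contains, the label files can be counted. This is a minimal sketch that assumes the Darknet convention of pairing a background (negative) image with an empty `.txt` label file, and assumes the directory holds only per-image label files. The function name and the file names in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def count_background_images(label_dir):
    """Return (labeled, background) counts for the .txt label files in label_dir.

    A label file with no non-blank content is treated as a background image.
    """
    labeled = background = 0
    for path in sorted(Path(label_dir).glob("*.txt")):
        if path.read_text().strip():
            labeled += 1
        else:
            background += 1
    return labeled, background


# Small demo on a temporary directory with hypothetical label files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "highway_001.txt").write_text("0 0.5 0.5 0.2 0.1\n")
    Path(tmp, "highway_002.txt").write_text("")    # empty road, no vehicles
    Path(tmp, "empty_road.txt").write_text("\n")   # whitespace only
    labeled, background = count_background_images(tmp)

print(labeled, background)
```

Running this per camera or per scene would show which production backgrounds have no vehicle-free examples yet.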

> Will the model overfit if there are too many training images, or is there no limit?

The more varied the images, the lower the probability that the model will overfit.
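One way to check whether added variety is actually helping is to hold out whole cameras for validation, so the validation score reflects scenes the model has never seen. This goes beyond what the thread prescribes. The sketch assumes each image path encodes its camera (here as a `camN_` prefix); `split_by_camera` and `camera_of` are hypothetical names.

```python
import random
from collections import defaultdict


def split_by_camera(image_paths, camera_of, val_fraction=0.2, seed=0):
    """Split images into train/val so that no camera appears in both sets."""
    groups = defaultdict(list)
    for path in image_paths:
        groups[camera_of(path)].append(path)

    cameras = sorted(groups)
    random.Random(seed).shuffle(cameras)  # deterministic for a given seed

    # Hold out at least one camera for validation.
    n_val = max(1, round(len(cameras) * val_fraction))
    val_cameras = set(cameras[:n_val])

    train = [p for c in cameras if c not in val_cameras for p in groups[c]]
    val = [p for c in cameras if c in val_cameras for p in groups[c]]
    return train, val


# Hypothetical file names: five cameras with three frames each.
paths = [f"cam{c}_{i:04d}.jpg" for c in range(1, 6) for i in range(3)]
camera_of = lambda p: p.split("_")[0]

train, val = split_by_camera(paths, camera_of)
print(len(train), len(val))
```

If accuracy on the held-out cameras is much lower than on the training cameras, the dataset probably still lacks variety in viewpoints or backgrounds.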