Detection of small objects in very high resolution video

AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)

http://pjreddie.com/darknet/

Detection of small objects in very high resolution video #6559

Open droogg opened 4 years ago

droogg commented 4 years ago

@AlexeyAB Hi! Thanks so much for your incredible work! I have read every issue directly or indirectly related to my question, but I could not find a clear answer. My task is to detect small objects (about 15x15 pixels) in very high resolution video with 6000x4000-pixel frames. What is the best way to do this? The only option I can think of is to train the network on 832x832-pixel tiles. Then, as frames arrive from the camera, split each one into tiles of the same size (832x832 pixels), run detection on each tile, and merge all detections with non-maximum suppression. Is there a more elegant way to do this? Or does Darknet have built-in tools that could help me?
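The tiling pipeline described above can be sketched roughly as follows. This is a minimal, plain-Python illustration, not Darknet code: `detect_tile` is a hypothetical callback standing in for whatever Darknet binding you use, and the 832-pixel tile size, 64-pixel overlap and 0.45 IoU threshold are assumptions. The overlap is chosen larger than the ~15-pixel objects so that each object appears whole in at least one tile; NMS then removes the duplicates found in neighbouring tiles.

```python
from typing import Callable, List, Tuple

# A detection: (x1, y1, x2, y2, score, class_id) in pixel coordinates.
Box = Tuple[float, float, float, float, float, int]


def tile_origins(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis so that tiles of size `tile` cover [0, length)."""
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the frame edge
    return starts


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_thresh: float) -> List[Box]:
    """Greedy per-class non-maximum suppression, highest score first."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(k[5] != box[5] or iou(k, box) < iou_thresh for k in kept):
            kept.append(box)
    return kept


def detect_tiled(frame_w: int, frame_h: int,
                 detect_tile: Callable[[int, int, int, int], List[Box]],
                 tile: int = 832, overlap: int = 64,
                 iou_thresh: float = 0.45) -> List[Box]:
    """Run `detect_tile` on every overlapping tile and merge the results with NMS."""
    all_boxes: List[Box] = []
    for y0 in tile_origins(frame_h, tile, overlap):
        for x0 in tile_origins(frame_w, tile, overlap):
            w, h = min(tile, frame_w - x0), min(tile, frame_h - y0)
            for x1, y1, x2, y2, score, cls in detect_tile(x0, y0, w, h):
                # Shift tile-local coordinates back into full-frame coordinates.
                all_boxes.append((x1 + x0, y1 + y0, x2 + x0, y2 + y0, score, cls))
    return nms(all_boxes, iou_thresh)


if __name__ == "__main__":
    # Hypothetical stand-in for a Darknet call: one 15x15 object at (800, 800),
    # which lies in the overlap of four adjacent tiles.
    obj = (800.0, 800.0, 815.0, 815.0)

    def fake_detect(x0: int, y0: int, w: int, h: int) -> List[Box]:
        x1, y1, x2, y2 = obj
        if x0 <= x1 and x2 <= x0 + w and y0 <= y1 and y2 <= y0 + h:
            return [(x1 - x0, y1 - y0, x2 - x0, y2 - y0, 0.9, 0)]
        return []

    merged = detect_tiled(6000, 4000, fake_detect)
    print(len(merged), merged[0][:4])  # 1 (800.0, 800.0, 815.0, 815.0)
```

With these assumed settings a 6000x4000 frame is covered by 8 x 6 = 48 overlapping tiles. Objects that straddle a tile border can still produce a clipped box in one tile and a whole box in another, which plain NMS may not fully reconcile; that is one reason people add extra merging steps on top.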

xxtkidxx commented 4 years ago

@AlexeyAB Hi! I am also very interested in the question above. Please help me with a solution for detecting small objects.

WongKinYiu commented 4 years ago

https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3_5l.cfg for your reference.

droogg commented 4 years ago

@WongKinYiu, @AlexeyAB OK, thanks! How should I preprocess the data? An image larger than 2000x2000 pixels will not fit in the memory of my 2080 Ti or Jetson Xavier. Are there any other options besides splitting the original frame into parts and processing each part with Darknet?

stephanecharette commented 4 years ago

Have you tried resizing the image to smaller dimensions? Whether that works comes down to the size of the objects you want to detect, and possibly to where those objects are located within the image.
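A quick back-of-the-envelope check shows why object size decides this. The sketch below assumes the whole 6000x4000 frame is stretched to an 832x832 network input (no letterboxing) and compares that with covering the frame by non-overlapping 832x832 tiles; the function names are illustrative only.

```python
import math
from typing import Tuple


def resized_object_size(obj_w: float, obj_h: float, frame_w: int, frame_h: int,
                        net_w: int, net_h: int) -> Tuple[float, float]:
    """Object size in pixels after stretching the whole frame to net_w x net_h."""
    return obj_w * net_w / frame_w, obj_h * net_h / frame_h


def tiles_needed(frame_w: int, frame_h: int, tile: int) -> int:
    """Number of non-overlapping tile x tile crops needed to cover the frame."""
    return math.ceil(frame_w / tile) * math.ceil(frame_h / tile)


w, h = resized_object_size(15, 15, 6000, 4000, 832, 832)
print(f"{w:.2f} x {h:.2f} px")        # 2.08 x 3.12 px
print(tiles_needed(6000, 4000, 832))  # 40
```

Under these assumptions a 15x15 object shrinks to roughly 2x3 pixels, which leaves the network very little to work with, whereas tiling keeps it at full resolution at the cost of about 40 inferences per frame (more with overlap).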

sfleisch314 commented 3 years ago

I have found three papers describing three different methods for tackling this problem.

I am working on implementing some or all of these methods, starting with #3. They all rely on splitting the image into tiles. Two of them use an attention mechanism to limit the number of inferences that have to be run. The third combines shrinking the overall image with tiling, and then applies additional non-max suppression and possibly other techniques to merge the detections.
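The papers are not identified in this thread, so the following is not their attention mechanism. It is a simple, hypothetical stand-in for the idea of limiting inferences: run a cheap coarse pass on a downscaled copy of the frame, then run the full-resolution detector only on tiles that overlap a coarse hit. `select_tiles`, the 0.25 downscale factor and the 32-pixel margin are assumptions for illustration.

```python
from typing import List, Tuple

# (x1, y1, x2, y2) in pixels.
Rect = Tuple[float, float, float, float]


def tile_grid(frame_w: int, frame_h: int, tile: int) -> List[Rect]:
    """Non-overlapping tiles covering the frame; edge tiles may be smaller."""
    return [(x, y, min(x + tile, frame_w), min(y + tile, frame_h))
            for y in range(0, frame_h, tile)
            for x in range(0, frame_w, tile)]


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles overlap with positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def select_tiles(coarse_boxes: List[Rect], scale: float, frame_w: int,
                 frame_h: int, tile: int = 832, margin: float = 32) -> List[Rect]:
    """Keep only the tiles that overlap a margin-expanded coarse detection.

    `coarse_boxes` come from a pass over the frame downscaled by `scale`
    (0 < scale <= 1), so they are mapped back to full-frame coordinates first.
    """
    regions = [(x1 / scale - margin, y1 / scale - margin,
                x2 / scale + margin, y2 / scale + margin)
               for x1, y1, x2, y2 in coarse_boxes]
    return [t for t in tile_grid(frame_w, frame_h, tile)
            if any(intersects(t, r) for r in regions)]


# One coarse hit in a 1500x1000 (scale 0.25) copy of a 6000x4000 frame.
selected = select_tiles([(250, 250, 260, 260)], 0.25, 6000, 4000)
print(len(tile_grid(6000, 4000, 832)), selected)  # 40 [(832, 832, 1664, 1664)]
```

Here one fine-grained inference replaces 40. The catch is that the coarse pass must still flag regions containing 15x15 objects after downscaling, which may need a dedicated coarse model or another cue such as motion.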