AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Tiny yolo and small objects #2637

Open czkaa opened 5 years ago

czkaa commented 5 years ago

I would like to train a tiny YOLOv3 hand detector. It is supposed to detect hands in very specific footage, where the camera perspective as well as the size of the hands is more or less the same in every video:

github_demo (sorry for the unnecessary secretiveness, but I am not sure about the copyright)

So far, I have used the Oxford Dataset for training, where all images look more or less like this:

PUZZLE_COURTYARD_B_S_frame_1502

I got okay results when testing on the Oxford Dataset, but bad results when testing on the actual footage. Now I have some questions regarding how to modify the training process for this task:

1) Is there an equivalent of this hint for tiny yolo?

> For training for small objects (smaller than 16x16 after the image is resized to 416x416) - set `layers = -1, 11` instead of https://github.com/AlexeyAB/darknet/blob/6390a5a2ab61a0bdf6f1a9a6b4a739c16b36e0d7/cfg/yolov3.cfg#L720 and set `stride=4` instead of https://github.com/AlexeyAB/darknet/blob/6390a5a2ab61a0bdf6f1a9a6b4a739c16b36e0d7/cfg/yolov3.cfg#L717
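For context, a rough sketch of what those two edits look like inside yolov3.cfg (the target values `layers = -1, 11` and `stride=4` come from the quoted hint; the surrounding block layout and the exact line numbers depend on the cfg revision):

```
[upsample]
stride=4         # the hint changes this from the default stride=2 (cfg line ~717)

[route]
layers = -1, 11  # the hint changes the route at cfg line ~720 so the last [yolo]
                 # head works on a higher-resolution feature map (better for small objects)
```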

2) In this repo, I read that the training dataset should include roughly the same relative sizes of objects as the ones I want to detect. This is obviously not the case with my dataset. As an improvement, should I e.g. create thick borders around the training images so that the relative size of the hands matches the footage to detect? Or is it enough if I add some labelled frames from the footage to the dataset? If yes, how many (the training set is currently about 45,000 images)?

Any help is appreciated!!

AlexeyAB commented 5 years ago

@czkaa Hi,

  1. Just try to use https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3-tiny_3l.cfg

  2. It is required only if the sizes differ by more than 2x, i.e. if automatic data augmentation can't compensate for the difference.

Yes, you can add borders to the images so that the relative sizes are the same in the Training and Testing datasets.
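A minimal sketch of that border-padding step (not part of darknet; OpenCV-based, and the `pad_frac`, file paths, and gray fill value are arbitrary assumptions) that adds a symmetric border and rescales the YOLO-format labels (`class x_center y_center w h`, all normalized to the image size) to match:

```python
import cv2

def pad_image_and_labels(img_path, label_path, out_img_path, out_label_path, pad_frac=0.5):
    """Pad an image with a constant border and rescale its YOLO labels."""
    img = cv2.imread(img_path)
    h, w = img.shape[:2]
    px, py = int(w * pad_frac), int(h * pad_frac)  # border width per side
    padded = cv2.copyMakeBorder(img, py, py, px, px,
                                cv2.BORDER_CONSTANT, value=(127, 127, 127))
    new_w, new_h = w + 2 * px, h + 2 * py

    new_lines = []
    with open(label_path) as f:
        for line in f:
            cls, xc, yc, bw, bh = line.split()
            xc, yc, bw, bh = map(float, (xc, yc, bw, bh))
            # shift the box center by the border offset, then renormalize to the new size
            xc = (xc * w + px) / new_w
            yc = (yc * h + py) / new_h
            bw = bw * w / new_w
            bh = bh * h / new_h
            new_lines.append(f"{cls} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}\n")

    cv2.imwrite(out_img_path, padded)
    with open(out_label_path, "w") as f:
        f.writelines(new_lines)
```

Choosing `pad_frac` so that the padded hands end up at roughly the same fraction of the image as in the target footage keeps the labels consistent with the padded images.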

Or, for example, if objects in the Testing images are 2x smaller, then train the model with width=832 height=832 and, after training, set width=416 height=416 for Detection.
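In cfg terms that is just the network resolution in the [net] section (a sketch only; everything else in the cfg stays unchanged between training and detection):

```
[net]
# during Training (objects in the test footage are ~2x smaller):
width=832
height=832

# after training, for Detection, switch these two lines back to:
# width=416
# height=416
```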