Tiny yolo and small objects

I would like to train a tiny yolo v3 hand detector. It is supposed to detect hands in very specific footage. The camera perspective as well as the size of the hands is always more or less the same in every video:

github_demo (sorry for the unnecessary secretiveness but I am not sure about copyright)

So far, I have used the Oxford Dataset for training, where all images look more or less like this:

PUZZLE_COURTYARD_B_S_frame_1502

I got okay results for testing the Oxford Dataset but bad results for testing on the actual footage. Now, I have some questions regarding how to modify the training process for this taks:

1) Is there an equivalence of this hint for tiny yolo?

For training for small objects (smaller than 16x16 after the image is resized to 416x416) - set layers = -1, 11 instead of https://github.com/AlexeyAB/darknet/blob/6390a5a2ab61a0bdf6f1a9a6b4a739c16b36e0d7/cfg/yolov3.cfg#L720 and set stride=4 instead of https://github.com/AlexeyAB/darknet/blob/6390a5a2ab61a0bdf6f1a9a6b4a739c16b36e0d7/cfg/yolov3.cfg#L717

2) In this repo, I read tha the dataset should include such a set of relative sizes of objects that I want to detect. This is obviously not the case with my dataset. As an improvement, should I e.g. create thick borders aroung the traning images so that the relative size of the hands match the footage-to-detect? Or is it enough if I add some labelled frames from the footage to the dataset? If yes, how many (The training set so far is about 45000 images large.)?

Any help is appreciated!!

AlexeyAB / darknet

Tiny yolo and small objects #2637