AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

yolov3 to detect snooker balls: how to improve accuracy #5119

Open tozzad94 opened 4 years ago

tozzad94 commented 4 years ago

I am trying to develop a model to find the position of the balls on a snooker table from a "behind" (rather than above) camera shot and to identify the colour of each ball.

I am using AlexeyAB's darknet implementation of yolov3, but the results are a bit short of what I was hoping for (it achieves an mAP of around 70% on a validation set). Anecdotally, the predictions really suffer when the balls are positioned in clusters, which happens quite frequently in snooker! I'm using 640 x 360 pixel images, and the training set has about 300 examples of each ball class, plus considerably more examples of reds, as a game of snooker begins with fifteen reds on the table and only one ball of each other colour.

Here is a sample prediction, to give you an idea...

https://imgur.com/a/MAooGl9

Anyway, I was wondering how I might best adapt the .cfg file for this task, where the objects are invariably fairly small (12 x 12 pixels or so, depending on how close the ball is to the near/camera side of the table), and where the RGB values are so crucial to the classification phase (this is why I've suppressed the hue/saturation/exposure parameters).

subdivisions=32
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=15
saturation=0
exposure=0
hue=0

...

[yolo] # this is the last of the three yolo layers
mask = 0,1,2
anchors = 5,6, 4,8, 5,10, 6,8, 5,10, 6,10, 7,12, 15,16, 20,25
classes=9
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

A few things I was thinking of doing (advice most welcome):

Increasing the size of the network from 416 x 416 and retraining - I guess this would give the model a better chance of picking up on the balls which are partly obscured by neighbouring balls?

Increasing the number of anchors from 9 and retraining (I have already used the calc_anchors function to generate a supposedly optimal set of bounding box widths and heights)

As suggested by AlexeyAB on the darknet Github page: "for training for small objects (smaller than 16x16 after the image is resized to 416x416) - set layers = -1, 11, and stride=4 on lines 717 and 720 in the cfg-file" - I am not experienced enough to know why this is good advice (I'm guessing it's something to do with preserving some of the lower-level features)
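For reference, that advice amounts to editing the [route] and [upsample] layers just before the final [yolo] layer. A sketch of what the affected cfg lines end up looking like (the "was" values are those of the stock yolov3.cfg; exact line numbers depend on the cfg variant):

```
[route]
# was: layers = -1, 36
# route in features from a much earlier, higher-resolution layer
layers = -1, 11

[upsample]
# was: stride=2
# upsample further, so the final detection grid is finer and
# small-object detail from the early layers is preserved
stride=4
```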

A bit of a cop-out, but I could train the model on fewer classes, e.g. collapsing all the individual ball classes (red, black, pink, etc.) into a single "ball" class, and then perform the classification in a second phase in opencv, say by mapping the RGB values in the bounding boxes proposed by yolov3 to the most likely snooker ball colour.
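That second-phase colour mapping could be as simple as a nearest-colour lookup on the mean RGB of each detected box. A minimal sketch of the idea, assuming illustrative (uncalibrated) reference RGB values for each ball colour:

```python
# Sketch of the proposed second-phase colour classifier: map the mean RGB
# of a detected box to the nearest snooker ball colour. The reference RGB
# values below are illustrative placeholders, not calibrated measurements.

REFERENCE_COLOURS = {
    "red":    (180, 30, 30),
    "yellow": (210, 190, 40),
    "green":  (20, 110, 50),
    "brown":  (120, 70, 30),
    "blue":   (30, 60, 160),
    "pink":   (230, 120, 140),
    "black":  (20, 20, 20),
    "white":  (230, 230, 230),  # cue ball
}

def mean_rgb(pixels):
    """Average an iterable of (r, g, b) tuples, e.g. the crop inside a box."""
    n = 0
    totals = [0, 0, 0]
    for r, g, b in pixels:
        totals[0] += r
        totals[1] += g
        totals[2] += b
        n += 1
    return tuple(t / n for t in totals)

def classify_ball(pixels):
    """Return the reference colour with the smallest squared RGB distance
    to the crop's mean colour."""
    mean = mean_rgb(pixels)
    return min(
        REFERENCE_COLOURS,
        key=lambda name: sum(
            (c - m) ** 2 for c, m in zip(REFERENCE_COLOURS[name], mean)
        ),
    )
```

In practice one would crop each box from the frame with OpenCV and probably compare in HSV space instead, to be less sensitive to lighting; the sketch only illustrates the nearest-colour idea.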

There is also, in the back of my mind, the idea that using yolov3 may not be the most effective way of tackling the problem to begin with, and that other deep learning or even more standard methods might be better suited. For instance, there is a thesis describing a seemingly successful approach using the opencv library, but annoyingly the entire thing (other than the abstract) is written in Slovenian (http://eprints.fri.uni-lj.si/2590/).

AlexeyAB commented 4 years ago

You should be able to get ~99% mAP

As suggested by AlexeyAB on the darknet Github page: "for training for small objects (smaller than 16x16 after the image is resized to 416x416) - set layers = -1, 11, and stride=4 on lines 717 and 720 in the cfg-file" - I am not experienced enough to know why this is good advice (I'm guessing it's something to do with preserving some of the lower-level features)

For each object which you want to detect, there must be at least 1 similar object in the training dataset with about the same: shape, side of object, relative size, angle of rotation, tilt, illumination. So it is desirable that your training dataset includes images with objects at different scales, rotations, lightings, from different sides, on different backgrounds. You should preferably have 2000 different images for each class or more, and you should train for 2000*classes iterations or more

tozzad94 commented 4 years ago

Thanks a lot for the advice, and for the speed of response.

Some more questions:

Cheers

tozzad94 commented 4 years ago

Ah, and anchors as well, should the output of the calc_anchors function suffice? Is nine clusters sufficient/too many?

AlexeyAB commented 4 years ago

What makes 608 x 608 the optimal network size for this situation? Should I set random = 0 to guarantee this size across all iterations? Or fine to leave random = 1?

In your case it is better to train with random=0 and to set a lower subdivisions= value

yolov3-spp has +3% accuracy compared to yolov3

• Does the darknet implementation automatically augment the dataset with flips/crops and so on? Or should I prepare these in advance of training?

Yes, flips due to [net] flip=1 by default, and crops due to [yolo] jitter=0.3
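To make the effect of those two settings concrete, here is a heavily simplified model of what the built-in augmentation amounts to (this is not darknet's actual implementation, which also rescales and recomputes box labels; it only illustrates the random flip and jitter crop):

```python
import random

def augment(image, jitter=0.3, flip_prob=0.5):
    """Simplified model of darknet's built-in augmentation (illustrative only):
    randomly crop up to `jitter` of each side, then randomly flip horizontally.
    `image` is a 2D list of pixels."""
    h, w = len(image), len(image[0])
    # jitter crop: trim a random amount, up to jitter * dimension, from each edge
    left = random.randint(0, int(w * jitter))
    right = random.randint(0, int(w * jitter))
    top = random.randint(0, int(h * jitter))
    bottom = random.randint(0, int(h * jitter))
    cropped = [row[left:w - right] for row in image[top:h - bottom]]
    # horizontal flip with probability flip_prob (flip=1 enables this in darknet)
    if random.random() < flip_prob:
        cropped = [row[::-1] for row in cropped]
    return cropped
```

Since flipping and cropping happen on the fly each iteration, there is no need to pre-generate augmented copies of the dataset.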

Read: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

Only if you are an expert in neural detection networks - recalculate anchors for your dataset for the width and height from the cfg-file: darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416, then set the same 9 anchors in each of the 3 [yolo]-layers in your cfg-file. But you should change the indexes of the anchor masks= for each [yolo]-layer, so that the 1st [yolo]-layer has anchors larger than 60x60, the 2nd larger than 30x30, and the 3rd the remaining ones. Also you should change the filters=(classes + 5)*3 before each [yolo]-layer. If many of the calculated anchors do not fit under the appropriate layers - then just try using all the default anchors.
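The mask-assignment rule above can be sketched as follows (a rough illustration of the quoted thresholds, using the stock yolov3 anchors as the example input):

```python
# Rough sketch of the mask-assignment rule quoted above: given 9 anchors
# (width, height) sorted ascending, as calc_anchors produces them, assign
# to the first [yolo] layer in the cfg those larger than 60x60, to the
# second those larger than 30x30, and the rest to the third (finest) layer.

def assign_masks(anchors):
    layer1, layer2, layer3 = [], [], []
    for i, (w, h) in enumerate(anchors):
        if w > 60 and h > 60:
            layer1.append(i)
        elif w > 30 and h > 30:
            layer2.append(i)
        else:
            layer3.append(i)
    return layer1, layer2, layer3

# The stock yolov3 anchors, for illustration:
default_anchors = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                   (59, 119), (116, 90), (156, 198), (373, 326)]
```

Note that with the anchors computed from the snooker dataset (all 31x39 or smaller), every anchor falls into the third bucket and the first two [yolo] layers get none, which is presumably why using those anchors directly is discouraged below.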

tozzad94 commented 4 years ago

Thanks a lot.

So the calc_anchors yielded this on my dataset (having changed width and height to 608 x 608, and using 9 as num-clusters):

anchors = 7, 10, 7, 14, 9, 14, 8, 18, 10, 17, 10, 20, 21, 20, 23, 28, 31, 39

Which of these anchors should I use in each of the three [yolo] layers? Judging from the above they pretty much all should be applied in the final [yolo] layer but not in the preceding ones? So mask = 0, 1, ... 6, 7 in the final layer, i.e. all except the last anchor?

And if that's right which anchors should I use in the first and second [yolo] layers?

AlexeyAB commented 4 years ago

anchors = 7, 10, 7, 14, 9, 14, 8, 18, 10, 17, 10, 20, 21, 20, 23, 28, 31, 39

No, don't use such anchors.

tozzad94 commented 4 years ago

What is the problem with those?

Should I just use the default ones instead? (Even though these are much too large for the objects in my dataset)

tozzad94 commented 4 years ago

And would there be any benefit, do you think, to increasing the size of the network beyond 608 x 608?

AlexeyAB commented 4 years ago

If you use images 640x360, then use

[net]
width=640
height=352

[yolo]
random=0

And default anchors.
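The suggested height of 352 (rather than the images' native 360) presumably follows from darknet requiring the network width and height to be multiples of 32; 352 is the nearest such value below 360, while 640 already qualifies. A quick check:

```python
def nearest_multiple_of_32(x):
    """Round down to the nearest multiple of 32, since darknet network
    dimensions must be divisible by 32."""
    return (x // 32) * 32

print(nearest_multiple_of_32(360))  # 352
print(nearest_multiple_of_32(640))  # 640
```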

Also:

tozzad94 commented 4 years ago

So I shouldn't use yolov3-spp then? What's the logic?

shouryasimha commented 4 years ago

@AlexeyAB Hello, I need your help. I'm trying to build a model to detect anomalies in steel manufacturing, so the objects in my data are quite small. As suggested, I changed stride=4 and layers = -1, 11. Can you please help me with any other configuration changes I can make to optimise my output? I trained with batch=16 and subdivisions=8 for max_batches=12000, with height and width = 224x224, as my image resolution is 200x200. yolov3_config.txt

My Region 106 loss at the end of 12000 iterations was still nan. Please help.