tozzad94 opened this issue 4 years ago
You should be able to get ~99% mAP
As suggested by AlexeyAB on the darknet GitHub page: "for training for small objects (smaller than 16x16 after the image is resized to 416x416) - set layers = -1, 11 and stride=4 on lines 717 and 720 in the .cfg file". I am not experienced enough to know why this is good advice (guessing it has something to do with preserving some of the lower-level features).
Yes, do this
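If I understand the advice correctly, the edit makes the third detection head tap an earlier, higher-resolution feature map. A sketch of what those two lines look like in yolov3.cfg (exact line numbers vary between cfg versions; the defaults noted in comments are my reading of the stock file):

```
[upsample]
# was stride=2; upsampling by 4 gives the last head a finer feature map
stride=4

[route]
# concatenate with layer 11, an early high-resolution layer in the backbone
layers = -1, 11
```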
Train with width=608 height=608
Do you use yolov3-spp.cfg?
And add more images to your training dataset: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
for each object which you want to detect, there must be at least 1 similar object in the training dataset with about the same: shape, side of object, relative size, angle of rotation, tilt, illumination. It is desirable that your training dataset include images with objects at different: scales, rotations, lightings, from different sides, on different backgrounds - you should preferably have 2000 different images for each class or more, and you should train for 2000*classes iterations or more
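The README's rule of thumb above can be written out as a small helper (my own illustrative function, not part of darknet; the 6000-iteration floor and the 80%/90% steps= values are also from the same README):

```python
def training_schedule(classes):
    """Rule of thumb quoted above: train for at least 2000 iterations
    per class, but (per the same README) not less than 6000 overall."""
    max_batches = max(classes * 2000, 6000)
    # the README also suggests steps= at 80% and 90% of max_batches
    steps = (int(max_batches * 0.8), int(max_batches * 0.9))
    return max_batches, steps

# e.g. a snooker detector with 8 ball classes (illustrative)
print(training_schedule(8))  # (16000, (12800, 14400))
```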
Thanks a lot for the advice, and for the speed of response.
Some more questions:
• Ah, and anchors as well - should the output of the calc_anchors function suffice? Are nine clusters sufficient, or too many?
• What makes 608 x 608 the optimal network size for this situation? Should I set random = 0 to guarantee this size across all iterations, or is it fine to leave random = 1?
Cheers
In your case it is better to train with random=0 and set a lower subdivisions= value.
yolov3-spp has about +3% accuracy compared to yolov3
• Does the darknet implementation automatically augment the dataset with flips/crops and so on? Or should I prepare these in advance of training?
Yes, due to flip=1 in the [net] section by default.
And crops, due to jitter=0.3 in the [yolo] layers.
Read: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
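For reference, the augmentation parameters mentioned above sit in the cfg file roughly like this (the values shown are the usual yolov3.cfg defaults; as noted in the question, hue/saturation/exposure can be neutralised to preserve ball colours):

```
[net]
# horizontal flip augmentation (on by default); set flip=0 to disable
flip=1
# colour-space jitter; the question suppresses these to keep RGB values intact
hue=.1
saturation=1.5
exposure=1.5

[yolo]
# random crop/translation of up to 30% of the image size
jitter=0.3
```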
Only if you are an expert in neural detection networks - recalculate anchors for your dataset for the width and height from the cfg-file: darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416, then set the same 9 anchors in each of the 3 [yolo]-layers in your cfg-file. But you should change the indexes of anchors masks= for each [yolo]-layer, so that the 1st [yolo]-layer has anchors larger than 60x60, the 2nd larger than 30x30, and the 3rd the remaining ones. Also you should change filters=(classes + 5)*3 before each [yolo]-layer. If many of the calculated anchors do not fit under the appropriate layers, then just try using all the default anchors.
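The filters= formula in the quote can be sanity-checked in a couple of lines (the function name is mine):

```python
def yolo_filters(classes, anchors_per_layer=3):
    # each anchor predicts 4 box coords + 1 objectness score + per-class scores
    return (classes + 5) * anchors_per_layer

print(yolo_filters(80))  # 255, the value in the stock COCO cfg
print(yolo_filters(1))   # 18, for a single-class detector
```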
Thanks a lot.
So calc_anchors yielded this on my dataset (having changed width and height to 608 x 608, and using 9 as num_of_clusters):
anchors = 7,10, 7,14, 9,14, 8,18, 10,17, 10,20, 21,20, 23,28, 31,39
Which of these anchors should I use in each of the three [yolo] layers? Judging from the above, pretty much all of them should be applied in the final [yolo] layer but not in the preceding ones - so mask = 0, 1, ... 6, 7 in the final layer, i.e. all except the last anchor?
And if that's right, which anchors should I use in the first and second [yolo] layers?
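Applying the rule quoted earlier (1st [yolo]-layer gets anchors larger than 60x60, 2nd larger than 30x30, 3rd the rest) to these nine anchors shows why they are awkward - almost everything lands in a single layer. A quick sketch (my own helper, not darknet code):

```python
def assign_masks(anchors):
    """Split anchor indices across the three [yolo] layers using the
    size thresholds from the README quote (anchors as (w, h) pairs)."""
    first, second, third = [], [], []
    for i, (w, h) in enumerate(anchors):
        if w > 60 and h > 60:
            first.append(i)    # largest anchors -> 1st [yolo] layer
        elif w > 30 and h > 30:
            second.append(i)   # medium anchors -> 2nd [yolo] layer
        else:
            third.append(i)    # everything else -> 3rd [yolo] layer
    return first, second, third

anchors = [(7, 10), (7, 14), (9, 14), (8, 18), (10, 17),
           (10, 20), (21, 20), (23, 28), (31, 39)]
print(assign_masks(anchors))  # ([], [8], [0, 1, 2, 3, 4, 5, 6, 7])
```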
anchors = 7,10, 7,14, 9,14, 8,18, 10,17, 10,20, 21,20, 23,28, 31,39
No, don't use such anchors.
What is the problem with those?
Should I just use the default ones instead? (Even though these are much too large for the objects in my dataset)
And would there be any benefit, do you think, to increasing the size of the network beyond 608 x 608?
If you use images 640x360, then use
[net]
width=640
height=352
[yolo]
random=0
And default anchors.
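The 352 presumably comes from darknet's requirement that width and height be divisible by 32 (the network's total downsampling stride): 360 is not a multiple of 32, and 352 is the nearest valid value below it. Illustratively:

```python
def snap_to_stride(x, stride=32):
    # darknet network dimensions must be multiples of 32
    return (x // stride) * stride

print(snap_to_stride(640), snap_to_stride(360))  # 640 352
```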
Also:
either use: https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3_5l.cfg
or use https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3.cfg and do
As suggested by AlexeyAB on the darknet GitHub page: "for training for small objects (smaller than 16x16 after the image is resized to 416x416) - set layers = -1, 11 and stride=4 on lines 717 and 720 in the .cfg file". I am not experienced enough to know why this is good advice (guessing it has something to do with preserving some of the lower-level features).
So I shouldn't use yolov3-spp then? What's the logic?
@AlexeyAB Hello, I need your help. I'm trying to build a model to detect anomalies in steel manufacturing, so the objects in my data are quite small. As suggested, I changed stride=4 and layers = -1, 11. Can you please help me with other configuration changes I can make to optimise my output? I trained with batch=16 and subdivisions=8 for max_batches=12000, with height and width = 224 x 224, as my image resolution is 200x200. yolov3_config.txt
My Region 106 loss at the end of 12000 iterations was still nan. Please help.
I am trying to develop a model to find the position of the balls on a snooker table from a "behind" (rather than above) camera shot and to identify the colour of each ball.
I am using AlexeyAB's darknet implementation of yolov3, but the results are a bit short of what I was hoping for (it achieves an mAP of around 70% on a validation set). Anecdotally, the predictions really suffer when the balls are positioned in clusters, which happens quite frequently in snooker! I'm using 640 x 360 pixel images, and the training set has about 300 examples of each ball class, and considerably more examples of reds, as a game of snooker begins with fifteen reds on the table and only one of each other colour.
Here is a sample prediction, to give you an idea...
https://imgur.com/a/MAooGl9
Anyway, I was wondering how I might best adapt the .cfg file for this task, where the objects are invariably fairly small (12 x 12 pixels or so, depending on how close the ball is to the near/camera side of the table), and where the RGB values are so crucial to the classification phase (this is why I've suppressed the hue/saturation/exposure parameters).
A few things I was thinking of doing (advice most welcome):
Increasing the size of the network from 416 x 416 and retraining - I guess this would give the model a better chance of picking up on the balls which are partly obscured by neighbouring balls?
Increasing the number of anchors from 9 and retraining (I have already used the calc_anchors function to generate a supposedly optimal set of bounding box widths and heights)
As suggested by AlexeyAB on the darknet GitHub page: "for training for small objects (smaller than 16x16 after the image is resized to 416x416) - set layers = -1, 11 and stride=4 on lines 717 and 720 in the .cfg file". I am not experienced enough to know why this is good advice (guessing it has something to do with preserving some of the lower-level features).
A bit of a cop-out, but I could train the model on fewer classes, e.g. collapsing all the individual ball classes (red, black, pink, etc.) into a single "ball" class, and then perform the classification in a second phase in OpenCV, say by mapping the RGB values in the bounding boxes proposed by yolov3 to the most likely snooker ball colour.
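The two-phase idea in the last bullet could look something like this: YOLO proposes class-agnostic "ball" boxes, and a nearest-colour lookup labels each crop. The reference BGR values below are placeholders to be calibrated from real frames, and classify_ball is a hypothetical helper, not part of any library:

```python
import numpy as np

# Placeholder BGR references for each ball colour (to be calibrated
# from annotated frames; illustrative values only)
BALL_COLOURS = {
    "red":    (40, 40, 200),
    "black":  (30, 30, 30),
    "pink":   (180, 140, 255),
    "white":  (235, 235, 235),
    "yellow": (40, 210, 230),
    "green":  (60, 140, 30),
    "brown":  (40, 70, 120),
    "blue":   (180, 90, 30),
}

def classify_ball(crop):
    """crop: HxWx3 BGR array cut out of a 'ball' bounding box.
    Returns the colour whose reference is nearest to the crop's mean."""
    mean = crop.reshape(-1, 3).mean(axis=0)
    return min(BALL_COLOURS,
               key=lambda c: np.linalg.norm(mean - np.array(BALL_COLOURS[c])))
```

In practice it would probably be more robust to sample the references from real frames and compare in HSV space, which is less sensitive to lighting than raw BGR.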
There is also, in the back of my mind, the idea that yolov3 may not be the most effective way of tackling the problem to begin with, and that other deep learning or even more standard methods might be better suited. For instance, there is a thesis describing a seemingly successful approach using the OpenCV library, but annoyingly the entire thing (other than the abstract) is written in Slovenian (http://eprints.fri.uni-lj.si/2590/).