SergeiSamuilov opened this issue 6 years ago
Try to comment out these 3 lines:
Then train from the beginning for at least 6000 iterations.
Then check mAP.
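The 3 lines being referred to aren't quoted in this thread, but in a darknet .cfg file a setting is disabled by commenting it out with a leading #. An illustrative fragment (the surrounding values are placeholders, not taken from the actual cfg):

```
[convolutional]
# xnor=1        <- commented out: this layer now trains at full precision
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
```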
Thank you for your advice. Btw, regular yolov3-tiny showed great performance. Before I start training the xnor-net model with the new settings, could you please answer some more questions about the dataset:
1) Should I include negative samples in the validation set as well? I have 11k face images and the same number of true-negative (background) images, but the validation set consists of only 3k true-positive face images.
2) I limited the maximum number of faces in one image to 20. Should I use the parameter max=200 stated in the tutorial, if it's applicable to the tiny model at all?
@SergeiSamuilov
> Should I include negative samples in the validation set as well? I have 11k face images and the same number of true-negative (background) images, but the validation set consists of only 3k true-positive face images.
As you want.
> I limited the maximum number of faces in one image to 20. Should I use the parameter max=200 stated in the tutorial, if it's applicable to the tiny model at all?
What do you mean? All objects in the image should be labeled.
You should set max=200 only if there are more than 90 objects in a training image: https://github.com/AlexeyAB/darknet/blob/2c5e383c04655fe45f3f533eb3a69a80acbf3561/src/parser.c#L278
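The max key goes in each [yolo] section of the cfg; an illustrative fragment (the mask/anchors values here are placeholders, not tuned for this dataset):

```
[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=1
num=6
max=200
```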
Thank you again, Alexey. I've trained the model using the proposed settings, but I still get low mAP and high loss (mAP 12%, avg loss 3.5, trained for >10k iterations). Assuming that such poor results could be caused by unsuitable training material, I tried different datasets (IMDB face dataset, WIDERface) and different maximum numbers of faces per image (max = 1 face, 20 faces, 90 faces), but still got similar results.
> I limited the maximum number of faces in one image to 20. Should I use the parameter max=200 stated in the tutorial, if it's applicable to the tiny model at all?
>
> What do you mean? All objects in the image should be labeled. You should set max=200 only if there are more than 90 objects in a training image.
I manually handpicked all the images meeting the criteria by parsing the annotation files. Of course all the objects were labeled; I simply didn't use the images that contained more objects.
Are there any other ways to improve the performance of the xnor model, such as fine-tuning or enhancing the dataset? And sorry for being persistent; if there's nothing more I can do with the xnor model, I'll just stick to the default yolov3-tiny, which works great.
@SergeiSamuilov Hi,
Try to train this model: yolov3-tiny_fp32_xnor.cfg.txt
Also set random=1 in both [yolo] layers.
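random=1 enables multi-scale training (the network input is resized during training). As an illustration, the end of each [yolo] section would look like this (the other values are placeholders):

```
[yolo]
...
ignore_thresh = .7
truth_thresh = 1
random=1
```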
I've got mAP = 87.59 % using yolov3-tiny_fp32_xnor.cfg after 10,000 iterations on my own dataset, while I've got mAP = 90.77 % using the common yolov3-tiny.cfg.
Command for training:
darknet.exe detector train data/obj.data yolov3-tiny_fp32_xnor.cfg yolov3-tiny.conv.15
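After training, mAP on the validation set can be measured with the same binary; a command sketch, assuming the default backup directory and final-weights naming (adjust the paths to your setup):

```
darknet.exe detector map data/obj.data yolov3-tiny_fp32_xnor.cfg backup/yolov3-tiny_fp32_xnor_final.weights
```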
But I have a small number of small objects.
yolov3-tiny_fp32_xnor.cfg - avg loss 0.42
yolov3-tiny.cfg - avg loss 0.26

@AlexeyAB I checked the yolov3-tiny_fp32_xnor.cfg.txt you uploaded above. In this file there are 6 places where xnor=1 is commented out as #xnor=1. Based on the instructions I understand the first 5 places, but not the last one on line 159: if, as you suggest, commenting out the xnor=1 before the last yolo detection layer gives higher mAP, then the last commented line should be line 174, not line 159. Is this comment on the wrong line a bug?
I trained a tiny yolov3 model on one class (face) based on this cfg file: https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3-tiny_xnor.cfg As instructed, I used this command to get initial weights for training: darknet.exe partial yolov3-tiny-xnor-obj.cfg yolov3-tiny.weights yolov3-tiny.conv.15 15. When running inference with the trained model, I got very few detections and low mAP:
detections_count = 26375, unique_truth_count = 968
class_id = 0, name = face, ap = 27.35 %
for thresh = 0.25, precision = 0.78, recall = 0.14, F1-score = 0.24
for thresh = 0.25, TP = 134, FP = 37, FN = 834, average IoU = 56.16 %
mean average precision (mAP) = 0.273530, or 27.35 %
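The reported precision, recall and F1 follow directly from the TP/FP/FN counts in that log; a quick sanity check in plain Python, using the numbers above:

```python
# Detection counts reported by darknet at thresh = 0.25
tp, fp, fn = 134, 37, 834

precision = tp / (tp + fp)   # 134 / 171: share of detections that were real faces
recall = tp / (tp + fn)      # 134 / 968: share of ground-truth faces found
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
# precision=0.78 recall=0.14 F1=0.24 -- matches the log; the problem is recall
```

The numbers show the model is fairly precise when it does fire, but misses about 86 % of the faces, which is why the mAP is so low.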
The model was trained for 2400 iterations (avg loss ~1.4, unchanged since iteration 2000); the dataset included 3200 training images and 850 validation images. I also generated 6 anchors with calc_anchors.
Could you please clarify the concept of training an xnor-net model and suggest how I can improve this model's results?
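For context, calc_anchors clusters the (width, height) of all ground-truth boxes into num_of_clusters groups. A minimal sketch of the idea in Python, using plain Euclidean k-means on made-up relative box sizes (darknet actually uses an IoU-based distance, so real anchors will differ):

```python
import random

def kmeans_anchors(boxes, k=6, iters=50, seed=0):
    """Naive k-means on (w, h) pairs with Euclidean distance."""
    random.seed(seed)
    centers = random.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            i = min(range(k),
                    key=lambda c: (w - centers[c][0])**2 + (h - centers[c][1])**2)
            clusters[i].append((w, h))
        # Recompute each center as the mean of its cluster (keep old center if empty)
        centers = [
            (sum(w for w, _ in cl) / len(cl), sum(h for _, h in cl) / len(cl))
            if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return sorted(centers)

# Made-up relative (w, h) labels standing in for a real dataset
boxes = [(random.uniform(0.02, 0.4), random.uniform(0.03, 0.5)) for _ in range(500)]
# Scale to a 416x416 network input, as anchors are given in pixels
anchors = [(round(w * 416), round(h * 416)) for w, h in kmeans_anchors(boxes)]
print(anchors)  # 6 (w, h) pairs to paste into the anchors= line of each [yolo] layer
```

If the anchors come out much larger or smaller than the typical face sizes in the training images, detection quality suffers, which is one thing worth re-checking here.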