eriklindernoren / PyTorch-YOLOv3

Minimal PyTorch implementation of YOLOv3
GNU General Public License v3.0

mAP nearly zero? #841

Open Chenyaoyi1998 opened 11 months ago

Chenyaoyi1998 commented 11 months ago

What I'm trying to do

I tried to train on the VOC2007 dataset, but it isn't working: even after many epochs, the mAP is still zero.

What I've tried

  1. Loaded the pretrained Darknet backbone weights (weights/darknet53.conv.74).
  2. Converted the VOC annotations to the COCO-style label format the repo expects, filename.xml -> filename.txt, with x = (x1 + x2) / 2, y = (y1 + y2) / 2, w = x2 - x1, h = y2 - y1, then normalized by the image size: x/W, y/H, w/W, h/H (see the sketch after this list).
  3. Modified yolov3.cfg: kept the default hyperparameters, used anchors from k-means, changed the classes from 80 to 20, and changed the output filters from 3*(5+80) = 255 to 3*(5+20) = 75.
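For reference, here is a minimal sketch of the conversion in step 2, assuming the standard VOC XML layout; the function name and paths are just illustrative, not part of the repo:

```python
# Sketch of the VOC xml -> YOLO txt label conversion described above.
import xml.etree.ElementTree as ET

VOC_CLASSES = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
               "car", "cat", "chair", "cow", "diningtable", "dog", "horse",
               "motorbike", "person", "pottedplant", "sheep", "sofa",
               "train", "tvmonitor"]

def voc_xml_to_yolo_txt(xml_path, txt_path):
    root = ET.parse(xml_path).getroot()
    W = float(root.find("size/width").text)
    H = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls = VOC_CLASSES.index(obj.find("name").text)
        box = obj.find("bndbox")
        x1, y1 = float(box.find("xmin").text), float(box.find("ymin").text)
        x2, y2 = float(box.find("xmax").text), float(box.find("ymax").text)
        # Center/size coordinates, normalized to [0, 1].
        x, y = (x1 + x2) / 2 / W, (y1 + y2) / 2 / H
        w, h = (x2 - x1) / W, (y2 - y1) / H
        lines.append(f"{cls} {x:.6f} {y:.6f} {w:.6f} {h:.6f}")
    with open(txt_path, "w") as f:
        f.write("\n".join(lines))
```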

Additional context

I tried changing the learning rate from 0.0001 to 0.001; it did not help. I also tried SGD instead of Adam; that did not help either.

Chenyaoyi1998 commented 11 months ago

log after 10 epochs:

+-------+-------------+---------+
| Index | Class       | AP      |
+-------+-------------+---------+
| 0     | aeroplane   | 0.00000 |
| 1     | bicycle     | 0.00000 |
| 2     | bird        | 0.00000 |
| 3     | boat        | 0.00000 |
| 4     | bottle      | 0.00000 |
| 5     | bus         | 0.00000 |
| 6     | car         | 0.00000 |
| 7     | cat         | 0.00000 |
| 8     | chair       | 0.00000 |
| 9     | cow         | 0.00000 |
| 10    | diningtable | 0.00000 |
| 11    | dog         | 0.00000 |
| 12    | horse       | 0.00000 |
| 13    | motorbike   | 0.00000 |
| 14    | person      | 0.00000 |
| 15    | pottedplant | 0.00000 |
| 16    | sheep       | 0.00000 |
| 17    | sofa        | 0.00000 |
| 18    | train       | 0.00000 |
| 19    | tvmonitor   | 0.00000 |
+-------+-------------+---------+
---- mAP 0.00000 ----

Chenyaoyi1998 commented 11 months ago

(attached image)

Chenyaoyi1998 commented 11 months ago

I think I found a possible reason.

I read the function `compute_loss` carefully. It computes the loss as follows: for each ground-truth box, find the anchors responsible for it on the three feature maps; the boxes predicted by those responsible anchors are the positive examples, and all remaining predicted boxes are treated as negatives. Positive examples produce three loss terms: classification, confidence (objectness), and bbox regression. Negative examples produce only a confidence loss. For the confidence loss, positive examples are labelled with their IoU against the ground truth, while negative examples are labelled with 0.
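To make that concrete, here is a rough sketch of the loss structure as I understand it. This is my own paraphrase, not the repo's actual `compute_loss`; all tensor names and shapes are illustrative:

```python
# Sketch of the described loss: positives (responsible anchors) get
# classification + confidence + box terms, negatives only confidence.
import torch
import torch.nn.functional as F

def sketch_loss(pred_box, pred_obj, pred_cls, tgt_box, tgt_cls, iou, pos_mask):
    # pred_obj / pred_cls are raw logits; pos_mask marks responsible anchors.
    # Confidence target: IoU with the matched ground truth for positives, 0 otherwise.
    obj_tgt = torch.zeros_like(pred_obj)
    obj_tgt[pos_mask] = iou[pos_mask].detach()
    loss_obj = F.binary_cross_entropy_with_logits(pred_obj, obj_tgt)

    # Classification and box regression apply only to positives.
    loss_cls = F.binary_cross_entropy_with_logits(pred_cls[pos_mask], tgt_cls[pos_mask])
    loss_box = F.mse_loss(pred_box[pos_mask], tgt_box[pos_mask])
    return loss_obj + loss_cls + loss_box
```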

The problem is the extreme imbalance between positive and negative examples, and the confidence-loss calculation does not seem to match what is described in the paper. During my training, the network tends to output boxes with confidence 0: after a few batches the predicted confidence logits all go negative (i.e. the confidence goes to 0 after the sigmoid).
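A quick back-of-the-envelope count shows how severe the imbalance is. At the default 416x416 input, YOLOv3 predicts over ten thousand boxes per image, while a typical VOC image contains only a few objects:

```python
# Number of predicted boxes per 416x416 image: 3 anchors per cell on the
# 13x13, 26x26, and 52x52 feature maps.
n_preds = sum(s * s * 3 for s in (13, 26, 52))
print(n_preds)  # 10647 -- with ~2-3 objects per VOC image, negatives
                # outnumber positives by roughly 1000:1, so an unweighted
                # confidence loss pushes every prediction toward 0.
```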

I'm a newcomer to the field of object detection. The above is just my personal understanding; I may be misreading some details of the loss calculation in the code or in the paper. Discussion is welcome.