
Cross Stage Partial Networks
https://github.com/WongKinYiu/CrossStagePartialNetworks

Different results between repo and preprint paper #20

Closed ntdat017 closed 4 years ago

ntdat017 commented 4 years ago

In the preprint paper, I noticed different results on the MS COCO object detection task. For instance, at input size 608x608, CSPResNeXt50-PANet-SPP reaches 43.2 AP in the repo compared with 38.4 in the paper. That is a big gap. Could you explain it?

Capture from README.md: Screenshot from 2020-03-18 14-39-34

From the preprint paper: Screenshot from 2020-03-18 14-39-48

One more question: the input size of csresnext50-panet-spp-original-optimal.cfg is 512, and the grid sizes of the three YOLO outputs are 64, 32, and 16 in sequence. When you optimize the model at input size 608, they become 76, 38, and 19. Do you need to modify the model to get the best result?

WongKinYiu commented 4 years ago

@ntdat017

The 43.2% AP result is obtained after additionally combining CIoU, Scale Sensitivity, IoU Threshold, Greedy NMS, Mosaic Augmentation, ...


The results in the preprint paper only change the backbone and head of YOLOv3-SPP to CSPResNeXt50 and PANet. Details are in https://github.com/WongKinYiu/CrossStagePartialNetworks#ms-coco. You can also see how the AP goes from 38% to 42% in https://github.com/WongKinYiu/CrossStagePartialNetworks/blob/master/coco/results.md#mscoco
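For context, CIoU is the bounding-box regression term from the list above. Below is a minimal plain-Python sketch of the CIoU formula (Zheng et al., 2020); it is for illustration only and is not the repo's darknet implementation:

```python
import math

def ciou(box1, box2, eps=1e-9):
    """Complete-IoU between two (x1, y1, x2, y2) boxes:
    CIoU = IoU - rho^2 / c^2 - alpha * v."""
    # intersection and union for the plain IoU term
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # rho^2: squared distance between the box centers
    rho2 = ((box1[0] + box1[2]) - (box2[0] + box2[2])) ** 2 / 4 + \
           ((box1[1] + box1[3]) - (box2[1] + box2[3])) ** 2 / 4
    # c^2: squared diagonal of the smallest enclosing box
    cw = max(box1[2], box2[2]) - min(box1[0], box2[0])
    ch = max(box1[3], box2[3]) - min(box1[1], box2[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # v, alpha: aspect-ratio consistency penalty and its weight
    v = (4 / math.pi ** 2) * (math.atan(w2 / (h2 + eps)) - math.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v  # the training loss is typically 1 - CIoU
```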

Although the output grids are different, the FoV of each grid cell is the same: 512/64 = 608/76 = 8. If the objects you want to detect have the same size in the resized images, you will get the same results. But if you optimize the model at input size 608, you can still get better results; the main reason is that objects in larger input images are also larger, and large objects are easier to detect than small ones.
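To make the stride arithmetic concrete, here is a tiny plain-Python sketch (the strides 8, 16, and 32 are the standard downsampling factors of the three YOLO heads; the function name is mine):

```python
STRIDES = (8, 16, 32)  # downsampling factor of each YOLO detection head

def grid_sizes(input_size, strides=STRIDES):
    """Grid resolution of each detection head for a square input."""
    return [input_size // s for s in strides]

print(grid_sizes(512))  # [64, 32, 16] -> matches the 512x512 cfg
print(grid_sizes(608))  # [76, 38, 19] -> same strides, larger grids
# Each grid cell covers an 8/16/32-pixel region regardless of input size,
# so the per-cell FoV is unchanged; only the number of cells grows.
```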

ntdat017 commented 4 years ago

@WongKinYiu I got it, thank you for the clear explanation. It's amazing that techniques like CIoU and Scale Sensitivity can improve the AP so much.

ntdat017 commented 4 years ago

@WongKinYiu Have you tried using an attention block, such as the Squeeze-and-Excitation network, in the detection model, and how does it perform there? I saw that you tried SE on the classification task but it didn't work, right?

Screenshot from 2020-03-18 17-33-05

WongKinYiu commented 4 years ago

It also works in the detector. However, although SE blocks only increase computation by about 1%, they increase GPU inference time by about 20%. So I do not train the detector with SE blocks; I use the SAM block instead.
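For illustration, here is a minimal PyTorch sketch of the two attention blocks being compared. The SE block follows the Squeeze-and-Excitation design (Hu et al.); the SAM shown is the pointwise variant described in the YOLOv4 paper, and the exact block used in the author's cfg files may differ:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: channel attention from global pooling + 2 FCs.
    Cheap in FLOPs, but the pooling/FC chain serializes poorly on GPU."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                      # squeeze: N x C x 1 x 1
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                 # excitation weights
        )

    def forward(self, x):
        return x * self.gate(x)  # reweight channels

class SAMBlock(nn.Module):
    """Modified SAM (YOLOv4-style): a single pointwise conv + sigmoid
    produces an elementwise attention mask -- one GPU-friendly conv."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        return x * torch.sigmoid(self.conv(x))
```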