WongKinYiu / CrossStagePartialNetworks

Cross Stage Partial Networks

some comparison #32

Closed: WongKinYiu closed this issue 4 years ago.

WongKinYiu commented 4 years ago

@amusi Hello,

I saw your article. Here I provide some comparisons of the PyTorch versions of YOLOv3, YOLOv4, and YOLOv5 (all experiments were run on the same Tesla V100 GPU).

PyTorch version

Train with YOLOv3 setting (416x416)

Trained on the COCO 2014 trainvalno5k set and tested on the COCO 2014 5k set.


yolov3-spp 43.1% AP @ 608x608
Model Summary: 152 layers, 6.29719e+07 parameters, 6.29719e+07 gradients
Speed: 6.8/1.6/8.3 ms inference/NMS/total per 608x608 image at batch-size 16
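Every result in this thread uses the same two-line log format (a "Model Summary" line and a "Speed" line). A small sketch for pulling the numbers out so that runs can be compared programmatically; the regexes are written against the lines pasted here and are an assumption about the format, not an official parser. It also converts the total per-image time into images per second.

```python
import re

# Patterns for the two log lines pasted in this thread (assumed format).
SUMMARY_RE = re.compile(r"Model Summary: (\d+) layers, ([\d.e+]+) parameters")
SPEED_RE = re.compile(
    r"Speed: ([\d.]+)/([\d.]+)/([\d.]+) ms inference/NMS/total "
    r"per (\d+)x(\d+) image at batch-size (\d+)"
)

def parse_result(summary_line: str, speed_line: str) -> dict:
    """Turn a 'Model Summary' line and a 'Speed' line into a dict."""
    s = SUMMARY_RE.search(summary_line)
    p = SPEED_RE.search(speed_line)
    if s is None or p is None:
        raise ValueError("line does not match the expected log format")
    inference, nms, total = (float(p.group(i)) for i in (1, 2, 3))
    return {
        "layers": int(s.group(1)),
        "params_m": float(s.group(2)) / 1e6,  # parameters, in millions
        "inference_ms": inference,
        "nms_ms": nms,
        "total_ms": total,
        "img_size": int(p.group(4)),
        "batch_size": int(p.group(6)),
        "fps_total": 1000.0 / total,          # images/s including NMS
    }

r = parse_result(
    "Model Summary: 152 layers, 6.29719e+07 parameters, 6.29719e+07 gradients",
    "Speed: 6.8/1.6/8.3 ms inference/NMS/total per 608x608 image at batch-size 16",
)
print(r["layers"], round(r["params_m"], 1), round(r["fps_total"], 1))  # 152 63.0 120.5
```

The three speed numbers appear to be rounded independently, so inference + NMS can differ from total by 0.1 ms (6.8 + 1.6 = 8.4 vs 8.3 here).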

Train with YOLOv4 setting (512x512)

Trained on the COCO 2014 trainvalno5k set and tested on the COCO 2014 5k set.


yolov3-spp 43.6% AP @ 608x608
Model Summary: 152 layers, 6.29719e+07 parameters, 6.29719e+07 gradients
Speed: 6.8/1.6/8.3 ms inference/NMS/total per 608x608 image at batch-size 16

CSPDarknet53s-YOSPP: (~YOLOv4(Leaky) backbone + YOLOv3 head)

cd53s-yospp 43.7% AP @ 608x608
Model Summary: 184 layers, 4.89836e+07 parameters, 4.89836e+07 gradients
Speed: 6.3/1.6/7.8 ms inference/NMS/total per 608x608 image at batch-size 16

CSPDarknet53s-YOSPP-Mish: (~YOLOv4 backbone + YOLOv3 head)

cd53s-yospp-mish 44.3% AP @ 608x608
Model Summary: 184 layers, 4.89836e+07 parameters, 4.89836e+07 gradients
Speed: 7.9/1.6/9.6 ms inference/NMS/total per 608x608 image at batch-size 16

CSPDarknet53s-PASPP: (~YOLOv4(Leaky))

cd53s-paspp 44.5% AP @ 608x608
Model Summary: 212 layers, 6.43092e+07 parameters, 6.43092e+07 gradients
Speed: 6.9/1.6/8.5 ms inference/NMS/total per 608x608 image at batch-size 16

CSPDarknet53s-PASPP-Mish: (~YOLOv4)

cd53s-paspp-mish 45.0% AP @ 608x608
Model Summary: 212 layers, 6.43092e+07 parameters, 6.43092e+07 gradients
Speed: 8.7/1.6/10.3 ms inference/NMS/total per 608x608 image at batch-size 16


cd53s-paspp-cspt 45.1% AP @ 608x608
Model Summary: 222 layers, 5.84596e+07 parameters, 5.84596e+07 gradients
Speed: 6.6/1.5/8.1 ms inference/NMS/total per 608x608 image at batch-size 16
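The Leaky/Mish pairs above differ only in the activation, so their differences give a rough ablation of Mish. A sketch that computes, from the numbers reported in this comment, the AP gain and the extra model inference time per pair (the dictionary and function names are illustrative):

```python
# (AP %, model inference ms) at 608x608, batch 16, copied from the results above.
results = {
    "cd53s-yospp": (43.7, 6.3),
    "cd53s-yospp-mish": (44.3, 7.9),
    "cd53s-paspp": (44.5, 6.9),
    "cd53s-paspp-mish": (45.0, 8.7),
}

def mish_delta(base: str) -> tuple:
    """AP gain (points) and extra inference time (ms) when Leaky is replaced by Mish."""
    ap0, t0 = results[base]
    ap1, t1 = results[base + "-mish"]
    return round(ap1 - ap0, 1), round(t1 - t0, 1)

for base in ("cd53s-yospp", "cd53s-paspp"):
    gain, cost = mish_delta(base)
    print(f"{base}: +{gain} AP for +{cost} ms inference")
```

With these numbers Mish adds about 0.5 to 0.6 AP at a cost of 1.6 to 1.8 ms of inference per image.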

Train with YOLOv5 setting (640x640)

Trained on the COCO 2017 train set and tested on the COCO 2017 5k set.


yolov3-spp 45.5% AP @ 736x736
Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
Speed: 10.4/2.1/12.6 ms inference/NMS/total per 736x736 image at batch-size 16


yolov5s 33.1% AP @ 736x736
Model Summary: 99 layers, 6.99302e+06 parameters, 6.99302e+06 gradients
Speed: 2.2/2.1/4.4 ms inference/NMS/total per 736x736 image at batch-size 16


yolov5m 41.5% AP @ 736x736
Model Summary: 165 layers, 2.51928e+07 parameters, 2.51928e+07 gradients
Speed: 5.4/1.8/7.2 ms inference/NMS/total per 736x736 image at batch-size 16


yolov5l 44.2% AP @ 736x736
Model Summary: 231 layers, 6.17556e+07 parameters, 6.17556e+07 gradients
Speed: 11.3/2.2/13.5 ms inference/NMS/total per 736x736 image at batch-size 16


yolov5x 47.1% AP @ 736x736
Model Summary: 297 layers, 1.23102e+08 parameters, 1.23102e+08 gradients
Speed: 20.3/2.2/22.5 ms inference/NMS/total per 736x736 image at batch-size 16
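One way to read the YOLOv5-setting results is as an accuracy/latency trade-off: a model is only interesting if no faster model is at least as accurate. A sketch of that Pareto filter over the reported (AP, total ms) pairs; this is an illustrative analysis, not something taken from the Ultralytics code.

```python
# (AP %, total ms per 736x736 image at batch 16) for the YOLOv5-setting runs above.
models = {
    "yolov3-spp": (45.5, 12.6),
    "yolov5s": (33.1, 4.4),
    "yolov5m": (41.5, 7.2),
    "yolov5l": (44.2, 13.5),
    "yolov5x": (47.1, 22.5),
}

def pareto_front(entries: dict) -> list:
    """Keep each model whose AP is higher than that of every faster model."""
    front = []
    best_ap = float("-inf")
    # Walk from fastest to slowest; keep a model only if it raises the best AP so far.
    for name, (ap, ms) in sorted(entries.items(), key=lambda kv: kv[1][1]):
        if ap > best_ap:
            front.append(name)
            best_ap = ap
    return front

print(pareto_front(models))
```

On these numbers it returns ['yolov5s', 'yolov5m', 'yolov3-spp', 'yolov5x']: yolov5l drops out because yolov3-spp is both faster (12.6 vs 13.5 ms) and more accurate (45.5 vs 44.2 AP).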
AlexeyAB commented 4 years ago

@WongKinYiu Hi,

Obviously, CSPDarknet53s-PASPP-Mish (~YOLOv4) is much better than amusi's YOLOv5l (640x640) (batch-size 16).

And our new YOLOv4 model is even better.

  1. Does it use inference-time data augmentation?
  2. Why is batch size 16 used here?
  3. Is there a GitHub repo with amusi's YOLOv5l (640x640)?

Train with YOLOv5 setting (640x640)

Trained on the COCO 2017 train set and tested on the COCO 2017 5k set.


yolov3-spp 45.5% AP
Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
Speed: 10.4/2.1/12.6 ms inference/NMS/total per 736x736 image at batch-size 16
  4. Is the better AP for YOLOv3-SPP achieved just by using 640x640 network resolution, or by something else?
WongKinYiu commented 4 years ago


  1. Does it use inference-time data augmentation?

No, there is no inference-time augmentation.

  2. Why is batch size 16 used here?

I just follow the Ultralytics testing protocol, which uses batch size 16.

  3. Is there a GitHub repo with amusi's YOLOv5l (640x640)?

It is not amusi's repo; it is Ultralytics' new repo.

  4. Is the better AP for YOLOv3-SPP achieved just by using 640x640 network resolution, or by something else?

There are some modifications in Ultralytics' new repo, but yes, I think the main source of the improvement is 640x640 training. Also, Ultralytics' new repo seems to use an affine transform instead of multi-resolution training, so the new training does not use as much GPU RAM. (I need to check the code in detail.)

I am training CSPDarknet53-PACSP-(SAM)-Mish with Darknet on MS COCO 2017.
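On the batch-size-16 question above: as I understand the Ultralytics test script, time is accumulated over whole batches and then divided by the number of images seen, so the "ms per image at batch-size 16" figure is typically lower than single-image latency. A minimal sketch of that bookkeeping, with a hypothetical `fake_model` standing in for a network; on a GPU the clock should only be read after synchronizing (e.g. `torch.cuda.synchronize()`), which this CPU-only sketch omits.

```python
import time
from typing import Callable, Sequence

def per_image_ms(run_batch: Callable[[Sequence], object],
                 batches: Sequence[Sequence]) -> float:
    """Average wall-clock milliseconds per image over a list of batches.

    Time is accumulated per batch and divided by the total number of images,
    so one call on a batch of 16 is charged once and shared by 16 images.
    """
    elapsed = 0.0
    seen = 0
    for batch in batches:
        start = time.perf_counter()
        run_batch(batch)                  # e.g. a model forward pass
        elapsed += time.perf_counter() - start
        seen += len(batch)
    return elapsed / seen * 1e3

# Hypothetical stand-in for a model: costs 16 ms per call, whatever the batch size.
def fake_model(batch):
    time.sleep(0.016)

batches = [list(range(16))] * 4           # 4 batches of 16 "images"
print(f"{per_image_ms(fake_model, batches):.2f} ms per image")  # at least 1.00
```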

AlexeyAB commented 4 years ago

And in Ultralytics' new repo, it seems to use an affine transform instead of multi-resolution training.


  1. scale=0.5 https://github.com/ultralytics/yolov5/blob/391492ee5b56ef36424b4a9257c18f7c784a8f44/train.py#L44
  2. python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 16

Maybe we should use random=0 resize=1.5 instead of random=1 in Darknet too?
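The difference discussed here: Darknet's random=1 changes the network input resolution during training, while the Ultralytics repo (per the scale=0.5 hyperparameter linked above) appears to keep a fixed 640x640 input and rescale the image content with a random affine transform, so memory use stays constant. A minimal sketch of the scale part for bounding boxes only, assuming the factor is drawn uniformly from [1 - scale, 1 + scale] and applied about the image centre; the function name is hypothetical, and warping the pixels themselves (e.g. with OpenCV's `cv2.warpAffine`) is left out.

```python
import random

def random_scale_boxes(boxes, img_size=640, scale=0.5, rng=random):
    """Rescale xyxy pixel boxes about the image centre by one random factor.

    The factor s is drawn from [1 - scale, 1 + scale]. The canvas stays
    img_size x img_size, so the network input size (and GPU memory) is fixed.
    Boxes are clipped to the canvas and dropped if nothing of them remains.
    """
    s = rng.uniform(1 - scale, 1 + scale)
    c = img_size / 2
    out = []
    for x1, y1, x2, y2 in boxes:
        # Affine map p' = s * (p - c) + c applied to each coordinate.
        nx1, ny1, nx2, ny2 = (s * (v - c) + c for v in (x1, y1, x2, y2))
        # Clip to the fixed canvas [0, img_size].
        nx1, ny1 = max(nx1, 0.0), max(ny1, 0.0)
        nx2, ny2 = min(nx2, float(img_size)), min(ny2, float(img_size))
        if nx2 > nx1 and ny2 > ny1:
            out.append((nx1, ny1, nx2, ny2))
    return s, out

# Usage: one 640x640 training sample with a single box.
s, boxes = random_scale_boxes([(100, 100, 300, 260)], rng=random.Random(0))
print(f"scale={s:.3f}", boxes)
```

Compared with multi-resolution training, this keeps every batch the same tensor shape; the cost is that objects are only seen at different scales relative to a fixed canvas.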

WongKinYiu commented 4 years ago


OK, I will train this setting on tiny-yolov4 with width=640 and height=640. If it works well, users can train YOLO on cheaper GPUs.

WongKinYiu commented 4 years ago

@AlexeyAB Hello,

Yes, the AP benefits from 640x640 training. CSPDarknet53s-YOSPP has 12.5% lower model inference time and 0.1 points higher AP than YOLOv3-SPP, and 19.5% lower model inference time and 1.3 points higher AP than YOLOv5l.


yolov3-spp: 45.5% AP @736x736
Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
Speed: 10.4/2.1/12.6 ms inference/NMS/total per 736x736 image at batch-size 16

CSPDarknet53s-YOSPP: (~YOLOv4(Leaky) backbone + YOLOv3 head)

cd53s-yospp: 45.6% AP @736x736
Model Summary: 225 layers, 4.90092e+07 parameters, 4.90092e+07 gradients
Speed: 9.1/2.0/11.1 ms inference/NMS/total per 736x736 image at batch-size 16


yolov5l 44.2% AP @ 736x736
Model Summary: 231 layers, 6.17556e+07 parameters, 6.17556e+07 gradients
Speed: 11.3/2.2/13.5 ms inference/NMS/total per 736x736 image at batch-size 16
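The speed percentages in the comment above can be reproduced from the model inference times listed: "faster" there corresponds to the reduction in inference time relative to the baseline. A short check (variable names are illustrative). With the rounded AP values shown, the gap to YOLOv5l comes out as 1.4 points rather than 1.3, presumably because the comment used unrounded AP.

```python
def time_reduction_pct(baseline_ms: float, new_ms: float) -> float:
    """How much less inference time new_ms takes, as a % of baseline_ms."""
    return (baseline_ms - new_ms) / baseline_ms * 100

# Model inference times (ms per 736x736 image, batch 16) from the results above.
yolov3_spp, cd53s_yospp, yolov5l = 10.4, 9.1, 11.3

print(round(time_reduction_pct(yolov3_spp, cd53s_yospp), 1))  # 12.5
print(round(time_reduction_pct(yolov5l, cd53s_yospp), 1))     # 19.5

# The same gap expressed as a throughput (images/s) gain is larger:
print(round((yolov3_spp / cd53s_yospp - 1) * 100, 1))         # 14.3

# AP gaps in points, from the rounded values shown (45.6, 45.5, 44.2).
print(round(45.6 - 45.5, 1), round(45.6 - 44.2, 1))           # 0.1 1.4
```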
AlexeyAB commented 4 years ago

@WongKinYiu Nice.

WongKinYiu commented 4 years ago


  • Does CSPDarknet53s give improvements for training on both Ultralytics and Darknet?

I am not sure about Darknet because I have not trained it on ImageNet, but yes for Ultralytics.

  • Interesting. What AP would a P6 model trained at 640x640 and tested at 736x736 give?

To achieve this, I have to look at how to construct a P6 model in the new Ultralytics repository. Then I need to construct the YOLOv4 model there, since it does not currently support all of YOLOv4's blocks (or maybe I will directly modify the PyTorch code I currently use). I think I will design a training scheme to train the P6 model on Darknet first.
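For context on the P6 question: a P6 model adds a fourth output level at stride 64 on top of the usual P3 to P5 levels (strides 8, 16, 32). A sketch, under that assumption, of the feature-map sizes at 640 and 736. Note that 736 is a multiple of 32 but not of 64, so a P6 model would probably have to be tested at a size such as 704 or 768 for the stride-64 map and its upsampled copy to line up with P5. The helper names are hypothetical.

```python
import math

def grid_sizes(img_size: int, strides=(8, 16, 32, 64)) -> dict:
    """Side length of each output feature map (P3..P6) for a square input."""
    return {f"P{int(math.log2(s))}": img_size / s for s in strides}

def round_up(img_size: int, max_stride: int) -> int:
    """Smallest multiple of max_stride that is >= img_size."""
    return math.ceil(img_size / max_stride) * max_stride

print(grid_sizes(640))    # {'P3': 80.0, 'P4': 40.0, 'P5': 20.0, 'P6': 10.0}
print(grid_sizes(736))    # {'P3': 92.0, 'P4': 46.0, 'P5': 23.0, 'P6': 11.5}
print(round_up(736, 64))  # 768
```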

AlexeyAB commented 4 years ago

@WongKinYiu Hi,

Can you share cfg/weights files for this model?

CSPDarknet53s-PASPP-Mish: (~YOLOv4) - trained 512x512, tested 608x608

cd53s-paspp-mish 45.0% AP @ 608x608
Model Summary: 212 layers, 6.43092e+07 parameters, 6.43092e+07 gradients
Speed: 8.7/1.6/10.3 ms inference/NMS/total per 608x608 image at batch-size 16
WongKinYiu commented 4 years ago


cd53s-paspp-mish.cfg cd53s-paspp-mish.pt

clw5180 commented 4 years ago

Hi WongKinYiu, what does -PACSP mean? Also, I can't find its config and weight files. Thanks a lot!

WongKinYiu commented 4 years ago

Hello, PACSP means applying CSP to PANet. The model is still training; I will release the .weights file after training finishes.