Mismatch in architecture between the paper and the pretrained model

I noticed that the pretrained network's architecture is slightly shallower than the one proposed in the paper. For instance, the paper has 4 C3 blocks before the SPP block, but the pretrained model has only 3. Could you please explain this? Or am I wrong? I simply called "print(model)" in the notebook and reconstructed the architecture by hand. Is the output of the print command in PyTorch misleading, or is the model really missing some parts compared with the one in your paper?

Update: I did a quick check. Only the P6 version of your pretrained model matches the paper. That doesn't make sense to me, because the paper doesn't mention changing the backbone when the P6 output is removed. It reads as if the P6 block is just a removable head, while the backbone and the neck stay the same. Is this the conventional meaning of "removing the P6 output" in computer vision?
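To make the manual count reproducible, here is a minimal sketch of how I would count the C3 blocks before the first SPP block. It only looks at class names, so it runs without PyTorch. The function name `count_before` and the toy layer list are made up for illustration, and I am assuming that the real layers can be iterated in order (for example via `model.model` in upstream YOLOv5, which may differ in this repo).

```python
def count_before(layers, target="C3", stop="SPP"):
    """Count layers whose class name is `target` before the first `stop` layer."""
    count = 0
    for layer in layers:
        name = type(layer).__name__
        if name == stop:
            return count
        if name == target:
            count += 1
    raise ValueError(f"no {stop} layer found")


# Stand-in classes so the sketch runs without PyTorch (hypothetical, not the repo's modules)
class Conv:
    pass


class C3:
    pass


class SPP:
    pass


# Toy layer sequence for illustration only, not the actual pretrained model
toy_backbone = [Conv(), C3(), Conv(), C3(), Conv(), C3(), Conv(), SPP(), C3()]
print(count_before(toy_backbone))
```

Note that this counts top-level C3 modules only. A C3 block with several repeats prints as a single module containing multiple bottlenecks, so the count of C3 modules and the count of bottlenecks are different things.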

Two more questions. First, Fig. 1e in the paper shows a C3 block with a CONV layer between the bottleneck layers and the CONCAT layer, but that layer doesn't seem to appear in the code (in models/common.py, the BottleneckCSP module has it, but the C3 module does not). Second, the depth multiples are (1.0, 0.67, 0.33) in the code but (1.0, 0.5, 0.33) in the paper. Is that a typo?
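The depth multiple difference is not purely cosmetic, as far as I can tell. Below is a small sketch of how the repeat count is scaled, assuming this repo follows the upstream YOLOv5 rule in `parse_model` (`max(round(n * gd), 1)` for `n > 1`). The helper name `scaled_depth` and the base repeat counts are hypothetical values chosen for illustration, not taken from the repo's config files.

```python
def scaled_depth(n, depth_multiple):
    """Scale a block's repeat count the way upstream YOLOv5 does (assumed here)."""
    # Only blocks with more than one repeat are scaled; the result is at least 1
    return max(round(n * depth_multiple), 1) if n > 1 else n


# Hypothetical base repeat counts for four C3 stages (illustration only)
base_repeats = [3, 9, 9, 3]

for gd in (1.0, 0.67, 0.5, 0.33):
    print(gd, [scaled_depth(n, gd) for n in base_repeats])
```

Under these assumptions, a stage with 9 base repeats becomes 6 with a multiple of 0.67 but 4 with 0.5, so the two values would give networks of different depth.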

deepcam-cn / yolov5-face

Mismatch in architecture between the paper and the pretrained model #97