Trami1995 / YOLOv10

YOLOv10 implemented with mmyolo
GNU General Public License v3.0

Model structure appears to differ from the original code, as the original yolov10n/s/m/b/l/x.yaml structures are not simple scaling but involve module changes, primarily switching between C2fCIB and C2f. #4

Closed: CFZ1 closed this issue 4 weeks ago

CFZ1 commented 1 month ago

I greatly appreciate the authors' work on this project, which has been incredibly helpful. However, I've noticed some discrepancies between the model structure in this implementation and the original YOLOv10 code (https://github.com/THU-MIG/yolov10).

After comparing the structures for YOLOv10m, I observed the following differences: (1) Layer 19 in the neck uses a different module compared to the original code (C2f vs C2fCIB). (2) The implementation of one2many_cls_preds appears to differ from the original version.

yolov10s:

head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, C2f, [512]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 3, C2f, [256]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 3, C2f, [512]] # 19 (P4/16-medium)

  - [-1, 1, SCDown, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 3, C2fCIB, [1024, True, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, v10Detect, [nc]] # Detect(P3, P4, P5)

yolov10m:

head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, C2f, [512]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 3, C2f, [256]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 3, C2fCIB, [512, True]] # 19 (P4/16-medium)

  - [-1, 1, SCDown, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 3, C2fCIB, [1024, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, v10Detect, [nc]] # Detect(P3, P4, P5)
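The two head listings above can also be compared mechanically. The following is a minimal sketch, not part of either repository: the entries are copied by hand from the listings, the module names are kept as plain strings, and `HEAD_START = 11` is an assumption inferred from the `# 13` comment on the third head entry. The helper name `diff_heads` is hypothetical.

```python
# Head entries copied from the yolov10s / yolov10m listings above:
# (from, repeats, module name, args). Module names are plain strings here.
# ASSUMPTION: the head starts at layer index 11, as the "# 13" comment implies.
HEAD_START = 11

YOLOV10S_HEAD = [
    (-1, 1, "nn.Upsample", [None, 2, "nearest"]),
    ([-1, 6], 1, "Concat", [1]),
    (-1, 3, "C2f", [512]),                    # 13
    (-1, 1, "nn.Upsample", [None, 2, "nearest"]),
    ([-1, 4], 1, "Concat", [1]),
    (-1, 3, "C2f", [256]),                    # 16
    (-1, 1, "Conv", [256, 3, 2]),
    ([-1, 13], 1, "Concat", [1]),
    (-1, 3, "C2f", [512]),                    # 19
    (-1, 1, "SCDown", [512, 3, 2]),
    ([-1, 10], 1, "Concat", [1]),
    (-1, 3, "C2fCIB", [1024, True, True]),    # 22
    ([16, 19, 22], 1, "v10Detect", ["nc"]),
]

YOLOV10M_HEAD = [
    (-1, 1, "nn.Upsample", [None, 2, "nearest"]),
    ([-1, 6], 1, "Concat", [1]),
    (-1, 3, "C2f", [512]),                    # 13
    (-1, 1, "nn.Upsample", [None, 2, "nearest"]),
    ([-1, 4], 1, "Concat", [1]),
    (-1, 3, "C2f", [256]),                    # 16
    (-1, 1, "Conv", [256, 3, 2]),
    ([-1, 13], 1, "Concat", [1]),
    (-1, 3, "C2fCIB", [512, True]),           # 19
    (-1, 1, "SCDown", [512, 3, 2]),
    ([-1, 10], 1, "Concat", [1]),
    (-1, 3, "C2fCIB", [1024, True]),          # 22
    ([16, 19, 22], 1, "v10Detect", ["nc"]),
]


def diff_heads(head_a, head_b, start=HEAD_START):
    """Return (layer_index, entry_a, entry_b) for every position that differs."""
    if len(head_a) != len(head_b):
        raise ValueError("heads have different lengths; compare them manually")
    return [
        (start + offset, a, b)
        for offset, (a, b) in enumerate(zip(head_a, head_b))
        if a != b
    ]


if __name__ == "__main__":
    for layer, s_entry, m_entry in diff_heads(YOLOV10S_HEAD, YOLOV10M_HEAD):
        print(f"layer {layer}: yolov10s={s_entry} yolov10m={m_entry}")
```

For these two listings the helper reports layers 19 and 22: layer 19 switches module (C2f vs C2fCIB), while layer 22 keeps C2fCIB but takes different arguments.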

Across the original YOLOv10 variants (n/s/m/b/l/x), the notable structural differences are in: (1) layers 13 and 19 of the neck, and (2) layer 8 of the backbone.

To facilitate this comparison, I used the following code to print the model structure of the original implementation:

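from ultralytics.nn.tasks import attempt_load_one_weight

# Load the original YOLOv10m checkpoint; returns the model and the raw checkpoint.
src = './yolov10m.pt'
model, ckpt = attempt_load_one_weight(src)

# Write the module tree (the model's printed repr) to a text file for comparison.
with open('./yolov10.py', 'w') as f:
    f.write(str(model))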
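A full module dump is long, so it may be easier to compare only the module type of each top-level layer. The sketch below is an illustration under assumptions, not code from either repository: `summarize_layers` and `diff_summaries` are hypothetical helpers, and the `C2f` / `C2fCIB` classes are empty stand-ins so the example runs without ultralytics installed.

```python
def summarize_layers(layers):
    """Return (index, class name) for each top-level layer in an iterable."""
    return [(i, type(layer).__name__) for i, layer in enumerate(layers)]


def diff_summaries(summary_a, summary_b):
    """Return (index, name_a, name_b) wherever the two summaries disagree."""
    return [
        (i, name_a, name_b)
        for (i, name_a), (_, name_b) in zip(summary_a, summary_b)
        if name_a != name_b
    ]


# Empty stand-in classes (NOT the real ultralytics modules), used only so the
# example is self-contained.
class Conv:
    pass


class C2f:
    pass


class C2fCIB:
    pass


if __name__ == "__main__":
    variant_a = summarize_layers([Conv(), C2f(), C2f()])
    variant_b = summarize_layers([Conv(), C2f(), C2fCIB()])
    for index, name_a, name_b in diff_summaries(variant_a, variant_b):
        print(f"layer {index}: {name_a} vs {name_b}")
```

With a real checkpoint, the iterable would presumably be the `.model` attribute of the loaded model object, assuming it is the sequential container of layers as in ultralytics detection models; that should be checked against the installed version.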

If I've misunderstood something, please correct me. Thank you for your attention to this matter.

Trami1995 commented 4 weeks ago

Thanks for the advice. After checking the code, the implementation of YOLOv10 (including the backbone and the neck) is indeed different from the original paper. We will fix the bug soon and release the pretrained model. The head implementation refers to the code

CFZ1 commented 4 weeks ago

Thank you for your feedback. We look forward to seeing the updated implementation and wish you success in your work. :smile: 🎉