Barry-Chen-yup opened 2 years ago
P5 models output P3, P4, and P5 predictions. P6 models output P3, P4, P5, and P6 predictions.
train.py is used to train the Detect, IDetect, and IBin heads. train_aux.py is used to train the IAuxDetect head.
I also found that hyp.scratch.p6.yaml is the same as hyp.scratch.p5.yaml.
Can you please tell me what P3, P4, P5, and P6 are, and where I can read more about these terms in the context of YOLOv7?
train.py is for img-size 640; train_aux.py is for img-size 1280.
Is the image size really the only difference? I can run train.py with an image size of 1280 with no problem, and it works.
I don't think so. I was looking into the same question and found the following:
One-stage detectors like YOLO have the following stages.
All object detectors take an image as input and compress features down through a convolutional neural network backbone. In image classification, these backbones are the end of the network and predictions can be made from them. In object detection, multiple bounding boxes need to be drawn around objects along with classification, so the feature layers of the convolutional backbone need to be mixed and held up in light of one another. The combination of backbone feature layers happens in the neck, and detection happens in the head. It is also useful to split object detectors into two categories: one-stage detectors and two-stage detectors. Two-stage detectors decouple the tasks of object localization and classification for each bounding box. One-stage detectors make the predictions for object localization and classification at the same time. YOLO is a one-stage detector, hence, You Only Look Once.
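As a toy sketch of the flow described above (plain Python, not actual YOLOv7 code; the names and spatial sizes are made up for illustration), a one-stage detector composes backbone, neck, and head:

```python
# Toy sketch of the backbone -> neck -> head data flow of a one-stage
# detector. Shapes are illustrative only (a 640x640 input downsampled
# by 8/16/32), not taken from any real model.

def backbone(image):
    """Compress the image into feature maps at several scales."""
    return {"C3": (80, 80), "C4": (40, 40), "C5": (20, 20)}

def neck(features):
    """Mix backbone feature layers (FPN/PAN-style fusion)."""
    # A real neck would upsample/downsample and concatenate feature maps;
    # here we just relabel the fused pyramid levels.
    return {"P3": features["C3"], "P4": features["C4"], "P5": features["C5"]}

def one_stage_head(pyramid):
    """Predict boxes and classes in a single pass over each pyramid level."""
    return [f"detections@{level}" for level in pyramid]

detections = one_stage_head(neck(backbone("image")))
print(detections)  # one prediction set per pyramid level
```

A two-stage detector would instead insert a region-proposal step between the neck and a separate classification stage; the one-stage head above does localization and classification in a single pass.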
Let's take a look at the neck, e.g. BiFPN:
It extracts features from different feature layers, and those layers are marked as P4, P5, P6, etc. The number is the downsampling level: a higher-level layer like P6 has a larger stride and a coarser feature map, so it is responsible for detecting larger objects, while a lower-level layer like P4 keeps finer resolution and is responsible for smaller objects.
Refs:
https://blog.roboflow.com/a-thorough-breakdown-of-yolov4/
https://towardsdatascience.com/review-fpn-feature-pyramid-network-object-detection-262fc7482610
Please correct me If I am wrong :)
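The P_n naming follows the feature-pyramid convention where P_n is the level whose stride is 2**n. A small arithmetic sketch (plain Python, not YOLOv7 code) of the resulting grid sizes for the two image sizes mentioned in this thread:

```python
# Sketch of the P-level naming convention: P_n is the feature map whose
# spatial resolution is the input size divided by 2**n (stride 8/16/32/64
# for P3/P4/P5/P6). Pure arithmetic, not read from any model.

def grid_size(img_size, level):
    stride = 2 ** level
    return img_size // stride

for img in (640, 1280):
    sizes = {f"P{n}": grid_size(img, n) for n in (3, 4, 5, 6)}
    print(img, sizes)
# 640  -> {'P3': 80, 'P4': 40, 'P5': 20, 'P6': 10}
# 1280 -> {'P3': 160, 'P4': 80, 'P5': 40, 'P6': 20}
```

This is also why P6 models are usually trained at 1280: at 640 the P6 grid is only 10x10, which leaves very few cells for predictions at that level.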
@pathikg, that makes sense to me. If you look at cfg\training\yolov7.yaml
you can see that several layers are marked as P1, P2, P3, P4, and P5. These must be the feature levels extracted at those layers. The same goes for the other config files; for example, yolov7-w6.yaml
has one feature level marked as P6.
Funnily enough, I got the best result when I was training with train.py
but by accident used yolov7-w6_training.pt
as the starting weights. Not sure what happened there...
This is the backbone of the basic yolov7 config:
backbone:
# [from, number, module, args]
[[-1, 1, Conv, [32, 3, 1]], # 0
[-1, 1, Conv, [64, 3, 2]], # 1-P1/2
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [128, 3, 2]], # 3-P2/4
[-1, 1, Conv, [64, 1, 1]],
[-2, 1, Conv, [64, 1, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[-1, 1, Conv, [64, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [256, 1, 1]], # 11
[-1, 1, MP, []],
[-1, 1, Conv, [128, 1, 1]],
[-3, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 2]],
[[-1, -3], 1, Concat, [1]], # 16-P3/8
[-1, 1, Conv, [128, 1, 1]],
[-2, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[-1, 1, Conv, [128, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [512, 1, 1]], # 24
[-1, 1, MP, []],
[-1, 1, Conv, [256, 1, 1]],
[-3, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 2]],
[[-1, -3], 1, Concat, [1]], # 29-P4/16
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [1024, 1, 1]], # 37
[-1, 1, MP, []],
[-1, 1, Conv, [512, 1, 1]],
[-3, 1, Conv, [512, 1, 1]],
[-1, 1, Conv, [512, 3, 2]],
[[-1, -3], 1, Concat, [1]], # 42-P5/32
[-1, 1, Conv, [256, 1, 1]],
[-2, 1, Conv, [256, 1, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[-1, 1, Conv, [256, 3, 1]],
[[-1, -3, -5, -6], 1, Concat, [1]],
[-1, 1, Conv, [1024, 1, 1]], # 50
]
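The `# 1-P1/2` through `# 42-P5/32` comments in the config above record the cumulative downsampling factor (stride) at each pyramid level: it doubles at every downsampling step. A tiny sketch of that arithmetic:

```python
# Sketch: replay the downsampling points of the backbone config above to
# show how the "# ...-P1/2" .. "# ...-P5/32" comments line up with the
# cumulative stride. Each P level is reached via one stride-2 step
# (a stride-2 Conv, or MP pooling combined with a stride-2 Conv).

stride = 1
levels = {}
for label in ("P1", "P2", "P3", "P4", "P5"):
    stride *= 2          # spatial resolution halves at each P level
    levels[label] = stride

print(levels)  # {'P1': 2, 'P2': 4, 'P3': 8, 'P4': 16, 'P5': 32}
```

So the `/2`, `/4`, `/8`, `/16`, `/32` suffixes in the comments are exactly these cumulative strides, matching the P-level convention discussed earlier in the thread.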
The training code is separated into train.py and train_aux.py. I do not know what the difference is or how to use them.
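As noted earlier in the thread, train_aux.py trains the models with an IAuxDetect head: an auxiliary, training-only detection head whose loss is added to the lead head's loss with a smaller weight, so gradients also reach the middle of the network. A simplified sketch of that idea (not code from the repo; the 0.25 weight is illustrative):

```python
# Minimal, assumed sketch of auxiliary-head (deep-supervision) training.
# At training time the total loss combines the lead head's loss with a
# down-weighted auxiliary loss; at inference only the lead head is used.
# The 0.25 weight is illustrative, not taken from the YOLOv7 repo.

def total_loss(lead_loss, aux_loss, aux_weight=0.25):
    return lead_loss + aux_weight * aux_loss

print(total_loss(1.0, 0.8))  # 1.2
```

Models without an auxiliary head (the P5 models trained by train.py) simply have no aux_loss term, which is why the two training scripts are separate.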