WongKinYiu / yolov7

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
GNU General Public License v3.0

Yolov7 Architecture in Research Paper #762

Open thilinawee opened 1 year ago

thilinawee commented 1 year ago

Hi, while going through the YOLOv7 research paper, I came across this diagram, which describes the architectural differences between YOLOv7 and some other networks. image

Could anyone explain what this notation means? For example, what does 3x3, 2c, 2c, 2 stand for? I understand that 3x3 means kernel_size = (3, 3), but I am not sure about the rest. Thanks, ~Thilina

WongKinYiu commented 1 year ago

kernel size, input channel, output channel(, group).
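To make the notation concrete, here is a small sketch, assuming the tuple maps onto a standard grouped 2D convolution in which the group count must divide both channel counts. The helpers `conv_weight_shape` and `param_count`, and the value `c = 64`, are hypothetical and chosen only for illustration; they are not taken from the paper or the repository.

```python
def conv_weight_shape(kernel_size, in_channels, out_channels, groups=1):
    """Weight shape of a grouped 2D convolution (bias ignored).

    Each output channel only sees in_channels // groups input channels,
    giving the common (out, in // groups, k, k) weight layout.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("groups must divide both in_channels and out_channels")
    return (out_channels, in_channels // groups, kernel_size, kernel_size)


def param_count(shape):
    """Number of weights in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


c = 64  # hypothetical base width, not taken from the paper

# "3x3, 2c, 2c, 2" read as: 3x3 kernel, 2c input channels, 2c output channels, 2 groups
grouped = conv_weight_shape(3, 2 * c, 2 * c, groups=2)
# The same layer without grouping, for comparison
dense = conv_weight_shape(3, 2 * c, 2 * c, groups=1)

print(grouped, param_count(grouped))
print(dense, param_count(dense))
```

Under this reading, using 2 groups halves the number of weights relative to the ungrouped layer with the same channel counts.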

thilinawee commented 1 year ago

Thanks, @WongKinYiu, for the quick reply. Does "group" mean the number of groups in the convolution (i.e. a group convolution) or the number of such sequential blocks?

WongKinYiu commented 1 year ago

They are equivalent; you could take a look at the ResNeXt paper. image
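The equivalence can be checked numerically. Below is a minimal pure-Python sketch, restricted to a 1x1 convolution at a single pixel so that the convolution reduces to a matrix-vector product. The function names and the toy weights are made up for illustration. It compares (a) splitting the channels into independent branches and concatenating their outputs with (b) a single convolution whose weight is block-diagonal, which is what a grouped convolution computes.

```python
def matvec(weight, x):
    """Apply a 1x1 convolution at one pixel: one dot product per output channel."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def branches_then_concat(branch_weights, x):
    """Split x into equal channel slices, run one small conv per branch, concatenate."""
    groups = len(branch_weights)
    step = len(x) // groups
    out = []
    for g, weight in enumerate(branch_weights):
        out.extend(matvec(weight, x[g * step:(g + 1) * step]))
    return out


def block_diagonal(branch_weights):
    """Dense weight of the grouped conv: branch weights on the diagonal, zeros elsewhere."""
    in_per_group = len(branch_weights[0][0])
    total_in = in_per_group * len(branch_weights)
    dense = []
    for g, weight in enumerate(branch_weights):
        for row in weight:
            full = [0] * total_in
            full[g * in_per_group:(g + 1) * in_per_group] = row
            dense.append(full)
    return dense


# Toy example: 4 input channels, 4 output channels, 2 groups
branch_weights = [
    [[1, 2], [3, 4]],  # group 0: input channels 0-1 -> output channels 0-1
    [[5, 6], [7, 8]],  # group 1: input channels 2-3 -> output channels 2-3
]
x = [1, -1, 2, 0]

as_branches = branches_then_concat(branch_weights, x)
as_grouped = matvec(block_diagonal(branch_weights), x)
print(as_branches, as_grouped)
```

Both paths produce the same output vector. A 3x3 kernel follows the same pattern, with the block-diagonal structure applied at every kernel position.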

thilinawee commented 1 year ago

I sketched the YOLOv7 architecture according to what I understood. Could you please point out where the E-ELAN block with group convolutions is? Are the group convolutions used only for training? Thanks in advance.

Overall Architecture

image

ELAN

image

CSPSPP

image

B3

image

B2

image

RomanczuG commented 1 year ago

Have you received any feedback on your draft?

thilinawee commented 1 year ago

@RomanczuG No, I haven't :(

WongKinYiu commented 1 year ago

You could take a look at page 16 of the paper.

PascAlex commented 1 year ago

So you implemented E-ELAN only in yolov7-E6E?

Firdaus909 commented 1 year ago

@thilinawee Hello, I'm new to the YOLO algorithm. Can you explain more about the RepConv layer?