WongKinYiu / yolov7

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
GNU General Public License v3.0

CSPVovnet #935

Open andrearosasco opened 1 year ago

andrearosasco commented 1 year ago

Hello, in the paper, when referring to CSPVoVNet, the YOLOv4 paper is cited. I read it quickly and it seems that there is no mention of CSPVoVNet whatsoever. Is the citation wrong or am I missing something?

I'm asking because I was particularly interested in this concept of analyzing the gradient path associated with CSPVoVNet.

Thanks

dilaraozdemir commented 1 year ago

@andrearosasco I think the CSP approach is described in the given reference. However, the authors describe CSPVoVNet as a variation of VoVNet in the original article. It looks like the two (the CSP approach and VoVNet) have been combined here.

andrearosasco commented 1 year ago

Hi @dilaraozdemir, thanks. Yes, it would seem so. The YOLOv4 paper discusses CSP because they use it in their backbone architecture (CSPDarknet). Still, I'm confused by the reference to the YOLOv4 paper for the CSPVoVNet architecture; I wanted to read more about it.

Also, one interesting thing is that CSPNet splits the input of a block channel-wise and concatenates one part with the final feature map of the same block. In CSPVoVNet, by contrast, the cross-stage connection spans two blocks, or at least that's what Figure 2 seems to show.
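To make the difference concrete, here is a minimal NumPy sketch of a cross-stage connection. It is not taken from the YOLOv7 or CSPNet code; `cross_stage` and `double` are hypothetical names, and `double` is only a stand-in for a real convolutional block. The number of blocks the bypass spans is simply the length of the `blocks` list.

```python
import numpy as np

def cross_stage(x, blocks):
    """Split a (C, H, W) feature map channel-wise, run one half through
    `blocks`, and concatenate the untouched half with the result."""
    half = x.shape[0] // 2
    bypass, h = x[:half], x[half:]      # channel-wise split
    for block in blocks:                # the bypass spans every block in this list
        h = block(h)
    return np.concatenate([bypass, h], axis=0)

# Hypothetical stand-in for a conv block that keeps the tensor shape.
double = lambda t: t * 2.0

x = np.ones((8, 4, 4))
one_block = cross_stage(x, [double])           # connection around a single block
two_blocks = cross_stage(x, [double, double])  # connection spanning two blocks
```

In both cases the first half of the output channels is the unmodified input; only the depth of the processed path changes.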

andrearosasco commented 1 year ago

I found this issue, https://github.com/WongKinYiu/yolov7/issues/235, where @WongKinYiu refers to the architecture from yolov4 (or at least to that of tiny-yolov4) as a CSPVoVNet. I'm quite confused. Maybe CSPDarknet and CSPVoVNet are the same architecture.

dilaraozdemir commented 1 year ago

Right now I'm confused too. I guess we have to look at their mathematical functions to understand whether CSPDarknet and CSPVoVNet are the same architecture. Looks like we should do some more research.

andrearosasco commented 1 year ago

Yeah, or we could use a visualization tool like Netron on the two models. I've also tried to visualize yolov7-tiny, which should be an E-ELAN (a concept about which we know frighteningly little), but I couldn't understand much. In particular, this max-pool part looks interesting: [image: yolov7-tiny-maxpool]. Anyway, yolov7 is still a pre-print, so I'm guessing some details will be clarified in a future camera-ready version.

WongKinYiu commented 1 year ago

Hello, VoVNet is composed of OSA modules. Because there is only 1 OSA module in each stage of YOLOv4-tiny, we called it CSPOSANet in the Scaled-YOLOv4 paper. In Figure 2 of the YOLOv7 paper, there are several OSA modules stacked in a stage, so we call it CSPVoVNet.
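As a rough illustration of this naming rule (not code from this repository), the sketch below builds a toy one-shot aggregation (OSA) module in NumPy and stacks it into a stage. The function names, the subtract-and-ReLU stand-in for a convolution, and the uniform averaging used in place of the 1x1 projection are all assumptions made for the example; the CSP split is left out to keep it short.

```python
import numpy as np

def osa_module(x, num_layers=3):
    """Toy OSA module on a (C, H, W) map: run layers in sequence, then
    aggregate the input and every layer's output once at the end."""
    feats = [x]
    h = x
    for _ in range(num_layers):
        h = np.maximum(h - 0.1, 0.0)    # hypothetical stand-in for conv + ReLU
        feats.append(h)
    cat = np.concatenate(feats, axis=0)  # one-shot aggregation
    # Stand-in for the 1x1 conv that projects back to C channels.
    w = np.full((x.shape[0], cat.shape[0]), 1.0 / cat.shape[0])
    return np.tensordot(w, cat, axes=([1], [0]))

def stage(x, num_osa):
    """A stage is `num_osa` OSA modules applied one after another."""
    for _ in range(num_osa):
        x = osa_module(x)
    return x

def backbone_name(num_osa_per_stage):
    """Naming as described in the comment above."""
    return "CSPOSANet" if num_osa_per_stage == 1 else "CSPVoVNet"

x = np.ones((4, 5, 5))
single = stage(x, num_osa=1)    # one OSA module per stage
stacked = stage(x, num_osa=3)   # several OSA modules stacked in a stage
```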

dilaraozdemir commented 1 year ago

There is another question on my mind: how would you interpret this max pooling process?

andrearosasco commented 1 year ago

@WongKinYiu thanks a lot, it really clears things up! @dilaraozdemir maybe it could be a way of increasing the receptive field while avoiding throwing away information. The thing is, this particular structure appears just once in the architecture, so it's not part of the stage building block, let's say. Anyway, I guess many details will be clearer when the ELAN paper is published.
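Here is a small NumPy sketch of what I mean by that guess. It assumes the structure is an SPP-style block, i.e. stride-1 max pools of several kernel sizes concatenated with their input; the screenshot alone does not confirm that, and `maxpool_same`, `spp_like`, and the kernel sizes are hypothetical choices for the example. Stride 1 with padding keeps the resolution, and concatenating the input keeps the original features next to the pooled ones.

```python
import numpy as np

def maxpool_same(x, k):
    """Stride-1 max pooling over a (C, H, W) float map, padded with -inf
    so that H and W are preserved."""
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p)), constant_values=-np.inf)
    _, h, w = x.shape
    out = np.full_like(x, -np.inf)
    for dy in range(k):                 # max over every offset in the k x k window
        for dx in range(k):
            out = np.maximum(out, padded[:, dy:dy + h, dx:dx + w])
    return out

def spp_like(x, kernels=(5, 9, 13)):
    """Concatenate the input with its pooled versions along the channel axis."""
    return np.concatenate([x] + [maxpool_same(x, k) for k in kernels], axis=0)

# A single active pixel shows how far each pooled branch spreads it.
x = np.zeros((1, 7, 7))
x[0, 3, 3] = 1.0
y = spp_like(x, kernels=(3, 5))
```

In `y`, channel 0 is the untouched input, while the pooled channels spread the single active pixel over a 3x3 and a 5x5 neighbourhood: a larger receptive field at the same resolution.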

dilaraozdemir commented 1 year ago

@andrearosasco thanks a lot for your reply! The information was pretty good, now I can clearly see the process.