WongKinYiu / yolov9

Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
GNU General Public License v3.0

where is auxiliary information #196

Open ccblublu opened 6 months ago

ccblublu commented 6 months ago

Thanks for your excellent work. I still have a question about the multi-level auxiliary information in Section 4.1.2 of your paper: I cannot find where it is used in your code. I only find 6 heads across the main branch and the aux branch, as shown in the .yaml files.

ccblublu commented 6 months ago

@WongKinYiu

WongKinYiu commented 6 months ago

https://github.com/WongKinYiu/yolov9/blob/main/models/detect/yolov9-c.yaml#L81-L116 https://github.com/WongKinYiu/yolov9/blob/main/models/detect/yolov9-e.yaml#L88-L105

ccblublu commented 6 months ago

Thanks for your reply. In my opinion, that is the aux branch, shown by the grey area in the figure. Where is the pink area? @WongKinYiu

image

sanha9999 commented 6 months ago

https://github.com/WongKinYiu/yolov9/blob/main/models/detect/yolov9-e.yaml#L88-L105

"multi-level auxiliary branch" is pink area.

ccblublu commented 6 months ago

There are 3+3+3 = 9 heads in the figure, but only 6 heads in the yaml, which confuses me. Is there any detail I have missed? Also, as shown in yolov9-e.yaml, the relationship between the main branch and the aux branch is different from the figure in the paper. Looking forward to your reply! @WongKinYiu @sanha9999

image

YoohLee commented 6 months ago

I have this question too. I have already found the multi-level auxiliary branch in the yaml files, but I have not found the Auxiliary Reversible Branch. Also, there are nine prediction heads in the paper's Figure 3, but only six in the yaml files. Why is this?

wgqhandsome commented 6 months ago

There are 3+3+3 = 9 heads in the figure, but only 6 heads in the yaml, which confuses me. Is there any detail I have missed? Also, as shown in yolov9-e.yaml, the relationship between the main branch and the aux branch is different from the figure in the paper. Looking forward to your reply! @WongKinYiu @sanha9999

I have the same question. There are only six heads in the model defined by the yolov9-e yaml file. I am confused. If anyone understands this, please tell me.

WongKinYiu commented 6 months ago

You could take a look at Table 4: if you have aux branches on both the backbone and the neck, you will have 3+3+3 heads. You could then use train_triple.py to train the model.
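For illustration only, a minimal sketch of how a 3+3+3 setup could combine its losses; the function and weight names here are hypothetical, not the exact code in train_triple.py:

```python
# Hypothetical sketch of triple-head loss aggregation (not the repo's exact code).
# Assumes the forward pass returns three groups of predictions: main heads,
# auxiliary heads, and multi-level auxiliary heads (3+3+3).
def triple_loss(preds_main, preds_aux, preds_maux, targets, compute_loss, aux_gain=0.25):
    loss_main, items = compute_loss(preds_main, targets)  # drives the deployed model
    loss_aux, _ = compute_loss(preds_aux, targets)        # training-time supervision only
    loss_maux, _ = compute_loss(preds_maux, targets)      # multi-level auxiliary supervision
    # Auxiliary heads are discarded at inference, so they only shape gradients.
    return loss_main + aux_gain * (loss_aux + loss_maux), items
```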

ccblublu commented 6 months ago

If I want to have 9 heads, I need to select the corresponding yaml file and, based on the open-source code, add the corresponding layers as inputs for the prediction heads. From my observation, the main differences between the train_*.py scripts are in the loss function and the selection of the detection head. Is my understanding correct? @WongKinYiu

ccblublu commented 6 months ago

I have this question too. I have already found the multi-level auxiliary branch in the yaml files, but I have not found the Auxiliary Reversible Branch. Also, there are nine prediction heads in the paper's Figure 3, but only six in the yaml files. Why is this?

I think the reversible structure is the multi-level connections and the fuse operations.

masc-it commented 6 months ago

@WongKinYiu Could we have a diagram or explanation of what an auxiliary branch does in terms of operations/transformations, maybe some links to the code as well?

It's clear that it serves the purpose of preserving the input-target relation throughout the layers in a parallel branch, but stated like that it is kind of black magic to me.

The same applies to the multi-level auxiliary branch and the prediction heads: how are they plugged into the rest of the network?

JaneM1222 commented 6 months ago

Which part of the source code specifically implements the operation of the reversible auxiliary branch? I still have doubts about how this part is implemented. Can you explain the specific code? @WongKinYiu

WongKinYiu commented 6 months ago

There are 3+3+3 = 9 heads in the figure, but only 6 heads in the yaml, which confuses me. Is there any detail I have missed? Also, as shown in yolov9-e.yaml, the relationship between the main branch and the aux branch is different from the figure in the paper. Looking forward to your reply!

image

The main branch of yolov9-e contains the reversible architecture. The architecture of yolov9-e is modified from dynamic-yolov7.

WongKinYiu commented 6 months ago

Which part of the source code specifically implements the operation of the reversible auxiliary branch? I still have doubts about how this part is implemented. Can you explain the specific code?

https://github.com/WongKinYiu/yolov9/blob/main/models/detect/yolov9-c.yaml#L81-L116

masc-it commented 6 months ago

A lack of proper explanation and understanding won't let this model get the attention it deserves. Please consider writing a technical post where you explain in more detail where, how, and why things are placed in a certain way, and why they work. Pointing to a yaml is of no help.

Also, the yolov5 codebase you are building on constructs the model architecture at runtime, which does not help to visualize the flow. A class with all the layers spelled out would help as well.

I'm a scientist like you, but please pay more attention to code quality and proper documentation. That stuff makes the difference. Kudos for your next works!

WongKinYiu commented 6 months ago
  1. Original deep networks lose information during the feed-forward pass.

image

  2. In particular, they lose the information needed to make the projection from data to target.

image

  3. We can visualize that this information loss makes it impossible to find the correct relation to project data to target.

image

  4. Modern networks can maintain more reliable information.

image

  5. Theoretically, a reversible architecture allows deeper networks to maintain most of the information of the data.

image

  6. We also show evidence that the reversible branch makes the relation between data and target more trustworthy.

image

  7. The corresponding part of the reversible architecture is here.

https://github.com/WongKinYiu/yolov9/blob/main/models/detect/yolov9-c.yaml#L81-L116
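For reference, "reversible" here is in the sense of Section 4.1 of the paper (paraphrased): a function r_ψ is reversible when an inverse v_ζ recovers the input exactly, so no information about the data is lost on the way through.

```latex
% Reversible function, paraphrased from the YOLOv9 paper:
% r_{\psi} is reversible if there exists an inverse v_{\zeta} with
X = v_{\zeta}\left(r_{\psi}(X)\right)
% so X is recovered exactly and no information about X is lost through r_{\psi}.
```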

masc-it commented 6 months ago

Thanks, I have a couple of questions:

  1. How did you find that GELAN actually satisfies the "reversible" property? Just by looking at feature maps? And what is it about ELAN in general that makes it better than CSP and the like?
  2. What is the role of CBLinear from CBNet?
  3. Have you done ablations of some sort?

WongKinYiu commented 6 months ago

No, GELAN does not satisfy the reversible property, but CBNet and RevCol do. For an analysis of ELAN and CSP, please take a look at the ELAN paper.

CBLinear is just a set of linear layers that brings the different pyramidal feature maps to the same number of channels. The main part that carries the reversible property is CBFuse, which composites higher-level features of the first backbone into lower-level features of the second backbone.
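For concreteness, a lightly commented sketch of the two modules, reconstructed from models/common.py in this repo (the CBFuse forward lines are quoted verbatim later in this thread); treat it as illustrative rather than authoritative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBLinear(nn.Module):
    # One 1x1 conv whose output is split into chunks, so one pyramid level of
    # the leading backbone can feed several levels of the second backbone.
    def __init__(self, c1, c2s, k=1, s=1, p=0, g=1):
        super().__init__()
        self.c2s = c2s
        self.conv = nn.Conv2d(c1, sum(c2s), k, s, p, groups=g, bias=True)

    def forward(self, x):
        return self.conv(x).split(self.c2s, dim=1)

class CBFuse(nn.Module):
    # Upsamples the selected chunks from earlier CBLinear outputs to the
    # current resolution and sums them with the current feature map.
    def __init__(self, idx):
        super().__init__()
        self.idx = idx

    def forward(self, xs):
        target_size = xs[-1].shape[2:]
        res = [F.interpolate(x[self.idx[i]], size=target_size, mode='nearest')
               for i, x in enumerate(xs[:-1])]
        return torch.sum(torch.stack(res + xs[-1:]), dim=0)
```

In the yaml graph, each CBLinear taps one pyramid level of the leading backbone, and each CBFuse node in the second backbone sums the matching chunks with its own feature map.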

We examined different composition strategies from CBNet; the best performance is achieved by full DHLC and FCC. However, because we may add the reversible auxiliary branch at different positions of the GELAN-PAN architecture, for a fair comparison we only apply DHLC composition at the P3/P4/P5 levels. The related ablations are shown in Table 4.

Muyyong commented 5 months ago

No, GELAN does not satisfy the reversible property, but CBNet and RevCol do. For an analysis of ELAN and CSP, please take a look at the ELAN paper.

CBLinear is just a set of linear layers that brings the different pyramidal feature maps to the same number of channels. The main part that carries the reversible property is CBFuse, which composites higher-level features of the first backbone into lower-level features of the second backbone.

We examined different composition strategies from CBNet; the best performance is achieved by full DHLC and FCC. However, because we may add the reversible auxiliary branch at different positions of the GELAN-PAN architecture, for a fair comparison we only apply DHLC composition at the P3/P4/P5 levels. The related ablations are shown in Table 4.

Can I interpret the nearest-neighbor upsample and the add operation in CBFuse as a reversible structure? Or does it mean that the structure that fuses high-level information from the first backbone into the second backbone is reversible? In other words, if CBFuse is replaced with Concat, is it still a reversible structure?

WongKinYiu commented 5 months ago

Concat is OK
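For intuition (an illustration, not from the repo): concatenation is losslessly invertible, since both inputs can be recovered by slicing, whereas an elementwise sum alone discards which addend contributed what.

```python
# Illustration: concat preserves both inputs exactly; sum alone does not.
import torch

a, b = torch.randn(1, 8, 4, 4), torch.randn(1, 8, 4, 4)

cat = torch.cat([a, b], dim=1)       # fuse by concatenation
a_rec, b_rec = cat.split(8, dim=1)   # exact recovery by slicing
assert torch.equal(a_rec, a) and torch.equal(b_rec, b)

s = a + b                            # fuse by summation: a, b not recoverable from s alone
```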

discipleofhamilton commented 4 months ago

There are 3+3+3 = 9 heads in the figure, but only 6 heads in the yaml, which confuses me. Is there any detail I have missed? Also, as shown in yolov9-e.yaml, the relationship between the main branch and the aux branch is different from the figure in the paper. Looking forward to your reply!

image

The main branch of yolov9-e contains the reversible architecture. The architecture of yolov9-e is modified from dynamic-yolov7.

Hi @WongKinYiu, I do have some questions about the PGI mechanism. There are two differences between Figure 4 in the paper and the figure above:

  1. The forward path is different. In the yaml, each path from the same level takes only one input, but in the paper it takes two. Why?
  2. In my understanding, yolov9-e.yaml contains the auxiliary reversible branch. Why isn't that branch removed in reparameterization.ipynb, or why is it regarded as part of the main branch? If it is an auxiliary branch, it should be removed when deployed (see the hypothetical sketch after this list).
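A hypothetical sketch of what deployment-time removal could look like; the index-based filter and the constant below are my assumptions, not the actual reparameterization.ipynb logic:

```python
# Hypothetical sketch: strip auxiliary-branch weights before deployment.
# Assumes a yolov5-style checkpoint where ckpt['model'] is the nn.Module,
# state_dict keys look like 'model.<layer_idx>.conv.weight', and all
# auxiliary modules sit at indices >= AUX_START_IDX (assumption).
import torch

AUX_START_IDX = 30  # placeholder; depends on the yaml layout

ckpt = torch.load('yolov9-e.pt', map_location='cpu')
state = ckpt['model'].float().state_dict()
deploy_state = {k: v for k, v in state.items()
                if int(k.split('.')[1]) < AUX_START_IDX}
# deploy_state would then be loaded into a deploy-only (gelan-style) model.
```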
Muyyong commented 4 months ago

@WongKinYiu, why is it that when I further process the information in res inside CBFuse,

```python
res = [F.interpolate(x[self.idx[i]], size=target_size, mode='nearest') for i, x in enumerate(xs[:-1])]
```

and then pass it into the second backbone,

```python
out = torch.sum(torch.stack(res + xs[-1:]), dim=0)
```

the results change each time, as if this destroys the reversibility? But I noticed there is also a nearest-neighbor upsampling operation in it — does that have no effect on reversibility?

Muyyong commented 4 months ago

@WongKinYiu, why is it that when I further process the information in res inside CBFuse,

```python
res = [F.interpolate(x[self.idx[i]], size=target_size, mode='nearest') for i, x in enumerate(xs[:-1])]
```

and then pass it into the second backbone,

```python
out = torch.sum(torch.stack(res + xs[-1:]), dim=0)
```

the results change each time, as if this destroys the reversibility? But I noticed there is also a nearest-neighbor upsampling operation in it — does that have no effect on reversibility?

After changing the structure of this part, the result is different every time.
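One thing worth checking before attributing this to reversibility (a generic suggestion, not from the repo): fix all RNG seeds, so that run-to-run differences cannot come from initialization or data shuffling.

```python
# Generic determinism check (assumption: run-to-run variation comes from RNG).
import random
import numpy as np
import torch

random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.backends.cudnn.deterministic = True  # force deterministic conv kernels
torch.backends.cudnn.benchmark = False     # disable nondeterministic autotuning
```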