Kevinz-code / CSRA

Official code of ICCV2021 paper "Residual Attention: A Simple but Effective Method for Multi-Label Recognition"
GNU Affero General Public License v3.0

Is 'normalized' classifier necessary? #9

Open dydxdt opened 2 years ago

dydxdt commented 2 years ago

Thanks for your code. I just wonder why we need to normalize the 'classifier' weights before the 'flatten' op. Does it perform better than without normalizing? Thank you for your explanation.
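
For context, here is a minimal PyTorch sketch of the pattern being asked about (a paraphrase, not the repo's exact code; the class name, dimensions, and shapes are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedHead(nn.Module):
    """Classifier head whose per-class weight vectors are L2-normalized."""
    def __init__(self, in_dim: int, num_classes: int):
        super().__init__()
        # 1x1 conv acting as a per-location linear classifier, no bias
        self.head = nn.Conv2d(in_dim, num_classes, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, in_dim, H, W) feature map from the backbone.
        # Normalize each class's weight vector to unit L2 norm, so the
        # per-location score behaves like a cosine similarity (weight
        # scale removed).
        w = F.normalize(self.head.weight, dim=1)  # (num_classes, in_dim, 1, 1)
        return F.conv2d(x, w)                     # (B, num_classes, H, W)

# usage
feat = torch.randn(2, 2048, 7, 7)
scores = NormalizedHead(2048, 80)(feat)
print(scores.shape)  # torch.Size([2, 80, 7, 7])
```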

dydxdt commented 2 years ago

Now I am using TensorFlow and trying to find how to get the equivalent of 'self.head.weight' in TensorFlow. Or have I misunderstood and made this more complex than it is? Thank you for your reply.
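
For reference, a hedged TensorFlow/Keras sketch of the counterpart to PyTorch's `self.head.weight` (the layer construction and shapes here are illustrative, not taken from the repo): a built `Conv2D` layer exposes its weight tensor as `layer.kernel`, which can then be L2-normalized per class:

```python
import tensorflow as tf

head = tf.keras.layers.Conv2D(filters=80, kernel_size=1, use_bias=False)
head.build(input_shape=(None, 7, 7, 2048))  # create the weight variables

# `head.kernel` has shape (1, 1, in_dim, num_classes); L2-normalize each
# class's weight vector along the input-channel axis.
w_normalized = tf.math.l2_normalize(head.kernel, axis=2)

feat = tf.random.normal((2, 7, 7, 2048))
scores = tf.nn.conv2d(feat, w_normalized, strides=1, padding="VALID")
print(scores.shape)  # (2, 7, 7, 80)
```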

Kevinz-code commented 2 years ago

Hi, thanks for reading.

  1. We use "Normalization" because it can lead to faster convergence (less training time) in our experiments, but it usually has no impact on the final performance (see Table 8 in our paper).

  2. Actually, I'm not familiar with TensorFlow; you can find some instructions on its official website.

Best, Ke Zhu

dydxdt commented 2 years ago

Ok, I see. Thank you very much. But why not just use a standard normalization like batch norm or group norm? I'm not sure why you use this particular normalization method in your code. Can you give some explanation?
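
For reference, a minimal sketch of the distinction this question touches on (an illustration, not the authors' answer): batch norm and group norm normalize feature *activations* using data statistics, whereas the normalization in question rescales the classifier's *weights*, one vector per class:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 2048, 7, 7)    # backbone features (shapes illustrative)
w = torch.randn(80, 2048, 1, 1)   # classifier weights, one row per class

# Activation normalization: statistics are computed from the data itself.
bn = nn.BatchNorm2d(2048)
gn = nn.GroupNorm(num_groups=32, num_channels=2048)
x_bn, x_gn = bn(x), gn(x)         # same shape as x, rescaled features

# Weight normalization: each class vector is scaled to unit L2 norm;
# the features themselves are left untouched.
w_unit = F.normalize(w, dim=1)
scores = F.conv2d(x, w_unit)      # (2, 80, 7, 7)
```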