Open dydxdt opened 3 years ago
I am now using TensorFlow and trying to find out how to get the equivalent of 'self.head.weight' in TensorFlow. Or have I misunderstood and made this more complex than it is? Thank you for your reply.
Hi, thanks for reading.
We use "Normalization" because it led to faster convergence (less training time) in our experiments, but it usually has no impact on the final performance (see Table 8 in our paper).
Actually, I'm not familiar with TensorFlow; you can find instructions on its official website.
Best, Ke Zhu
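For readers following along in TensorFlow, here is a minimal sketch of one way to reach the classifier weight and normalize with it. It assumes the classifier is a bias-free 1x1 convolution and that the normalization divides each class score by the L2 norm of that class's weight vector; neither detail is confirmed in this thread. The helper name `normalized_scores` and the tensor sizes are made up for illustration. In Keras, the weight of a `Conv2D` layer is exposed as its `kernel` attribute once the layer has been built.

```python
import tensorflow as tf


def normalized_scores(head, x):
    """Hypothetical helper: class scores divided by the L2 norm of each class's weights.

    head: a bias-free 1x1 tf.keras.layers.Conv2D acting as the classifier (assumption).
    x:    feature map in NHWC layout, shape (batch, height, width, channels).
    """
    # Calling the layer also builds it, so `head.kernel` exists afterwards.
    score = head(x)  # (batch, height, width, num_classes)
    # Keras stores the Conv2D weight as `kernel`, shape (1, 1, in_channels, num_classes).
    kernel = tf.convert_to_tensor(head.kernel)
    # One L2 norm per class, taken over every axis except the output-channel axis.
    norm = tf.sqrt(tf.reduce_sum(tf.square(kernel), axis=[0, 1, 2]))  # (num_classes,)
    score = score / norm  # broadcasts over the trailing class axis
    # Flatten the spatial grid: (batch, height * width, num_classes).
    return tf.reshape(score, [tf.shape(score)[0], -1, score.shape[-1]])


# Illustrative sizes only: 16 input channels, 5 classes, a 7x7 feature map.
head = tf.keras.layers.Conv2D(filters=5, kernel_size=1, use_bias=False)
features = tf.random.normal([2, 7, 7, 16])
scores = normalized_scores(head, features)
print(scores.shape)
```

With this kind of normalization, rescaling a class's weight vector leaves its scores unchanged, since the scale cancels in the division. Note that the result here keeps the class axis last, as is usual for NHWC tensors; a channels-first PyTorch model flattened with `flatten(2)` would have the class axis second instead.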
Ok, I see. Thank you very much. Why don't you just use a standard normalization like batch norm or group norm? I'm not sure why you use this particular normalization method in your code. Can you give some explanation?
Thanks for your code. I am just wondering why we need to normalize the 'classifier' before the 'flatten' op. Does it perform better than without normalization? Thank you for your explanation.