Closed SilvioGiancola closed 3 years ago
Hi,
By default, the batch_norm is activated, which means that the following line is executed: https://github.com/antoine77340/Mixture-of-Embedding-Experts/blob/a53979fcdeb3a7a1c59f5fe91d30cc6cd6d53519/loupe.py#L43
The batch norm already contains a bias (its learnable shift), so it would be redundant to add another one after the matmul.
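To make the redundancy concrete, here is a minimal NumPy sketch (not the repo's actual code, and with made-up shapes): because batch normalization subtracts the per-feature batch mean, any constant per-cluster bias added before it is cancelled exactly, and the batch norm's own learnable shift takes over the role of the bias.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 8))   # a batch of descriptors (made-up sizes)
w = rng.standard_normal((8, 4))    # projection onto 4 "clusters"
b = rng.standard_normal(4)         # a hypothetical per-cluster bias

def batch_norm(a, eps=1e-5):
    # Per-feature normalization with batch statistics.
    # (gamma=1, beta=0 for brevity; a learnable beta would act as the bias.)
    return (a - a.mean(axis=0)) / np.sqrt(a.var(axis=0) + eps)

out_with_bias = batch_norm(x @ w + b)
out_without   = batch_norm(x @ w)

# The mean subtraction removes any constant per-feature offset,
# so the two outputs coincide up to floating-point error.
print(np.allclose(out_with_bias, out_without, atol=1e-5))  # True
```

So during training (batch statistics), a pre-batch-norm bias is a no-op; only the batch norm's beta matters.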
Does that answer your question?
I see, yes, it does answer my question, thanks!
What is the motivation for the batch normalization? Overfitting? Have you seen similar performance without the batch norm, and hence without those biases?
I am using NetVLAD in one of my projects and have observed similar or better performance without this bias. I am not using any batch norm, so I was looking for any insight on the matter.
Actually, I have not played with this model in a long time, but from what I remember, this batch norm with bias had a minor impact on the final results.
Gotcha, thank you for your insights!
Hi @antoine77340 ,
I couldn't help noticing that in your implementation of NetVLAD, you dropped the biases for the conv layer and only consider the multiplication with the weights, in particular on this line: https://github.com/antoine77340/Mixture-of-Embedding-Experts/blob/a53979fcdeb3a7a1c59f5fe91d30cc6cd6d53519/loupe.py#L41
The original NetVLAD paper considers learning the weights and biases of the conv layer, as per Equation (3) here: https://openaccess.thecvf.com/content_cvpr_2016/papers/Arandjelovic_NetVLAD_CNN_Architecture_CVPR_2016_paper.pdf
Is there a rationale for considering only the multiplication and not the biases?
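For reference, here is a minimal NumPy sketch of the soft-assignment in Equation (3) of the paper, i.e. a softmax over per-cluster scores w_k^T x + b_k, including the bias term b_k under discussion. All shapes and variable names are made up for illustration; this is not the repo's code.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(8)       # one local descriptor (made-up dimension)
W = rng.standard_normal((4, 8))  # one row w_k per cluster (4 clusters)
b = rng.standard_normal(4)       # per-cluster bias b_k from Eq. (3)

def soft_assign(x, W, b):
    # Eq. (3): softmax over per-cluster scores w_k^T x + b_k
    s = W @ x + b
    e = np.exp(s - s.max())      # subtract max for numerical stability
    return e / e.sum()

a = soft_assign(x, W, b)
print(a.sum())  # soft assignments sum to 1
```

In the implementation discussed above, the b term is dropped and a batch norm is applied to the scores instead, whose learnable shift plays an equivalent role.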
Any insight would be welcome.
Thanks!