
Question on NetVLAD implementation #15

Closed SilvioGiancola closed 3 years ago

SilvioGiancola commented 3 years ago

Hi @antoine77340 ,

I couldn't help noticing that in your implementation of NetVLAD, you dropped the biases for the conv layer and only keep the multiplication with the weights, specifically on this line: https://github.com/antoine77340/Mixture-of-Embedding-Experts/blob/a53979fcdeb3a7a1c59f5fe91d30cc6cd6d53519/loupe.py#L41

The original NetVLAD paper considers learning the weights and biases of the conv layer, as per Equation (3) here: https://openaccess.thecvf.com/content_cvpr_2016/papers/Arandjelovic_NetVLAD_CNN_Architecture_CVPR_2016_paper.pdf
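For reference, the soft-assignment in Equation (3), as I read it, is:

```latex
\bar{a}_k(\mathbf{x}_i) = \frac{e^{\mathbf{w}_k^\top \mathbf{x}_i + b_k}}{\sum_{k'} e^{\mathbf{w}_{k'}^\top \mathbf{x}_i + b_{k'}}}
```

i.e. each cluster $k$ has both a weight $\mathbf{w}_k$ and a bias $b_k$.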

Is there a rationale for considering only the multiplication and not the biases?

Any insight would be welcome.

Thanks!

antoine77340 commented 3 years ago

Hi,

By default, batch_norm is enabled, which means the following line is executed: https://github.com/antoine77340/Mixture-of-Embedding-Experts/blob/a53979fcdeb3a7a1c59f5fe91d30cc6cd6d53519/loupe.py#L43

The batch norm already contains a bias, so it would be redundant to add one after the matmul.
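To make that concrete, here is a minimal sketch of the soft-assignment step (illustrative names, not the exact code from loupe.py), assuming a PyTorch setup similar to the repo's:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftAssignSketch(nn.Module):
    """Sketch of NetVLAD soft-assignment without an explicit bias term."""

    def __init__(self, feature_size, cluster_size):
        super().__init__()
        # Cluster weights only -- no explicit per-cluster bias parameter.
        self.clusters = nn.Parameter(
            torch.randn(feature_size, cluster_size) / feature_size ** 0.5
        )
        # BatchNorm1d has a learnable per-cluster shift (beta), which takes
        # over the role of the bias b_k from Eq. (3) of the NetVLAD paper.
        self.bn = nn.BatchNorm1d(cluster_size)

    def forward(self, x):
        # x: (batch * num_descriptors, feature_size)
        assignment = torch.matmul(x, self.clusters)  # w_k^T x_i, no bias
        assignment = self.bn(assignment)             # normalize, scale, shift
        return F.softmax(assignment, dim=1)          # soft cluster assignment
```

Since the batch norm's shift is applied per cluster column, adding another bias after the matmul would just be absorbed into it.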

Does that answer your question?

SilvioGiancola commented 3 years ago

I see, yes, it does answer my question, thanks!

What is the motivation for the batch normalization? Overfitting? Have you experienced similar performance without the batch norm, and hence without those biases?

I am using NetVLAD in one of my projects and experienced similar or better performance without this bias. I am not using any batch norm, so I was looking for any insight on that matter.

antoine77340 commented 3 years ago

Actually, I have not played with this model in a long time, but from what I remember, this batch norm with bias had a minor impact on the final results.

SilvioGiancola commented 3 years ago

Gotcha, thank you for your insights!