mmxuan18 closed this issue 5 years ago
VLAD is a general technique for quantisation; I simply use the part that I think fits my application best. You can use the original NetVLAD implementation instead.
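To make the discussion concrete, here is a minimal NumPy sketch of NetVLAD-style pooling (soft assignment to cluster centres, then aggregation of residuals). This is my own illustration under stated assumptions, not the repo's actual `vladpooling` code; the function name and parameter shapes are hypothetical.

```python
import numpy as np

def netvlad_pool(x, centers, w, b):
    """Minimal NetVLAD-style pooling sketch (not the repo's code).

    x:       (N, D) local descriptors from the feature map
    centers: (K, D) learnable cluster centres
    w, b:    (D, K) and (K,) parameters of the soft-assignment layer
    Returns a (K*D,) L2-normalised VLAD descriptor.
    """
    # Soft assignment: softmax over clusters for each descriptor.
    logits = x @ w + b                           # (N, K)
    logits -= logits.max(axis=1, keepdims=True)  # shift for numerical stability
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)            # (N, K) assignment weights

    # Aggregate residuals to each centre, weighted by the assignments.
    resid = x[:, None, :] - centers[None, :, :]  # (N, K, D)
    v = (a[:, :, None] * resid).sum(axis=0)      # (K, D)

    v = v.flatten()
    return v / (np.linalg.norm(v) + 1e-12)       # final L2 normalisation
```

The key point is that the assignment weights and the residuals can come from different projections of the same feature map, which is where implementations are free to differ.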
I don't think this matters, as the model can learn to use the same weights if that is really required for performance.
Please check the softmax source code in any toolbox: subtracting the max first is essential for numerical stability.
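This is the standard trick: softmax is invariant to subtracting a constant from all logits, so shifting by the row max avoids `exp` overflowing. A minimal sketch (my own illustration, not the repo's code):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating.

    softmax(z) == softmax(z - c) for any constant c, so this shift does
    not change the result, but it keeps exp() from overflowing.
    """
    z = z - z.max(axis=-1, keepdims=True)  # largest exponent is now 0
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```

Without the shift, logits like `[1000.0, 1001.0]` would make `np.exp` overflow to `inf` and the result would be `nan`; with it the output is finite and correct.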
Too many L2 norms will make your training very difficult: the features are mapped onto a hypersphere, and likewise the gradients collapse into a small space. It does add more regularisation and makes the model more robust, but it slows training down.
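For reference, the double normalisation being discussed (NetVLAD's intra-normalisation of each per-cluster block followed by a global L2 normalisation) can be sketched like this. The function name and the epsilon are my own; this is an illustration, not the repo's code:

```python
import numpy as np

def double_normalise(v):
    """NetVLAD-style normalisation sketch.

    v: (K, D) aggregated residuals, one D-dim block per cluster.
    First intra-normalise each cluster block, then L2-normalise
    the flattened (K*D,) vector.
    """
    v = v / (np.linalg.norm(v, axis=1, keepdims=True) + 1e-12)  # intra-norm
    v = v.flatten()
    return v / (np.linalg.norm(v) + 1e-12)                      # global L2 norm
```

Dropping the first step and keeping only the global L2 norm gives the single-normalisation variant; each extra norm constrains the representation (and its gradients) further.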
This is from the NetVLAD author's presentation; as shown by the yellow circle, the feature map x is the same for both branches. But your code differs in a few ways:
1. From the feature map, x --> x_fc and x --> x_k_center, and both are passed to vladpooling, which does the softmax and normalisation. Compared to NetVLAD, the x --> fc step looks unnecessary.
2. Before computing the softmax, why subtract the max first? This does not seem very common.
3. NetVLAD first does intra-normalisation and then L2-normalisation (one paper reports this improves accuracy), but here there is only a single L2-normalisation.
What benefit do these differences bring?
I trained both NetVLAD and vladpooling on my dataset with the same optimiser parameters and used Grad-CAM to look at the feature-map activations. Sometimes vladpooling activates on the background noise, but NetVLAD does not:
- original code
- original code with self-attention added after the feature map and before vladpooling
- self-attention after the feature map and before NetVLAD