WeidiXie / VGG-Speaker-Recognition

Utterance-level Aggregation For Speaker Recognition In The Wild

Questions about the VLAD pooling implementation compared to the NetVLAD paper #36

Closed mmxuan18 closed 5 years ago

mmxuan18 commented 5 years ago

(image) This is from the NetVLAD author's presentation; as shown by the yellow circle, the feature map x is shared by both branches. Your code differs in a few ways:

1. From the feature map there are two projections, x --> x_fc and x --> x_k_center, and both are passed to the VLAD pooling layer, which performs the softmax and normalization. Compared to NetVLAD, the x --> fc branch looks unnecessary.
2. Why is the max subtracted before computing the softmax? This does not seem very common.
3. NetVLAD first applies intra-normalization and then L2 normalization (one paper reports that this improves accuracy), but here there is only a single L2 normalization.

What benefit do these differences bring? (A rough sketch of the structure I mean is below.)
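For concreteness, here is a minimal NumPy sketch of the aggregation being discussed; the names (`x`, `assign_weights`, `centers`) and shapes are illustrative and not taken from the repository code:

```python
import numpy as np

def vlad_pool(x, assign_weights, centers):
    """Aggregate frame-level features x of shape (T, D) into a VLAD vector.

    assign_weights: (D, K) projection giving cluster-assignment logits
                    (the x --> x_fc branch described above).
    centers:        (K, D) learnable cluster centers
                    (the x --> x_k_center branch).
    """
    logits = x @ assign_weights                       # (T, K)
    logits = logits - logits.max(axis=1, keepdims=True)  # subtract max before softmax
    a = np.exp(logits)
    a = a / a.sum(axis=1, keepdims=True)              # soft cluster assignments, (T, K)

    residuals = x[:, None, :] - centers[None, :, :]   # (T, K, D)
    vlad = (a[:, :, None] * residuals).sum(axis=0)    # (K, D)

    vlad = vlad.flatten()
    return vlad / (np.linalg.norm(vlad) + 1e-12)      # single final L2 normalization
```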

I trained both NetVLAD and this VLAD pooling on my dataset with the same optimization parameters and used Grad-CAM to compare the feature-map activations. Sometimes the VLAD pooling activates on the background noise, while NetVLAD does not:

Original code: (image)
Original code with self-attention added after the feature map and before VLAD pooling: (image)
Self-attention after the feature map and before NetVLAD: (image)

WeidiXie commented 5 years ago

VLAD is a general technique for quantisation; I simply used the parts that I think best fit my application. You can use the original NetVLAD implementation if you prefer.

  1. I don't think this matters, as the model can learn to use the same weights if that is really required for performance.

  2. Please check the softmax implementation in any toolbox's source code; subtracting the max is essential for numerical stability.
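For example, a generic numerically stable softmax (a sketch, not the repository's code) looks like this:

```python
import numpy as np

def stable_softmax(logits):
    # exp() overflows for large inputs; subtracting the row maximum leaves the
    # result unchanged because softmax is invariant to a constant shift.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

print(stable_softmax(np.array([[1000.0, 1001.0, 1002.0]])))  # no overflow, rows sum to 1
```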

  3. Too many L2 normalizations make training very difficult: each one maps the features onto a hypersphere, and the gradients likewise collapse into a small space. It does add more regularisation and makes the model more robust, but it slows training down.
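As an illustration of the two normalization schemes being compared (a self-contained sketch with made-up shapes, not the repository's code):

```python
import numpy as np

rng = np.random.default_rng(0)
vlad = rng.standard_normal((8, 512))   # hypothetical (clusters, dim) VLAD matrix

# Single global L2 normalization (as described in the question).
single = vlad.flatten()
single = single / (np.linalg.norm(single) + 1e-12)

# NetVLAD-style: intra-normalize each cluster's residual sum, then global L2.
intra = vlad / (np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12)
double = intra.flatten()
double = double / (np.linalg.norm(double) + 1e-12)

# Both descriptors end up on the unit hypersphere; the intra-normalized variant
# equalizes each cluster's contribution, adding regularisation but constraining
# the feature space further.
print(np.linalg.norm(single), np.linalg.norm(double))
```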