clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

Regarding SAP implementation #48

Closed: 009deep closed this issue 4 years ago.

009deep commented 4 years ago

@joonson Could you provide a reference (paper or GitHub repository) on which the SAP (self-attentive pooling) implementation is based? I am having a hard time understanding what the code does.

One specific question: why is a non-linearity applied to this projection: torch.tanh(self.sap_linear(x))?

Also, is self.attention updated during training?

I modified it to use multi-head attention instead of the single head implemented here. The code runs and the model trains, but I am not seeing any improvement. A better understanding of the design would help. Thanks.
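The question does not show the multi-head change, so the following is only a hypothetical sketch of one common variant, for comparison: the tanh projection is shared, each of K heads has its own context vector (the columns of a matrix M, a name assumed here), and the K pooled vectors are concatenated. It is written in NumPy as a framework-free forward pass; multi_head_sap is an illustrative name, not part of voxceleb_trainer.

```python
import numpy as np


def multi_head_sap(x, W, b, M):
    """Hypothetical multi-head self-attentive pooling (forward pass only).

    x: (T, D) frame-level features
    W: (D, D), b: (D,)  shared projection, playing the role of sap_linear
    M: (D, K)           one context vector per head (K heads)
    Returns a (K * D,) embedding (the K head outputs concatenated)
    and the (T, K) attention weights.
    """
    h = np.tanh(x @ W.T + b)                        # (T, D) shared non-linear projection
    scores = h @ M                                  # (T, K) one score per frame and head
    scores = scores - scores.max(axis=0)            # stabilise each head's softmax
    w = np.exp(scores) / np.exp(scores).sum(axis=0) # (T, K), each column sums to 1
    pooled = w.T @ x                                # (K, D) one weighted mean per head
    return pooled.reshape(-1), w
```

With K = 1 this reduces to single-head pooling. The embedding grows to K * D dimensions, so the layers after pooling must change size as well. Also, if nothing pushes the heads apart, their context vectors may learn similar directions and produce near-duplicate outputs; comparing the columns of w is one way to check whether that is happening.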

joonson commented 4 years ago

We follow the implementation described in this paper. Equation 1 of the paper uses a tanh non-linearity.
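As a hedged illustration of where the quoted line sits in the full pooling step, assuming the attention weights are applied to the input features: the symbols below are our own labels, not necessarily the paper's notation. $x_t \in \mathbb{R}^D$ is the frame-level feature at time $t$ (of $T$ frames), $W$ and $b$ are the weight and bias of self.sap_linear, and $\mu$ is the self.attention vector.

$$
h_t = \tanh(W x_t + b), \qquad
w_t = \frac{\exp(h_t^\top \mu)}{\sum_{\tau=1}^{T} \exp(h_\tau^\top \mu)}, \qquad
e = \sum_{t=1}^{T} w_t \, x_t
$$

The tanh is what makes the frame scorer non-linear. Without it, $(W x_t + b)^\top \mu = x_t^\top (W^\top \mu) + b^\top \mu$, which is a linear score $x_t^\top v$ plus a constant; the constant is the same for every frame and cancels in the softmax, so self.sap_linear would add nothing beyond a single learned vector $v = W^\top \mu$.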

self.attention is trained together with the network.
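To make "trained together with the network" concrete, here is a minimal NumPy sketch (NumPy is swapped in for PyTorch so it runs without a deep learning framework; sap_pool and grad_wrt_mu are hypothetical names). It computes single-head SAP and then checks, by central finite differences on a toy loss, that the pooled embedding depends on the attention vector mu, so any loss on the embedding sends a nonzero gradient into it.

```python
import numpy as np


def sap_pool(x, W, b, mu):
    """Single-head self-attentive pooling (forward pass only).

    x:  (T, D) frame-level features
    W:  (D, D) projection weight, standing in for sap_linear
    b:  (D,)   projection bias
    mu: (D,)   context vector, standing in for self.attention
    Returns the (D,) utterance-level embedding and the (T,) weights.
    """
    h = np.tanh(x @ W.T + b)        # (T, D) non-linear projection
    scores = h @ mu                 # (T,) one scalar score per frame
    scores = scores - scores.max()  # numerical stability for the softmax
    w = np.exp(scores) / np.exp(scores).sum()
    return w @ x, w                 # attention-weighted mean over frames


def grad_wrt_mu(x, W, b, mu, target, eps=1e-6):
    """Central finite-difference gradient of ||e - target||^2 w.r.t. mu."""
    def loss(m):
        e, _ = sap_pool(x, W, b, m)
        return float(((e - target) ** 2).sum())

    g = np.zeros_like(mu)
    for i in range(mu.size):
        d = np.zeros_like(mu)
        d[i] = eps
        g[i] = (loss(mu + d) - loss(mu - d)) / (2 * eps)
    return g
```

With mu set to zero every frame gets weight 1/T and the pooling reduces to a plain temporal mean; training moves mu away from that. In a PyTorch module the same holds automatically, assuming self.attention is registered as an nn.Parameter: it then appears in model.parameters(), so the optimizer updates it alongside sap_linear and the rest of the network.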