Regarding SAP implementation

@joonson Could you point to a reference (paper or GitHub repository) that the SAP implementation is based on? I am having a hard time understanding what is being done.

One question is why a non-linearity is applied to this projection: torch.tanh(self.sap_linear(x))

Also, does [self.attention] get updated during training?
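To make the two questions above concrete, here is a minimal sketch of single-head self-attentive pooling as I understand it. The class name, the tensor layout (batch, time, dim), and the initialisation are assumptions for illustration, not a copy of the repository code; only the names sap_linear and attention come from the snippet quoted above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttentivePooling(nn.Module):
    """Illustrative single-head self-attentive pooling over the time axis."""

    def __init__(self, dim):
        super().__init__()
        self.sap_linear = nn.Linear(dim, dim)
        # Wrapped in nn.Parameter, so it shows up in .parameters()
        # and an optimizer built from them would update it.
        self.attention = nn.Parameter(torch.empty(dim, 1))
        nn.init.xavier_normal_(self.attention)

    def forward(self, x):
        # x: (batch, time, dim) frame-level features (assumed layout)
        h = torch.tanh(self.sap_linear(x))        # (batch, time, dim)
        scores = torch.matmul(h, self.attention)  # (batch, time, 1)
        weights = F.softmax(scores, dim=1)        # normalise over time
        return torch.sum(x * weights, dim=1)      # (batch, dim)


pool = SelfAttentivePooling(dim=8)
out = pool(torch.randn(4, 20, 8))
out.sum().backward()

# Names of all parameters that would receive gradient updates.
trainable = [name for name, p in pool.named_parameters() if p.requires_grad]
# True if the backward pass reached the attention vector.
has_grad = pool.attention.grad is not None
```

In this sketch, the attention vector is trained whenever it is registered as an nn.Parameter and the optimizer is built from the module's parameters. One observation about the tanh: without it, the linear layer followed by the dot product with the attention vector would collapse into a single linear map of x, since (Wx + b)·a = (aᵀW)x + aᵀb.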

I changed the single-head implementation here into a multi-head one. The code runs and I can train the model, but I am not seeing any improvement. A better understanding would help. Thanks.
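For readers unsure what "multi-head" could mean in this context, one possible variant is sketched below. It is a hypothetical illustration, not the change described above: it assumes several attention vectors share one tanh projection, each head pools the sequence separately, and the pooled vectors are concatenated. The class name MultiHeadSAP and the num_heads argument are made up for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadSAP(nn.Module):
    """Hypothetical multi-head self-attentive pooling (one vector per head)."""

    def __init__(self, dim, num_heads):
        super().__init__()
        self.sap_linear = nn.Linear(dim, dim)
        # One attention vector per head, stored as the columns of a matrix.
        self.attention = nn.Parameter(torch.empty(dim, num_heads))
        nn.init.xavier_normal_(self.attention)

    def forward(self, x):
        # x: (batch, time, dim) frame-level features (assumed layout)
        h = torch.tanh(self.sap_linear(x))            # (batch, time, dim)
        scores = torch.matmul(h, self.attention)      # (batch, time, heads)
        weights = F.softmax(scores, dim=1)            # normalise over time per head
        # (batch, heads, time) @ (batch, time, dim) -> (batch, heads, dim)
        pooled = torch.bmm(weights.transpose(1, 2), x)
        return pooled.flatten(start_dim=1)            # (batch, heads * dim)


mh_pool = MultiHeadSAP(dim=8, num_heads=3)
mh_out = mh_pool(torch.randn(4, 20, 8))
```

Note that the output dimension grows to heads * dim in this variant, so any layer that consumes the pooled embedding would need its input size adjusted accordingly.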

clovaai / voxceleb_trainer

Regarding SAP implementation #48