clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

Question about angleproto.py code #146

Closed: huangrunqian closed this issue 2 years ago

huangrunqian commented 2 years ago

Why does the code below, from angleproto.py, use generated ("fake") labels instead of the real labels passed in as input?

def forward(self, x, label=None):

    assert x.size()[1] >= 2

    out_anchor      = torch.mean(x[:,1:,:],1)
    out_positive    = x[:,0,:]
    stepsize        = out_anchor.size()[0]

    cos_sim_matrix  = F.cosine_similarity(out_positive.unsqueeze(-1),out_anchor.unsqueeze(-1).transpose(0,2))
    torch.clamp(self.w, 1e-6)
    cos_sim_matrix = cos_sim_matrix * self.w + self.b

    label   = torch.from_numpy(numpy.asarray(range(0,stepsize))).cuda()
    nloss   = self.criterion(cos_sim_matrix, label)
    prec1   = accuracy(cos_sim_matrix.detach(), label.detach(), topk=(1,))[0]

    return nloss, prec1
huangrunqian commented 2 years ago

I figured it out in the end.
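For later readers, here is a minimal sketch of why generated labels can be enough in a loss of this shape. It assumes that each row of `x` holds utterances from a different speaker, which is what the slicing `x[:,0,:]` and `x[:,1:,:]` suggests. Under that assumption, row i of the similarity matrix compares the query of row i against every centroid, so the correct "class" for row i is simply column i, and the labels are 0, 1, ..., stepsize-1 regardless of the real speaker IDs. The sketch uses plain Python instead of PyTorch so that it runs without dependencies; the toy embeddings and the helper `cosine` are made up for illustration and are not part of the repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (illustrative helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy batch with made-up values: 3 rows (assumed to be 3 different speakers),
# 3 utterances per row, 2-dimensional embeddings.
x = [
    [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1]],
    [[0.0, 1.0], [0.1, 0.9], [-0.1, 1.0]],
    [[-1.0, -1.0], [-0.9, -1.1], [-1.1, -0.9]],
]
stepsize = len(x)

# Mirrors out_positive = x[:,0,:]: the first utterance of each row is the query.
queries = [utts[0] for utts in x]

# Mirrors out_anchor = mean(x[:,1:,:], 1): the centroid of the remaining utterances.
centroids = [
    [sum(dim) / len(utts[1:]) for dim in zip(*utts[1:])]
    for utts in x
]

# sim[i][j] = similarity between the query of row i and the centroid of row j.
sim = [[cosine(q, c) for c in centroids] for q in queries]

# The target for row i is column i, so the labels are just the row indices.
labels = list(range(stepsize))

# With well-separated toy embeddings, the best-matching centroid is on the diagonal.
predicted = [max(range(stepsize), key=lambda j: row[j]) for row in sim]
print(predicted == labels)
```

In PyTorch, `torch.arange(stepsize)` builds the same index labels as the `torch.from_numpy(numpy.asarray(range(0,stepsize)))` line in the snippet above. The real speaker labels still matter, but only upstream: under the assumption stated here, they are what guarantees that each row of the batch belongs to a different speaker.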

speaker-lover commented 2 years ago

Excuse me, what results should I expect with this loss function? I got EER = 2.6 and minDCF = 0.2. Is that normal? Also, is this a self-supervised or a supervised learning method?

huangrunqian commented 2 years ago

I only use the loss function; my model and data are different, so my results are not comparable with yours. You could check your results against those reported in the original paper. Since you use speaker labels, it is a supervised learning method.