TaoRuijie / ECAPA-TDNN

Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 on Vox1_O when trained only on Vox2)
MIT License

Vox1_E and Vox1_H #38

Closed JJun-Guo closed 1 year ago

JJun-Guo commented 1 year ago

Hello, what are the test results on Vox1_E and Vox1_H without score normalization?

TaoRuijie commented 1 year ago

Sorry, I haven't tested this... Generally the gap is around 10%.

JJun-Guo commented 1 year ago

> Sorry, I haven't tested this... Generally the gap is around 10%.

Could you open-source the AS-norm code?

TaoRuijie commented 1 year ago

Please check here: https://github.com/TaoRuijie/ECAPA-TDNN/issues/26#issuecomment-1060401825

TaoRuijie commented 1 year ago

    import torch

    def as_norm(score, embedding_1, embedding_2, cohort_feats, topk):
        # Score embedding_1 against every cohort embedding, keep the top-k scores
        score_1 = torch.matmul(cohort_feats, embedding_1.T)[:, 0]
        score_1 = torch.topk(score_1, topk, dim=0)[0]
        mean_1 = torch.mean(score_1, dim=0)
        std_1 = torch.std(score_1, dim=0)
        # Same statistics for embedding_2
        score_2 = torch.matmul(cohort_feats, embedding_2.T)[:, 0]
        score_2 = torch.topk(score_2, topk, dim=0)[0]
        mean_2 = torch.mean(score_2, dim=0)
        std_2 = torch.std(score_2, dim=0)
        # Symmetric normalization: average the two z-normalized scores
        score = 0.5 * (score - mean_1) / std_1 + 0.5 * (score - mean_2) / std_2
        return score

cohort_feats holds the embeddings extracted from the training set, with shape (N, 192), where N is the number of training samples.
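As a rough usage sketch of as_norm: the matrix products behave as cosine similarities only if every embedding is L2-normalized, and the trial score passed in has to be on the same scale as the cohort scores. The random tensors, the sizes, and the helper name `score_trial` below are assumptions for illustration only; they are not taken from the repository.

```python
import torch
import torch.nn.functional as F


def as_norm(score, embedding_1, embedding_2, cohort_feats, topk):
    # Same computation as the function posted in this thread.
    score_1 = torch.matmul(cohort_feats, embedding_1.T)[:, 0]
    score_1 = torch.topk(score_1, topk, dim=0)[0]
    score_2 = torch.matmul(cohort_feats, embedding_2.T)[:, 0]
    score_2 = torch.topk(score_2, topk, dim=0)[0]
    return (0.5 * (score - score_1.mean()) / score_1.std()
            + 0.5 * (score - score_2.mean()) / score_2.std())


def score_trial(embedding_1, embedding_2, cohort_feats, topk):
    # Hypothetical helper: L2-normalize everything so dot products are cosines.
    embedding_1 = F.normalize(embedding_1, p=2, dim=1)
    embedding_2 = F.normalize(embedding_2, p=2, dim=1)
    cohort_feats = F.normalize(cohort_feats, p=2, dim=1)
    # Raw cosine score of the trial, on the same scale as the cohort scores.
    raw = torch.matmul(embedding_1, embedding_2.T)[0, 0]
    return raw, as_norm(raw, embedding_1, embedding_2, cohort_feats, topk)


torch.manual_seed(0)
cohort = torch.randn(1000, 192)  # placeholder for the (N, 192) training embeddings
enroll = torch.randn(1, 192)     # placeholder enrollment embedding
test = torch.randn(1, 192)       # placeholder test embedding
raw, normed = score_trial(enroll, test, cohort, topk=300)
print(float(raw), float(normed))
```

Note that topk must not exceed N, since torch.topk raises an error when asked for more elements than the dimension holds.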

JJun-Guo commented 1 year ago

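I see that in the source code each sample has two embeddings, with shapes (1, 192) and (5, 192), but as_norm takes only one. Which of the two does it represent? Would you consider also normalizing with both embeddings per sample in as_norm and then averaging the results?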
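One possible reading of this proposal, as a hedged sketch rather than anything confirmed by the author: apply the same normalization to each view separately (the full-utterance (1, 192) embedding and the (5, 192) segment embeddings), then average the two normalized scores. Averaging cohort scores over segments, and the function names `cohort_stats`, `as_norm_view`, and `as_norm_two_views`, are assumptions made here. All inputs are assumed to be L2-normalized already.

```python
import torch
import torch.nn.functional as F


def cohort_stats(embedding, cohort_feats, topk):
    # embedding: (S, D) with S segments; cohort_feats: (N, D).
    # Assumption: average over segments to get one score per cohort entry.
    scores = torch.matmul(cohort_feats, embedding.T).mean(dim=1)  # (N,)
    top = torch.topk(scores, topk, dim=0)[0]
    return top.mean(), top.std()


def as_norm_view(enroll, test, cohort_feats, topk):
    # Raw score for one view: mean cosine over all segment pairs.
    raw = torch.matmul(enroll, test.T).mean()
    mean_1, std_1 = cohort_stats(enroll, cohort_feats, topk)
    mean_2, std_2 = cohort_stats(test, cohort_feats, topk)
    return 0.5 * (raw - mean_1) / std_1 + 0.5 * (raw - mean_2) / std_2


def as_norm_two_views(enroll_views, test_views, cohort_feats, topk):
    # Each *_views argument is a pair: (full utterance (1, D), segments (5, D)).
    scores = [as_norm_view(e, t, cohort_feats, topk)
              for e, t in zip(enroll_views, test_views)]
    return torch.stack(scores).mean()


torch.manual_seed(0)
cohort = F.normalize(torch.randn(500, 192), dim=1)  # placeholder cohort
enroll_views = (F.normalize(torch.randn(1, 192), dim=1),
                F.normalize(torch.randn(5, 192), dim=1))
test_views = (F.normalize(torch.randn(1, 192), dim=1),
              F.normalize(torch.randn(5, 192), dim=1))
print(float(as_norm_two_views(enroll_views, test_views, cohort, topk=100)))
```

For a single-segment view (S = 1) the segment average is a no-op, so `as_norm_view` reduces to the as_norm function posted earlier in the thread.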

JJun-Guo commented 1 year ago

Is N here the full training set? Also, what value of topk do you use? I tried N = 10000 with topk = 300 or 3000, and the EER came out above 40% in both cases...