Misalignment of "model.py" code with the original paper: layer 2 should only take the output of layer 1 as input, and layer 3 should only take the output of layer 2 as input

TaoRuijie / ECAPA-TDNN

Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 for Vox1_O when trained only on Vox2)

MIT License


Dear Author,

I noticed that lines 179-183 in the "model.py" file do not adhere to the specifications outlined in the original paper (Fig. 2, available at https://arxiv.org/pdf/2005.07143.pdf). According to the paper, layer 2 should only take the output of layer 1 as input, and layer 3 should only take the output of layer 2 as input.

The current implementation in lines 179-183 of "model.py" is as follows:

    x1 = self.layer1(x)
    x2 = self.layer2(x + x1)
    x3 = self.layer3(x + x1 + x2)
    x = self.layer4(torch.cat((x1, x2, x3), dim=1))

However, the paper suggests the following structure:

    x1 = self.layer1(x)
    x2 = self.layer2(x1)
    x3 = self.layer3(x2)
    x = self.layer4(torch.cat((x1, x2, x3), dim=1))
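To make the difference between the two wirings concrete, here is a minimal sketch that runs without PyTorch. It is only an illustration: plain numbers stand in for tensors, the hypothetical `double` function stands in for each layer, and the function names `forward_current` and `forward_proposed` are mine, not from "model.py". The final `layer4(torch.cat(...))` step is left out because it is the same in both versions.

```python
def double(v):
    # Hypothetical stand-in for a layer: a real SE-Res2Block is far more complex.
    return 2 * v


def forward_current(x, layer1, layer2, layer3):
    # Wiring as quoted from "model.py": each layer receives the sum of the
    # block input and all earlier layer outputs.
    x1 = layer1(x)
    x2 = layer2(x + x1)
    x3 = layer3(x + x1 + x2)
    return x1, x2, x3


def forward_proposed(x, layer1, layer2, layer3):
    # Wiring proposed in this issue: each layer receives only the output
    # of the layer directly before it.
    x1 = layer1(x)
    x2 = layer2(x1)
    x3 = layer3(x2)
    return x1, x2, x3


if __name__ == "__main__":
    print(forward_current(1, double, double, double))   # (2, 6, 18)
    print(forward_proposed(1, double, double, double))  # (2, 4, 8)
```

With the same toy layers and the same input, the two versions already diverge at `x2`, so the features concatenated for `layer4` differ between them.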

Thank you for your attention to this matter.

Misalignment of "model.py" code with the original paper: layer 2 should only take the output of layer 1 as input, and layer 3 should only take the output of layer 2 as input #59