TaoRuijie / AVCleanse

ICASSP 2023: 'Speaker recognition with two-step multi-modal deep cleansing'

Question for pretrain visual model #4

Closed huayu-2049 closed 4 months ago

huayu-2049 commented 5 months ago

Hello, thanks for sharing the code. While evaluating it over the past few days, I found that the pretrained visual models, V-Vox2.model and V-Glint.model, do not match the parameter count of the visual model in my experiments, for either IResNet18 or IResNet50. This results in very poor evaluation scores and EER.

Specifically, V-Vox2.model contains 422 parameters, while model.state_dict() has only 188. Meanwhile, V-Glint.model is missing face_loss.weights, so it has one parameter fewer than the model expects. I wonder whether I made a mistake during evaluation or whether the provided pretrained models are incorrect.

(By the way, the provided pretrained audio model is correct and works well.)
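
For anyone trying to reproduce the mismatch described above, here is a minimal sketch of how the two sets of key names could be compared. It is not taken from the AVCleanse code: the helper names `diff_state_dict_keys` and `count_by_prefix` and the toy key names are hypothetical, and it assumes the checkpoint file is a plain dict of tensors, as saved with `torch.save(model.state_dict(), path)`. It works on key names only, so it runs without PyTorch.

```python
from collections import Counter


def diff_state_dict_keys(checkpoint_keys, model_keys):
    """Compare two collections of state_dict key names (hypothetical helper)."""
    ckpt, model = set(checkpoint_keys), set(model_keys)
    return {
        # Keys the model expects but the checkpoint does not provide.
        "missing_in_checkpoint": sorted(model - ckpt),
        # Keys stored in the checkpoint that the model has no slot for.
        "unexpected_in_checkpoint": sorted(ckpt - model),
        # Keys present on both sides.
        "shared": sorted(ckpt & model),
    }


def count_by_prefix(keys):
    """Count keys by their first dotted component, e.g. 'a.b.weight' -> 'a'."""
    return Counter(key.split(".", 1)[0] for key in keys)


# Toy key names for illustration only; they are NOT the real AVCleanse keys.
checkpoint_keys = [
    "face_encoder.conv1.weight",
    "face_encoder.bn1.weight",
    "speaker_encoder.conv1.weight",
]
model_keys = [
    "face_encoder.conv1.weight",
    "face_encoder.bn1.weight",
    "face_loss.weight",
]

report = diff_state_dict_keys(checkpoint_keys, model_keys)
print("missing in checkpoint:   ", report["missing_in_checkpoint"])
print("unexpected in checkpoint:", report["unexpected_in_checkpoint"])
print("checkpoint keys by prefix:", dict(count_by_prefix(checkpoint_keys)))
```

With PyTorch installed, the real inputs would presumably be `torch.load(path, map_location="cpu").keys()` and `model.state_dict().keys()`. Grouping the checkpoint keys by prefix may show whether the extra entries belong to a submodule that the evaluation model does not build. Note also that `state_dict()` counts buffers such as BatchNorm running statistics, so its length is a number of tensors rather than a number of trainable parameters.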