Question about the pretrained visual models

Hello, thanks for sharing the code. While running the evaluation recently, I found that the pretrained visual models, V-Vox2.model and V-Glint.model, do not have the parameter count that the visual model expects. This happens for both IResNet18 and IResNet50, and it results in very poor evaluation scores and EER.

Specifically, V-Vox2.model contains 422 parameters, while model.state_dict() has only 188. Meanwhile, V-Glint.model is missing face_loss.weights, so it has one parameter fewer than expected. I wonder whether I made a mistake in the evaluation process or whether the provided pretrained models are incorrect.
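To make the mismatch easier to pin down, here is a minimal sketch of how the two sets of parameter names could be compared. It works on plain dictionaries, so the helper name `compare_keys` and the toy key names below are only illustrative assumptions, not the names used in this repository.

```python
def compare_keys(checkpoint, model_state):
    """Compare parameter names in a checkpoint against those a model expects.

    Both arguments are dict-like objects keyed by parameter name.
    Returns (missing, unexpected) as sorted lists.
    """
    ckpt_keys = set(checkpoint)
    model_keys = set(model_state)
    # Names the model expects but the checkpoint file does not contain.
    missing = sorted(model_keys - ckpt_keys)
    # Names stored in the checkpoint file that the model does not use.
    unexpected = sorted(ckpt_keys - model_keys)
    return missing, unexpected


# Toy stand-ins for a loaded checkpoint and model.state_dict();
# these key names are hypothetical and chosen only for illustration.
checkpoint = {"encoder.conv1.weight": 0, "encoder.bn1.weight": 0, "extra.weight": 0}
model_state = {"encoder.conv1.weight": 0, "encoder.bn1.weight": 0, "head.weight": 0}

missing, unexpected = compare_keys(checkpoint, model_state)
print("missing from checkpoint:", missing)
print("unexpected in checkpoint:", unexpected)
```

With PyTorch, the two dictionaries would typically come from `torch.load(path, map_location="cpu")` and `model.state_dict()`, assuming the file stores a plain state dict rather than a wrapper object. Calling `model.load_state_dict(state, strict=False)` also reports `missing_keys` and `unexpected_keys`, which gives the same information.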

(By the way, the provided pretrained audio model is correct and works well.)

TaoRuijie / AVCleanse
