
Wespeaker implementation

I re-implemented the whole system with the Wespeaker toolkit and achieved better results (EER, %), which can be found here:

| Model | AS-Norm | LMFT | QMF | vox1-O-clean | vox1-E-clean | vox1-H-clean |
|-------|:-------:|:----:|:---:|:------------:|:------------:|:------------:|
| WavLM Base Plus + MHFA | × | × |  | 0.750 | 0.716 | 1.442 |
| WavLM Large + MHFA | × | × |  | 0.649 | 0.610 | 1.235 |

SLT22_MultiHead-Factorized-Attentive-Pooling

This repository contains the PyTorch code of our paper, An Attention-Based Backend Allowing Efficient Fine-Tuning of Transformer Models for Speaker Verification. This implementation is based on vox_trainer.
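
The backend described in the paper is Multi-Head Factorized Attentive (MHFA) pooling over the layer-wise outputs of a pre-trained transformer such as WavLM. Below is a minimal PyTorch sketch of that idea, written from the paper's high-level description rather than copied from this repository; the class name, parameter names, and default dimensions are illustrative assumptions.

```python
# Minimal sketch of MHFA-style pooling (illustrative; not the repository's code).
import torch
import torch.nn as nn


class MHFA(nn.Module):
    """Pools a stack of layer-wise transformer features into one speaker embedding."""

    def __init__(self, num_layers=13, feat_dim=768, head_dim=128,
                 num_heads=8, emb_dim=256):
        super().__init__()
        # Two independent learnable layer-weight vectors: one builds the "key"
        # stream that drives the attention, the other builds the "value" stream
        # that is actually pooled (the factorization in MHFA).
        self.key_layer_weights = nn.Parameter(torch.zeros(num_layers))
        self.value_layer_weights = nn.Parameter(torch.zeros(num_layers))
        self.key_proj = nn.Linear(feat_dim, head_dim)
        self.value_proj = nn.Linear(feat_dim, head_dim)
        # One attention score per head and per frame.
        self.heads = nn.Linear(head_dim, num_heads)
        self.out = nn.Linear(head_dim * num_heads, emb_dim)

    def forward(self, layer_feats):
        # layer_feats: (batch, num_layers, frames, feat_dim), e.g. all hidden
        # states returned by a frozen or lightly fine-tuned WavLM encoder.
        w_k = torch.softmax(self.key_layer_weights, dim=0).view(1, -1, 1, 1)
        w_v = torch.softmax(self.value_layer_weights, dim=0).view(1, -1, 1, 1)
        keys = self.key_proj((layer_feats * w_k).sum(dim=1))      # (B, T, head_dim)
        values = self.value_proj((layer_feats * w_v).sum(dim=1))  # (B, T, head_dim)

        att = torch.softmax(self.heads(keys), dim=1)              # (B, T, num_heads)
        # Attentive pooling per head: weighted sum of value frames.
        pooled = torch.einsum('bth,btd->bhd', att, values)        # (B, num_heads, head_dim)
        return self.out(pooled.flatten(start_dim=1))              # (B, emb_dim)


if __name__ == "__main__":
    feats = torch.randn(2, 13, 200, 768)  # dummy WavLM-Base-style hidden states
    print(MHFA()(feats).shape)            # torch.Size([2, 256])
```

Because only the pooling backend carries the learnable aggregation over layers, the transformer itself can stay frozen or be fine-tuned lightly, which is what makes the fine-tuning efficient.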


Dependencies & Data preparation

Training and Testing

```bash
name=Baseline.yaml
name_lm=Baseline_lm.yaml

# Stage 1: initial training
python3 trainSpeakerNet.py --config yaml/$name --distributed >> log/$name.log
# Stage 2: large-margin fine-tuning (lm)
python3 trainSpeakerNet.py --config yaml/$name_lm --distributed >> log/$name_lm.log
# Stage 3: evaluation
python3 trainSpeakerNet_Eval.py --config yaml/$name_lm --eval >> log/$name_lm.log
```

where lm denotes large-margin fine-tuning.
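
For context, large-margin fine-tuning is commonly done by continuing training for a few epochs with a larger additive angular margin in an AAM-softmax classifier (often together with longer training segments). The snippet below is an illustrative sketch of that idea, not the loss code used by this repository; the margin, scale, and speaker-count values are assumptions.

```python
# Illustrative AAM-softmax with an enlargeable margin (not this repository's loss code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class AAMSoftmax(nn.Module):
    def __init__(self, emb_dim, num_speakers, margin=0.2, scale=30.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_speakers, emb_dim))
        nn.init.xavier_normal_(self.weight)
        self.margin, self.scale = margin, scale

    def forward(self, emb, label):
        # Cosine similarity between normalized embeddings and class weights.
        cosine = F.linear(F.normalize(emb), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin only to the target-class logits.
        target = F.one_hot(label, cosine.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.margin), cosine)
        return F.cross_entropy(self.scale * logits, label)


# Stage 1 (Baseline.yaml-style run): train with a moderate margin.
criterion = AAMSoftmax(emb_dim=256, num_speakers=5994, margin=0.2)  # 5994 = VoxCeleb2 dev speakers
# Stage 2 (Baseline_lm.yaml-style run): reload the trained model and enlarge the margin.
criterion.margin = 0.5  # assumed value for illustration

emb = torch.randn(4, 256)               # dummy speaker embeddings
label = torch.randint(0, 5994, (4,))    # dummy speaker labels
loss = criterion(emb, label)
```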

Citation

If you use this code, please consider citing the following paper:

@INPROCEEDINGS{10022775,
  author={Peng, Junyi and Plchot, Oldřich and Stafylakis, Themos and Mošner, Ladislav and Burget, Lukáš and Černocký, Jan},
  booktitle={2022 IEEE Spoken Language Technology Workshop (SLT)}, 
  title={An Attention-Based Backend Allowing Efficient Fine-Tuning of Transformer Models for Speaker Verification}, 
  year={2023},
  pages={555-562},
  doi={10.1109/SLT54892.2023.10022775}}