[VSim ECAPA] What $MODEL_PATH should be used when using the ECAPA model for speaker similarity evaluation?

facebookresearch / stopes

A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.

https://facebookresearch.github.io/stopes/

MIT License

[VSim ECAPA] What $MODEL_PATH should be used when using the ECAPA model for speaker similarity evaluation? #64

Closed. Poeroz closed this issue 8 months ago.

Poeroz commented 8 months ago

Hi, thanks for your great work! I would like to use VSim for speaker similarity evaluation. From the documentation, I see that I should use the "wavlm_large_fintune.pth" model when "model_type=valle". Which model path should I use when I want "model_type=ecapa"? Thanks!

avidale commented 8 months ago

The recommended setting is model_type=valle with wavlm_large_fintune.pth. This is the setting that we ended up using in the Seamless paper. The ECAPA architecture is supported only for the sake of reproducing some of our preliminary experiments that were not published (or for the unlikely case that you train your own ECAPA speech encoder).
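
For readers who want to guard against a mismatched setting in their own scripts, the advice above can be sketched as a small check. This is only an illustration: `describe_vsim_setting` is a hypothetical helper, not part of stopes, and it encodes nothing beyond what this thread states (the checkpoint file name is copied as written here). It also assumes that no ready-made ECAPA checkpoint is available, which is an inference from the answer rather than something it states.

```python
from pathlib import Path

# Hypothetical helper, NOT part of the stopes API. It only encodes the
# advice given in this thread.
RECOMMENDED_CHECKPOINT = "wavlm_large_fintune.pth"  # name as written in the thread


def describe_vsim_setting(model_type: str, model_path: str) -> str:
    """Return a short note on a (model_type, model_path) pair for VSim."""
    checkpoint_name = Path(model_path).name
    if model_type == "valle":
        if checkpoint_name == RECOMMENDED_CHECKPOINT:
            return "recommended setting"
        return "valle with a checkpoint other than the recommended one"
    if model_type == "ecapa":
        # Assumption: you must supply an ECAPA speech encoder you trained yourself.
        return "ecapa: requires your own ECAPA checkpoint"
    raise ValueError(f"unknown model_type: {model_type!r}")


if __name__ == "__main__":
    print(describe_vsim_setting("valle", "/models/wavlm_large_fintune.pth"))
    print(describe_vsim_setting("ecapa", "/models/my_ecapa.pth"))
```

The paths in the usage lines are placeholders; substitute wherever your checkpoint actually lives.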

Poeroz commented 8 months ago

Thanks for your reply!