Closed Poeroz closed 8 months ago
The recommended setting is model_type=valle with wavlm_large_fintune.pth.
This is the setting that we ended up using in the Seamless paper.
The ECAPA architecture is supported only for the sake of reproducibility of some of our preliminary experiments that were not published (or for the unlikely case that you train your own ECAPA speech encoder).
Thanks for your reply!
Hi, thanks for your great work! I would like to use VSim for speaker similarity evaluation. From the documentation, I see that I should use the "wavlm_large_fintune.pth" model when "model_type=valle". Which model path should I use when I want to use "model_type=ecapa"? Thanks!