bytedance / SALMONN

SALMONN: Speech Audio Language Music Open Neural Network
https://bytedance.github.io/SALMONN/
Apache License 2.0

How to adopt a speaker verification task? #30

Closed jodiesue closed 8 months ago

jodiesue commented 9 months ago

Nice work! But I'm wondering how SALMONN can be adapted to the speaker verification task. What are the prompt, input, and output?

TCL606 commented 9 months ago

We concatenate the two speech clips into a single input, with a short segment of silence inserted between them. We use the prompt: "Do you only hear the same person talking? Answer yes or no." The model then has to determine whether the two clips were spoken by the same speaker, so the expected output is "yes" or "no".
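For anyone who wants to try this, here is a minimal sketch of how such an input could be assembled. It is not the code used for SALMONN: the helper name `build_sv_input`, the 16 kHz sample rate, and the 0.5 s silence length are all assumptions made for illustration, since the reply only says "a small piece of silence". Only the prompt string is taken from the comment above.

```python
import numpy as np

# Prompt quoted from the reply above.
SV_PROMPT = "Do you only hear the same person talking? Answer yes or no."


def build_sv_input(clip_a, clip_b, sample_rate=16000, silence_sec=0.5):
    """Join two mono waveforms with a stretch of silence between them.

    Hypothetical helper: sample_rate and silence_sec are assumed values,
    not settings confirmed by the SALMONN authors.
    """
    a = np.asarray(clip_a, dtype=np.float32)
    b = np.asarray(clip_b, dtype=np.float32)
    if a.ndim != 1 or b.ndim != 1:
        raise ValueError("expected mono (1-D) waveforms")
    # Digital silence is an array of zeros of the requested duration.
    silence = np.zeros(int(round(sample_rate * silence_sec)), dtype=np.float32)
    return np.concatenate([a, silence, b])


# Synthetic stand-ins for two real speech clips (1.0 s and 1.5 s at 16 kHz).
rng = np.random.default_rng(0)
clip_a = rng.standard_normal(16000).astype(np.float32)
clip_b = rng.standard_normal(24000).astype(np.float32)
pair = build_sv_input(clip_a, clip_b)
```

The resulting `pair` waveform would then be saved or passed to the model as the audio input together with `SV_PROMPT`; how that is done depends on the inference script you are using.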

However, I have to admit that the current model doesn't generalise well enough to speaker-related tasks, even the simplest one, speaker verification 😞. It seems to recognise only the speakers in VoxCeleb1 well, and not any other speakers.