Closed jodiesue closed 8 months ago
We concat two speech clips as input with a small piece of silence inserted in between. We use the prompt: "Do you only hear the same person talking? Answer yes or no." The model needs to determine whether the two speech clips are spoken by the same speaker.
However, I have to admit that the current model doesn't generalise well enough for speaker-related tasks, even for the simplest speaker verification task 😞. It seems that it can only recognise speakers in Voxceleb1 well, but not any other speakers.
Nice work! But I'm wondering that how can SALMONN adopt the speaker verification task? What is the prmpt, input and output?