How to register a person's voice and predict it from a new WAV file

clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition

MIT License


How to register a person's voice and predict it from a new WAV file #120

Closed daizzhisheng closed 2 years ago

daizzhisheng commented 3 years ago

With your code, I don't know how to register a person's voice and then predict whether a new WAV file belongs to that person.

dimuthuanuraj commented 2 years ago

@joonson @dvisockas Could you give us an idea about this? How can we enroll a new speaker and test new single audio segments with your code?

lengjiayi commented 2 years ago

Once the SV (speaker verification) network is trained, it can generate speaker embeddings for new speakers as well. You can choose some utterances from a speaker and save their embeddings (enrollment), then compute the cosine similarity between the embedding of an unregistered voice and the saved embeddings. If the similarity is greater than the threshold, the two are considered the same person.
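A minimal sketch of the enroll-then-compare idea described above, using NumPy only. It assumes you already have one fixed-length embedding vector per utterance from the trained network; the helper names (`enroll`, `cosine_score`, `is_same_speaker`), the averaging of enrollment embeddings, the 192-dimensional toy vectors, and the threshold of 0.5 are all illustrative assumptions, not part of voxceleb_trainer. In practice the threshold would be tuned on held-out trials.

```python
import numpy as np

def l2_normalize(x):
    # Scale vectors to unit length along the last axis.
    x = np.asarray(x, dtype=np.float64)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def enroll(utterance_embeddings):
    # Assumption: one speaker model = mean of the normalized
    # per-utterance embeddings, renormalized to unit length.
    normalized = l2_normalize(utterance_embeddings)
    return l2_normalize(normalized.mean(axis=0))

def cosine_score(enrolled, test_embedding):
    # Cosine similarity between the enrolled model and a test embedding.
    return float(np.dot(l2_normalize(enrolled), l2_normalize(test_embedding)))

def is_same_speaker(enrolled, test_embedding, threshold=0.5):
    # Hypothetical threshold; accept only if similarity exceeds it.
    return cosine_score(enrolled, test_embedding) > threshold

# Toy data standing in for real network outputs.
rng = np.random.default_rng(0)
speaker_direction = rng.normal(size=192)
enroll_utts = speaker_direction + 0.1 * rng.normal(size=(3, 192))
enrolled = enroll(enroll_utts)

same_voice = speaker_direction + 0.1 * rng.normal(size=192)
other_voice = rng.normal(size=192)

print("same speaker score: ", cosine_score(enrolled, same_voice))
print("other speaker score:", cosine_score(enrolled, other_voice))
```

With real data, `enroll_utts` would be replaced by the embeddings the model produces for the enrollment recordings, and `same_voice` by the embedding of the new WAV file.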

dimuthuanuraj commented 2 years ago

@lengjiayi, I am struggling with how to save the embeddings. Could you please explain how to do it in the code? I really appreciate your support on this.

Jungjee commented 2 years ago

Hi @dimuthuanuraj. I recommend checking out https://github.com/Jungjee/RawNet/blob/master/python/RawNet3/infererence.py, especially the args.inference_utterance part. Thank you.
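For the saving step itself, here is a rough sketch that is independent of the linked script: it stores one embedding per enrolled speaker in a NumPy `.npz` file, reloads it, and picks the best-matching speaker for a new embedding. The function names, the file name `enrollments.npz`, the speaker names, and the 0.5 threshold are hypothetical, and the random vectors only stand in for embeddings that the trained model would produce.

```python
import os
import tempfile
import numpy as np

def save_enrollments(path, enrollments):
    # enrollments: dict mapping speaker name -> 1-D embedding array.
    np.savez(path, **{name: np.asarray(emb) for name, emb in enrollments.items()})

def load_enrollments(path):
    # Read every stored array back into a plain dict.
    with np.load(path) as data:
        return {name: data[name] for name in data.files}

def identify(enrollments, test_embedding, threshold=0.5):
    # Return (best speaker, score), or (None, score) if no enrolled
    # speaker exceeds the (hypothetical) cosine threshold.
    test = test_embedding / np.linalg.norm(test_embedding)
    best_name, best_score = None, -1.0
    for name, emb in enrollments.items():
        score = float(np.dot(emb / np.linalg.norm(emb), test))
        if score > best_score:
            best_name, best_score = name, score
    if best_score > threshold:
        return best_name, best_score
    return None, best_score

# Toy embeddings standing in for real model outputs.
rng = np.random.default_rng(1)
alice = rng.normal(size=192)
bob = rng.normal(size=192)

path = os.path.join(tempfile.mkdtemp(), "enrollments.npz")
save_enrollments(path, {"alice": alice, "bob": bob})
loaded = load_enrollments(path)

new_utt = bob + 0.1 * rng.normal(size=192)
unknown_utt = rng.normal(size=192)
print(identify(loaded, new_utt))
print(identify(loaded, unknown_utt))
```

Keeping the enrolled embeddings on disk means the enrollment audio only has to pass through the network once; each new WAV file then needs a single forward pass plus a few dot products.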