Walleclipse / Deep_Speaker-speaker_recognition_system

Keras implementation of "Deep Speaker: an End-to-End Neural Speaker Embedding System" (speaker recognition)

Will different utterances of the same speaker get similar embeddings? #64

Closed Maxxiey closed 3 years ago

Maxxiey commented 4 years ago

Hi @Walleclipse, great repo here, thanks. This is more a question than an issue, so I hope you don't mind.

In select_batch.py, an embedding is generated by this line:

https://github.com/Walleclipse/Deep_Speaker-speaker_recognition_system/blob/6a00d5d106aa54b18f534aa9432ef26feb0268f3/select_batch.py#L124

I wonder what will happen if I feed in several different utterances of the same person. Should the embeddings be almost the same? And is the .h5 file you provided in ./checkpoints a good enough model to test this idea with?
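For context, the contract being asked about is: feed a batch of utterance features in, get one embedding per utterance out. A minimal numpy sketch of that shape contract (the `fake_model_predict` function and the feature/embedding sizes are hypothetical stand-ins for the real Keras network, not part of the repo):

```python
import numpy as np

# Stand-in for the trained Keras model: a fixed random linear projection
# followed by L2 normalization (deep speaker embeddings are unit-norm).
rng = np.random.default_rng(42)
W = rng.normal(size=(160 * 64, 512))  # hypothetical input/embedding sizes

def fake_model_predict(batch):
    """Map a batch of utterance features to unit-norm embeddings."""
    emb = batch.reshape(len(batch), -1) @ W
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

# Three "utterances" (e.g. 160 frames of 64-dim filterbank features each)
utterances = rng.normal(size=(3, 160, 64))
embeddings = fake_model_predict(utterances)
print(embeddings.shape)  # (3, 512): one embedding vector per utterance
```

With the real checkpoint, the same call pattern would be `model.predict(batch)` on the actual feature tensors; only the mapping differs.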

Looking forward to your reply.

Max

Walleclipse commented 4 years ago

Hi, thanks for your interest. Your question is quite interesting.

  1. I am not sure whether the embeddings of different utterances of the same speaker are almost the same. But I assume that the cosine similarity between the utterances is close to 1. (That means that, in embedding space, the embeddings of different utterances of the same speaker are nearly parallel to each other.)

  2. I think the .h5 file is a good starter model for checking the idea. At least on the LibriSpeech dataset, the model has achieved good results. You can check the idea with this model. Please let me know if you have any interesting findings.
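For reference, the cosine-similarity check from point 1 can be sketched with plain numpy. The embedding vectors below are synthetic stand-ins (a vector and a slightly perturbed copy), not outputs of the actual model:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-ins for two embeddings of the same speaker; in practice
# these would come from the trained model applied to two utterances.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=512)
emb_b = emb_a + 0.05 * rng.normal(size=512)  # slightly perturbed copy

print(cosine_similarity(emb_a, emb_b))  # close to 1 for similar embeddings
```

If the model embeds same-speaker utterances consistently, this value should stay near 1, while embeddings of different speakers should score noticeably lower.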

Maxxiey commented 4 years ago

Many thanks, I am now training new models following your tips. I am thinking that this part of the code could be used to extract latent features of how a person speaks, such as prosody or something else.

Your answers are quite inspiring and I'll update this thread if I find anything interesting.

Max

Walleclipse commented 4 years ago

Thanks for your interest. Looking forward to your new results.