HarryVolek / PyTorch_Speaker_Verification

PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al.
BSD 3-Clause "New" or "Revised" License

Is the d-vector embedding input from this "GE2E Speaker Verification" implementation the same as what UIS-RNN expects, or not? #5

Closed MuruganR96 closed 5 years ago

MuruganR96 commented 5 years ago

Sir, I have one doubt.

Are the d-vector embeddings from this PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" also compatible with UIS-RNN, or not?

What is the accuracy of the d-vector embeddings?

Does it produce continuous d-vector embeddings (as sequences) or not?

Sir, please see the URLs below. I used the TIMIT dataset to generate the d-vector embeddings, but I don't know how to feed or initialize these embeddings.

https://github.com/google/uis-rnn/issues/6#issuecomment-439789103

https://github.com/google/uis-rnn/issues/6#issuecomment-439967734

Some of the libraries are only able to produce per-utterance d-vector embeddings, while for UIS-RNN, we require continuous d-vector embeddings (as sequences). We have no guarantee which third-party library supports this

Can you help me, sir? Thank you.

HarryVolek commented 5 years ago

The embeddings produced by this code are currently "utterance" level embeddings, not "segment" level. To go from utterance to segment level d-vectors, the steps under section 2 of https://arxiv.org/abs/1710.10468 will have to be implemented.

Accuracy was 95% when differentiating between 4 speakers at a time.
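Not part of this repository, but a minimal sketch of the sliding-window averaging described in section 2 of that paper; `embedder_net` (a network mapping a batch of mel-frame windows to d-vectors), the window/segment sizes, and the `frames` layout are all assumptions for illustration:

```python
import numpy as np
import torch

def segment_level_dvectors(frames, embedder_net, window=80, hop=40, seg_len=160):
    """Average overlapping window-level embeddings into one d-vector per
    fixed-length segment, roughly following section 2 of arXiv:1710.10468."""
    dvectors = []
    for seg_start in range(0, len(frames) - seg_len + 1, seg_len):
        seg = frames[seg_start:seg_start + seg_len]
        # overlapping windows inside the segment, fed through the embedder
        windows = [seg[s:s + window] for s in range(0, seg_len - window + 1, hop)]
        batch = torch.tensor(np.stack(windows), dtype=torch.float)
        with torch.no_grad():
            emb = embedder_net(batch)                 # (num_windows, emb_dim)
        emb = emb / emb.norm(dim=1, keepdim=True)     # L2-normalize each window
        dvectors.append(emb.mean(dim=0).numpy())      # average -> one segment d-vector
    return np.stack(dvectors)                         # (num_segments, emb_dim)
```

The resulting (num_segments, emb_dim) sequence is the "continuous" form UIS-RNN asks for, as opposed to one vector per utterance.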

MuruganR96 commented 5 years ago

Thank you very much sir.


xinli94 commented 5 years ago

The embeddings produced by this code are currently "utterance" level embeddings, not "segment" level. To go from utterance to segment level d-vectors, the steps under section 2 of https://arxiv.org/abs/1710.10468 will have to be implemented.

Accuracy was 95% when differentiating between 4 speakers at a time.

Hi,

Could you be more specific about the segment-level embeddings? Do we need to implement this for training, or does it only matter during inference (i.e. when using the pretrained model to create embedding features for an arbitrary-length audio input)?

Thanks, Xin

HarryVolek commented 5 years ago

The difference between utterance and segment level embeddings does not matter for training the model, or for speaker verification.

MuruganR96 commented 5 years ago

Sir, could you explain segment vs. utterance once again? I have only basic knowledge of these concepts. How can I change utterance-level embeddings to segment-level embeddings, and where do I have to make the change? Can you help me with this concept and give me some suggestions? Thank you.

MuruganR96 commented 5 years ago

Accuracy was 95% when differentiating between 4 speakers at a time.

Sir, at the utterance level this is TI-SV. Will it also work for a wake-word setup, where different speakers say the same wake word, like "OK Google"? Will it still differentiate the speakers?

thank you sir.

HarryVolek commented 5 years ago

I added a script to create embeddings compatible with https://github.com/google/uis-rnn.
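For anyone wiring those embeddings into UIS-RNN, a rough usage sketch based on the google/uis-rnn README; the .npy filenames here are placeholders for whatever the script writes, not its actual output names:

```python
import numpy as np
import uisrnn

# Continuous d-vector sequence plus matching per-segment speaker labels,
# e.g. saved by the embedding script (placeholder filenames).
train_sequence = np.load('train_sequence.npy').astype(float)   # (num_segments, emb_dim)
train_cluster_id = np.load('train_cluster_id.npy')             # (num_segments,) string labels

model_args, training_args, inference_args = uisrnn.parse_arguments()
model_args.observation_dim = train_sequence.shape[1]  # must match the d-vector size
model = uisrnn.UISRNN(model_args)

model.fit(train_sequence, train_cluster_id, training_args)

test_sequence = np.load('test_sequence.npy').astype(float)
predicted_labels = model.predict(test_sequence, inference_args)
print(predicted_labels)
```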

MuruganR96 commented 5 years ago

thank you so much sir.