Closed: MuruganR96 closed this issue 6 years ago.
The embeddings produced by this code are currently "utterance" level embeddings, not "segment" level. To go from utterance to segment level d-vectors, the steps under section 2 of https://arxiv.org/abs/1710.10468 will have to be implemented.
Accuracy was 95% when differentiating between 4 speakers at a time.
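For reference, here is a rough sketch (not part of this repo) of how window-level embeddings could be pooled into segment-level d-vectors along the lines of section 2 of that paper. The `embedder_net` model, the window/hop sizes, and the segment length are placeholder assumptions for illustration:

```python
import numpy as np
import torch

def segment_dvectors(frames, embedder_net, window=140, hop=70, seg_len=400):
    """frames: (num_frames, n_mels) log-mel features for one utterance."""
    window_embs, centers = [], []
    # 1. Slide a fixed-size window over the utterance and embed each window.
    for start in range(0, frames.shape[0] - window + 1, hop):
        win = torch.tensor(frames[start:start + window], dtype=torch.float).unsqueeze(0)
        with torch.no_grad():
            emb = embedder_net(win).squeeze(0)        # (emb_dim,)
        emb = emb / emb.norm()                        # L2-normalize the window embedding
        window_embs.append(emb.cpu().numpy())
        centers.append(start + window // 2)

    # 2. Group windows into fixed-length segments and average their embeddings.
    segment_dvecs = []
    for seg_start in range(0, frames.shape[0], seg_len):
        in_seg = [e for e, c in zip(window_embs, centers)
                  if seg_start <= c < seg_start + seg_len]
        if not in_seg:
            continue
        d = np.mean(in_seg, axis=0)
        segment_dvecs.append(d / np.linalg.norm(d))   # L2-normalize the segment d-vector
    return np.stack(segment_dvecs)                    # (num_segments, emb_dim)
```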
Thank you very much, sir.
Hi,
Could you be more specific about the segment level embeddings? Do we need to implement this for training, or does it only happen during inference (i.e., when using the pretrained model to create embedding features for an arbitrary-length audio input)?
Thanks, Xin
The difference between utterance and segment level embeddings does not matter for training the model, or for speaker verification.
Sir, could you explain segment vs. utterance level once more? I only have basic knowledge of these concepts. How can I change utterance level to segment level embeddings, and where do I have to make the change? Can you help me with this concept and give me some suggestions? Thank you.
Sir, the utterance level embedding is TI-SV, so will it work for a wake-word technique, where different speakers say the same wake word, such as "OK Google"? Will it differentiate between the speakers in that case?
Thank you, sir.
I added a script to create embeddings compatible with https://github.com/google/uis-rnn.
Thank you so much, sir.
Sir, I have one doubt.
Is this PyTorch d-vector embedding implementation of "Generalized End-to-End Loss for Speaker Verification" compatible with UIS-RNN or not?
What is the accuracy of the d-vector embeddings?
Does it produce continuous d-vector embeddings (as sequences) or not?
Sir, please see these URLs. I used the TIMIT dataset to generate the d-vector embeddings, but I don't know how to feed or initialize these embeddings.
https://github.com/google/uis-rnn/issues/6#issuecomment-439789103
https://github.com/google/uis-rnn/issues/6#issuecomment-439967734
Some of the libraries are only able to produce per-utterance d-vector embeddings, while for UIS-RNN, we require continuous d-vector embeddings (as sequences). We have no guarantee which third-party library supports this.
Can you help me, sir? Thank you.
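For anyone following this thread, a minimal sketch of feeding such continuous d-vector sequences into uis-rnn, based on the usage shown in the uis-rnn README. The `.npy` file names are placeholders, not the names produced by this repo's script:

```python
import numpy as np
import uisrnn

# Continuous d-vector sequence: one row per segment, in time order, shape (N, emb_dim).
train_sequence = np.load('train_sequence.npy').astype(float)
# One speaker label (string) per segment, length N.
train_cluster_id = np.load('train_cluster_id.npy')

# Note: model_args.observation_dim must match emb_dim of the d-vectors.
model_args, training_args, inference_args = uisrnn.parse_arguments()
model = uisrnn.UISRNN(model_args)

# Train the UIS-RNN on the labelled d-vector sequence.
model.fit(train_sequence, train_cluster_id, training_args)

# Predict a speaker label per segment for a new, unlabelled d-vector sequence.
test_sequence = np.load('test_sequence.npy').astype(float)
predicted_cluster_ids = model.predict(test_sequence, inference_args)
print(predicted_cluster_ids)
```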