HarryVolek / PyTorch_Speaker_Verification

PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al.
BSD 3-Clause "New" or "Revised" License

Embedder-net() #40

Closed: nidhal1231 closed this issue 5 years ago

nidhal1231 commented 5 years ago

Hi, I have some questions (this is my graduation project and it is extremely important to me):

- The output of the embedder-net() function has shape [N, 256]. What exactly is N? Is it the number of sliding windows (240 ms)?
- Can this output (the embedder-net() output) be used for speaker diarization, i.e. can clustering algorithms be applied to these sequences?
- How did you build train-sequence and train-cluster-id (the inputs to uis-rnn)? My dataset is different from the TIMIT corpus (TIMIT is a speaker recognition dataset, not a speaker diarization dataset). This is the corpus I am using: https://github.com/EMRAI/emrai-synthetic-diarization-corpus

Thank you in advance for your help.

HarryVolek commented 5 years ago
  1. N is the number of embeddings.
  2. You can. That is what https://github.com/google/uis-rnn does.
  3. (Answering #41.) The align_embeddings function averages the "window-level embeddings" into the "segment-level d-vectors".
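To illustrate answers 1 and 3, here is a minimal sketch of how window-level embeddings (one row of the [N, 256] output per window) could be averaged into segment-level d-vectors. It is not the repository's align_embeddings code. The timing values are assumptions: 240 ms windows (as in the question), a 120 ms hop (50% overlap), and segments spanning at most 400 ms. The function name average_into_segments is hypothetical.

```python
def average_into_segments(window_embeddings, hop_s=0.12, win_s=0.24, seg_s=0.4):
    """Average consecutive window-level embeddings into segment-level d-vectors.

    Assumption: window i covers [i * hop_s, i * hop_s + win_s) seconds.
    Windows are grouped greedily in time order: a window joins the current
    segment while the segment's span (first window start to this window's
    end) stays within seg_s seconds; otherwise it starts a new segment.
    Returns (dvectors, segments), where segments lists window indices.
    """
    segments = []      # each entry is a list of window indices
    current = []
    seg_start = 0.0
    for i in range(len(window_embeddings)):
        start = i * hop_s
        end = start + win_s
        # Close the current segment if adding this window would exceed seg_s.
        if current and end - seg_start > seg_s + 1e-9:
            segments.append(current)
            current = []
        if not current:
            seg_start = start
        current.append(i)
    if current:
        segments.append(current)

    # Element-wise mean of the window embeddings in each segment.
    dvectors = []
    for idx in segments:
        dim = len(window_embeddings[idx[0]])
        mean = [sum(window_embeddings[i][d] for i in idx) / len(idx)
                for d in range(dim)]
        dvectors.append(mean)
    return dvectors, segments
```

With the default values, five consecutive windows are grouped as [[0, 1], [2, 3], [4]]: each segment holds two overlapping windows, and a trailing window forms a shorter segment of its own. Grouping strictly in time order keeps the d-vectors in temporal sequence, which is what a diarization model such as uis-rnn consumes.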

Since this is for your graduation project, I recommend reading https://arxiv.org/pdf/1810.04719.pdf, particularly section 2. The dvector_create script just follows what is described in that section.
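For question 3, one possible way to turn a diarization corpus into uis-rnn training inputs is to label each segment-level d-vector with the speaker whose annotated turn covers the segment's midpoint. This is a sketch under assumptions: the helper build_uisrnn_inputs is hypothetical, and it assumes you can extract (start, end, speaker) turns from your corpus annotations and know each segment's start and end time in seconds (for example, its first window's start and its last window's end).

```python
def build_uisrnn_inputs(dvectors, segment_times, turns):
    """Pair segment-level d-vectors with speaker labels (hypothetical helper).

    dvectors:      list of segment-level d-vectors (lists of floats)
    segment_times: list of (start_s, end_s), one per d-vector
    turns:         list of (start_s, end_s, speaker_id) from the annotation
    Returns (train_sequence, train_cluster_id). A segment is labelled with
    the speaker whose turn contains the segment midpoint; segments whose
    midpoint falls in no turn (e.g. silence) are dropped.
    """
    train_sequence, train_cluster_id = [], []
    for vec, (start, end) in zip(dvectors, segment_times):
        mid = (start + end) / 2
        # First annotated turn containing the midpoint, or None.
        speaker = next((spk for t0, t1, spk in turns if t0 <= mid < t1), None)
        if speaker is None:
            continue
        train_sequence.append(vec)
        train_cluster_id.append(str(speaker))  # cluster ids kept as strings
    return train_sequence, train_cluster_id
```

The midpoint rule is a simple choice: a segment that straddles a speaker change gets the label of whichever turn contains its centre. The uis-rnn README describes the training sequence as a 2-D float array (observations by dimensions) and the cluster ids as strings, so the two lists could be wrapped with numpy.asarray before training; check the README for the exact expected types.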