Embedder-net() - Githubissues

HarryVolek / PyTorch_Speaker_Verification

PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al.

BSD 3-Clause "New" or "Revised" License

Hi, I have some questions. This is my graduation project, so this is extremely important to me.

- The output of the embedder-net() function is [N, 256]. What exactly is N? Is it the number of sliding windows (240 ms)?
- Can this output (the embedder-net() function output) be used for speaker diarization, i.e. can clustering algorithms be applied to these sequences?
- How did you build train-sequence and train-cluster-id (the input of UIS-RNN)? My dataset is different from the TIMIT corpus, which is a speaker recognition dataset, not a speaker diarization dataset.

This is a link to the corpus I am using: https://github.com/EMRAI/emrai-synthetic-diarization-corpus

Thank you in advance for your help.
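To make the second question concrete, here is a minimal sketch of what clustering an [N, 256] output could look like. Everything in it is an assumption for illustration: the embeddings are synthetic stand-ins rather than real embedder-net() output, the function names (`cluster_embeddings`, `make_toy_embeddings`) are hypothetical, the number of speakers is assumed to be known, and a simple spherical k-means stands in for whichever clustering algorithm would actually be used. It is not the repository's code and it is not UIS-RNN.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Dot product of two vectors (cosine similarity when both are unit length)."""
    return sum(x * y for x, y in zip(a, b))


def cluster_embeddings(embeddings, num_speakers, iterations=20):
    """Hypothetical helper: spherical k-means over N embedding rows.

    Returns one integer label per row, i.e. one speaker guess per window.
    """
    points = [l2_normalize(e) for e in embeddings]

    # Deterministic farthest-point initialisation: start from the first row,
    # then repeatedly add the row least similar to the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < num_speakers:
        farthest = min(points, key=lambda p: max(cosine(p, c) for c in centroids))
        centroids.append(farthest)

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each row goes to its most similar centroid.
        labels = [
            max(range(num_speakers), key=lambda k: cosine(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the normalised mean of its rows.
        for k in range(num_speakers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[k] = l2_normalize(mean)
    return labels


def make_toy_embeddings(seed=0, per_speaker=5, dim=256):
    """Synthetic stand-in for an [N, 256] output: two 'speakers' plus small noise."""
    rng = random.Random(seed)
    speakers = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(2)]
    rows = []
    for base in speakers:
        for _ in range(per_speaker):
            rows.append([x + rng.gauss(0, 0.1) for x in base])
    return rows


toy = make_toy_embeddings()           # N = 10 rows of 256 values
labels = cluster_embeddings(toy, num_speakers=2)
print(labels)                         # one cluster id per row
```

With real data, each row would correspond to one of the N embeddings, and consecutive rows sharing a label would form a speaker turn. Whether this works well on actual d-vectors is exactly what the question is asking.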

Embedder-net() #40