HarryVolek / PyTorch_Speaker_Verification

PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Li Wan et al.
BSD 3-Clause "New" or "Revised" License

Question about the pipeline. #6

Closed · staplesinLA closed this issue 5 years ago

staplesinLA commented 5 years ago

Thanks for sharing your great work. As a newcomer to speaker verification, I have two questions. First, what is the purpose of perm and unperm in the training script? Second, since the original data matrix is shuffled by the perm index, how is the loss calculated between different speakers?

Thanks a lot!

staplesinLA commented 5 years ago

Oh, it seems that the paired perm and unperm operations reconstruct the original utterance order. So the remaining question is: why should we randomize the utterance samples, given that those utterances are still in the same batch?
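
For reference, a minimal sketch of that inverse-permutation idea (the names perm and unperm follow the thread; the tensor and its size are made up for illustration):

```python
import torch

# perm shuffles the flattened utterances; unperm is its inverse,
# so x[perm][unperm] recovers the original ordering.
NM = 6                              # e.g. N speakers * M utterances
perm = torch.randperm(NM)
unperm = torch.empty_like(perm)
unperm[perm] = torch.arange(NM)     # unperm[perm[i]] = i inverts perm

x = torch.arange(NM)
assert torch.equal(x[perm][unperm], x)  # order is fully restored
```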

HarryVolek commented 5 years ago

I believe that if the structure of the batch input contains information about which speaker is associated with which utterance, the algorithm will fit itself to the positioning of the batch input instead of the contents of the recordings. So I permute the utterances before the forward pass and unpermute the resulting embeddings before retrieving the loss function output.
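
For anyone following along, here is a minimal sketch of that pattern. The embedder network and the GE2E loss are stubbed out as placeholders, and the shapes (N, M, frames, n_mels) are illustrative, not the repo's exact configuration:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the repo's embedder network and GE2E loss.
N, M, frames, n_mels = 4, 5, 180, 40
embedder_net = nn.Sequential(nn.Flatten(), nn.Linear(frames * n_mels, 256))

def ge2e_loss(embeddings):
    # Placeholder: the real loss compares utterances grouped by speaker.
    return embeddings.pow(2).mean()

batch = torch.randn(N * M, frames, n_mels)   # utterances grouped by speaker

# Shuffle so batch position carries no speaker information.
perm = torch.randperm(N * M)
unperm = torch.empty_like(perm)
unperm[perm] = torch.arange(N * M)

embeddings = embedder_net(batch[perm])       # forward pass on shuffled batch
embeddings = embeddings[unperm]              # restore per-speaker grouping
loss = ge2e_loss(embeddings.view(N, M, -1))  # loss sees the original order
loss.backward()
```

The key point is that the network only ever sees the shuffled batch, while the loss is computed on embeddings restored to their original per-speaker grouping.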