About Data_loader.py, "utterance = utterance[:, :, 160]": why do we use the number 160?

HarryVolek / PyTorch_Speaker_Verification

PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al.

BSD 3-Clause "New" or "Revised" License


About Data_loader.py, "utterance = utterance[:, :, 160]": why do we use the number 160? #37

Closed Bovey0809 closed 5 years ago

FengLeee commented 5 years ago

Read the paper, "Generalized End-to-End Loss for Speaker Verification". It says: "During inference time, for every utterance we apply a sliding window of fixed size (lb + ub)/2 = 160 frames with 50% overlap. We compute the d-vector for each window. The final utterance-wise d-vector is generated by L2 normalizing the window-wise d-vectors, then taking the element-wise average (as shown in Figure 4)." Here lb = 140 and ub = 180 are the lower and upper bounds, in frames, on the partial-utterance length used during training, so 160 is their midpoint.
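To make the quoted procedure concrete, here is a minimal sketch in plain Python. It is not the repository's code: `window_starts`, `l2_normalize`, `utterance_dvector`, and the `embed` callback are hypothetical names chosen for illustration, and `embed` stands in for whatever model maps a 160-frame window to a d-vector. The bounds 140 and 180 are the ones given in the paper.

```python
import math

# Window size is the midpoint of the training length bounds from the paper.
LB, UB = 140, 180
WINDOW = (LB + UB) // 2   # 160 frames
HOP = WINDOW // 2         # 50% overlap, so the window advances 80 frames


def window_starts(num_frames, window=WINDOW, hop=HOP):
    """Return the start index of every full window that fits in the utterance."""
    return list(range(0, num_frames - window + 1, hop))


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def utterance_dvector(frames, embed, window=WINDOW, hop=HOP):
    """Average the L2-normalized window-wise d-vectors of one utterance.

    frames: list of per-frame feature vectors (time-major).
    embed:  hypothetical callback mapping a list of `window` frames to a d-vector.
    """
    starts = window_starts(len(frames), window, hop)
    if not starts:
        raise ValueError("utterance is shorter than one window")
    dvecs = [l2_normalize(embed(frames[s:s + window])) for s in starts]
    dim = len(dvecs[0])
    # Element-wise average over all windows.
    return [sum(v[i] for v in dvecs) / len(dvecs) for i in range(dim)]


def mean_embed(window_frames):
    """Toy stand-in for a real model: the per-dimension mean over time."""
    dim = len(window_frames[0])
    return [sum(f[i] for f in window_frames) / len(window_frames) for i in range(dim)]
```

For a 400-frame utterance, `window_starts(400)` gives windows starting at frames 0, 80, 160, and 240. Any frames left after the last full window are ignored in this sketch; whether to pad or drop them is a design choice the quoted passage does not specify.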