HarryVolek / PyTorch_Speaker_Verification

PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al.
BSD 3-Clause "New" or "Revised" License

Sliding window implementation for training #12

Closed: Turan111 closed this issue 5 years ago.

Turan111 commented 5 years ago

I haven't found a sliding-window implementation for training as described in the Google paper (Fig. 3). Only the first and last 180 frames are taken from each segment. Why didn't you implement the sliding window? Could you explain the reason?
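A minimal sketch of the behavior described in the question: from each segment, take only the first and the last 180 frames. This is not the repository's code. The function name `first_and_last_frames`, the `(n_mels, frames)` array layout, and the choice to skip segments shorter than 180 frames are assumptions made for illustration.

```python
import numpy as np

N_FRAMES = 180  # frame count stated in the question; the constant name is hypothetical


def first_and_last_frames(spec, n_frames=N_FRAMES):
    """Return the first and last `n_frames` frames of a (n_mels, frames) segment.

    Assumption: segments shorter than `n_frames` are skipped (empty list returned).
    """
    if spec.shape[1] < n_frames:
        return []
    # Two fixed crops per segment: one from the start, one from the end.
    return [spec[:, :n_frames], spec[:, -n_frames:]]
```

For a segment longer than 360 frames the two crops do not overlap and the middle frames are never used. For shorter segments the crops overlap. Either way, every training example has exactly 180 frames.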

HarryVolek commented 5 years ago

A sliding window is not used during training in the paper I implemented.
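For contrast, our reading of the paper is that the sliding window belongs to inference. Each utterance is split into fixed-size windows of 160 frames (the midpoint of the 140 to 180 training range) with 50% overlap. The embedding model produces one d-vector per window. Each window d-vector is L2-normalized, and the utterance d-vector is their element-wise average. The sketch below treats that description as an assumption. The embedding network is left out, and the function names and the handling of utterances shorter than one window are hypothetical.

```python
import numpy as np


def sliding_windows(frames, win=160, hop=80):
    """Split a (T, n_mels) feature matrix into windows of `win` frames, stepping by `hop`.

    hop = win // 2 gives 50% overlap. Assumption: an utterance shorter than one
    window is returned whole as a single window.
    """
    T = frames.shape[0]
    if T < win:
        return frames[np.newaxis]
    starts = range(0, T - win + 1, hop)
    return np.stack([frames[s:s + win] for s in starts])


def utterance_dvector(window_embeddings):
    """L2-normalize each window embedding (rows), then take the element-wise mean."""
    norms = np.linalg.norm(window_embeddings, axis=1, keepdims=True)
    normalized = window_embeddings / np.maximum(norms, 1e-12)  # guard against zero vectors
    return normalized.mean(axis=0)
```

For a 400-frame utterance the windows start at frames 0, 80, 160, and 240, which gives four overlapping windows of 160 frames each.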

Turan111 commented 5 years ago

I'm sorry if we are talking about different papers. But Fig. 3 of the paper "Generalized End-to-End Loss for Speaker Verification" shows the batch construction for training, and the second batch there is constructed by sliding.
[Screenshot: Fig. 3 of the paper, training batch construction]

HarryVolek commented 5 years ago

The image does not refer to the inference sliding window. It shows that, for each training batch, a random number of frames between 140 and 180 is picked. And yes, I did not implement this, but I may do it in the future.
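A minimal sketch of the batch construction described in the reply: for each batch, draw one length `t` uniformly from [140, 180] frames and crop every utterance in that batch to exactly `t` frames, so the batch can be stacked into a single array. This is not the repository's code. The function name `make_batch` is hypothetical. Taking each crop at a random offset is also an assumption, as is the input format: N speakers × M utterances, each a `(T, n_mels)` array with `T >= ub`, already sampled by the caller.

```python
import numpy as np


def make_batch(utterances, lb=140, ub=180, rng=None):
    """Crop every utterance in one batch to a shared random length t in [lb, ub].

    utterances: list of N speakers, each a list of M arrays of shape (T_i, n_mels),
    with T_i >= ub (assumption). Returns (batch, t), where batch has shape
    (N * M, t, n_mels).
    """
    rng = np.random.default_rng() if rng is None else rng
    # One length per batch; integers() excludes its upper bound, hence ub + 1.
    t = int(rng.integers(lb, ub + 1))
    crops = []
    for speaker_utts in utterances:
        for utt in speaker_utts:
            # Random crop offset within the utterance (an assumption, not from the paper).
            start = int(rng.integers(0, utt.shape[0] - t + 1))
            crops.append(utt[start:start + t])
    return np.stack(crops), t


rng = np.random.default_rng(0)
utts = [[np.zeros((200, 40)) for _ in range(3)] for _ in range(4)]  # 4 speakers x 3 utterances
batch, t = make_batch(utts, rng=rng)
print(batch.shape)  # (12, t, 40) for the sampled t
```

Because `t` varies from batch to batch but is fixed within a batch, the model sees utterances of different durations during training while each batch stays rectangular. The repository's fixed 180-frame crops are the special case `lb == ub == 180`.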