HarryVolek / PyTorch_Speaker_Verification

PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al.
BSD 3-Clause "New" or "Revised" License

Different input size between training d-vector model and creating d-vector for uis-rnn #10

Closed (wuqiangch closed this issue 5 years ago)

wuqiangch commented 5 years ago

When training the d-vector model, the input size is batchsize x 180 x 40, but when creating d-vector features for uis-rnn, the input size of the d-vector model is batchsize x 24 x 40. Does it matter?

HarryVolek commented 5 years ago

It does not matter in terms of the code working, but it does matter for performance of the https://github.com/google/uis-rnn clustering algorithm.

The solution mentioned in https://arxiv.org/abs/1810.04719 is to train the model with a variable number of input frames, picked from a uniform distribution between 24 and 160 and varied batch by batch. I have yet to implement and test this.
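For anyone who wants to experiment with this in the meantime, here is a rough sketch of the idea, not code from this repository. It draws one segment length per batch from a uniform distribution over 24 to 160 frames and crops every utterance in the batch to that length. The function names (`sample_segment_length`, `crop_batch`), the list-of-frames data layout, and the random start offset are all assumptions made for illustration; a real version would most likely live in the dataset or collate step and operate on tensors.

```python
import random


def sample_segment_length(min_frames=24, max_frames=160, rng=random):
    """Draw one segment length for a whole batch (hypothetical helper).

    Uniform over the integers min_frames..max_frames, both ends included.
    """
    return rng.randint(min_frames, max_frames)


def crop_batch(utterances, min_frames=24, max_frames=160, rng=random):
    """Crop every utterance in a batch to the same randomly drawn length.

    `utterances` is a list of utterances, each a list of frames, where a
    frame is a list of features (e.g. 40 log-mel values). This layout is
    an assumption for the sketch, not the format used by data_load.py.
    """
    n_frames = sample_segment_length(min_frames, max_frames, rng)
    cropped = []
    for utt in utterances:
        if len(utt) < n_frames:
            raise ValueError("utterance shorter than the sampled segment length")
        # Assumption: pick the crop position at random within the utterance.
        start = rng.randint(0, len(utt) - n_frames)
        cropped.append(utt[start:start + n_frames])
    return cropped


if __name__ == "__main__":
    rng = random.Random(0)
    # Four dummy utterances of 180 frames x 40 features, as in training.
    batch = [[[0.0] * 40 for _ in range(180)] for _ in range(4)]
    out = crop_batch(batch, rng=rng)
    # All utterances share one length, which changes from batch to batch.
    print(len(out), len(out[0]), len(out[0][0]))
```

Because the length is drawn once per batch, every utterance in a batch still has the same shape and can be stacked into a single batchsize x T x 40 tensor, with T changing between batches.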

With that being said I was able to obtain OK results using https://github.com/google/uis-rnn with the code as-is.

wuqiangch commented 5 years ago

@HarryVolek Thanks! Have you released the code to train the d-vector model with a variable number of input frames picked between 24 and 160 from a uniform distribution (varying batch by batch)? I can't find the code in data_load.py.

HarryVolek commented 5 years ago

Not at this moment.