HarryVolek / PyTorch_Speaker_Verification

PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al.
BSD 3-Clause "New" or "Revised" License

How to Use Your Audio Data Set #32

Open zyc1310517843 opened 5 years ago

zyc1310517843 commented 5 years ago

Hello, this code is written for the TIMIT dataset; when I switch it to my own audio data, it no longer works. How should I go about using my own audio data as the dataset? Thank you very much for your guidance.

Kailegh commented 5 years ago

Did you find out? I am having the same issue. I have a dataset where the number of utterances per speaker varies, so I think I need to change how the data is loaded.
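
One way to handle a variable number of utterances per speaker is to sample a fixed number of them in the dataset's `__getitem__`, drawing with replacement when a speaker has too few. Here is a minimal PyTorch sketch under the assumption that each speaker's utterances were preprocessed into a single per-speaker `.npy` array of mel-spectrogram frames (roughly what the repo's `data_preprocess.py` produces for TIMIT); the directory layout and the `utterances_per_speaker` parameter are illustrative, not part of the repo:

```python
import os
import random

import numpy as np
import torch
from torch.utils.data import Dataset


class SpeakerDataset(Dataset):
    """Returns a fixed number of utterances per speaker, even when
    speakers have different utterance counts."""

    def __init__(self, data_dir, utterances_per_speaker=6):
        # Assumes one .npy file per speaker, shaped
        # (n_utterances, n_frames, n_mels).
        self.files = [os.path.join(data_dir, f)
                      for f in os.listdir(data_dir) if f.endswith('.npy')]
        self.m = utterances_per_speaker

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        utters = np.load(self.files[idx])  # (n_utterances, n_frames, n_mels)
        n = utters.shape[0]
        if n < self.m:
            # Too few utterances: sample with replacement.
            picks = random.choices(range(n), k=self.m)
        else:
            # Enough utterances: sample without replacement.
            picks = random.sample(range(n), self.m)
        return torch.tensor(utters[picks], dtype=torch.float)
```

The GE2E loss expects the same number of utterances per speaker in each batch, which is why sampling to a fixed count is simpler than padding.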

chrisspen commented 4 years ago

Is this necessary? If you want to generate embeddings for your own audio data to feed into something like uis-rnn, do you also need to train your own speaker embedding model, or can you just use the model trained on TIMIT data?

I thought the point of training an embedding model was to reduce the dimensionality of the audio. Raw sound contains a huge amount of information, but the voice characteristics we're interested in are only a relatively small part of it. Training an embedding model is a way of compressing the audio into a form that's much smaller yet still captures the unique signature of the voice. To that end, we shouldn't have to train our own embedding model on our own data, right, assuming our audio is in English?

Where you do need to use your own data is when training uis-rnn itself.
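
If you only need d-vectors from the pretrained model, inference might look roughly like the sketch below. It assumes the repo's `SpeechEmbedder` class from `speech_embedder_net.py`; the checkpoint path, the placeholder input, and its shape are illustrative assumptions and should be replaced with your actual weights and log-mel features, extracted with the same parameters the model was trained with (see `hparam.py`):

```python
import torch

from speech_embedder_net import SpeechEmbedder  # model class from this repo

# Hypothetical checkpoint path; substitute the pretrained weights you have.
CKPT = 'speech_id_checkpoint/final.model'

embedder = SpeechEmbedder()
embedder.load_state_dict(torch.load(CKPT, map_location='cpu'))
embedder.eval()

# Placeholder for one utterance's log-mel features, shaped
# (n_frames, n_mels); replace with real features matching hparam.py.
mel_frames = torch.randn(160, 40)

with torch.no_grad():
    # Add a batch dimension: (batch, n_frames, n_mels) -> (1, embedding_dim).
    d_vector = embedder(mel_frames.unsqueeze(0))
print(d_vector.shape)
```

Whether the TIMIT-trained model transfers well depends on how closely your audio matches the training conditions (sample rate, noise, recording setup); a domain mismatch can degrade the embeddings even for English speech.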