HarryVolek / PyTorch_Speaker_Verification

PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan et al.
BSD 3-Clause "New" or "Revised" License

Question about inference #18

Closed by tonytonyissaissa 5 years ago

tonytonyissaissa commented 5 years ago

Hi @HarryVolek,

I trained the model correctly, and now I have some .wav files as inputs. How can I use the trained model for inference? Also, can inference be used for speaker identification (verifying whether an utterance belongs to one of a set of N speakers), or is it only valid for speaker verification (verifying whether an utterance belongs to the claimed speaker)? Thanks in advance, Tony

HarryVolek commented 5 years ago

Hi @tonytonyissaissa .

The inference can be used for speaker identification. To do so, take a waveform of the speaker you want to identify, run it through the model, and store the embedding ("enrollment"). Compare future waveforms to your collection of embeddings. Take a look at the test function in train_speech_embedder.py for an example of how to compute similarity between embeddings.
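The enrollment-and-compare workflow described above can be sketched as follows. This is a minimal illustration, not code from the repository: the stored embeddings are random stand-ins for d-vectors that would in practice come from running each speaker's waveforms through the trained `SpeechEmbedder`, and `identify` and the 0.5 threshold are hypothetical names and values.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical enrollment set: speaker name -> stored d-vector.
# In practice, each vector would be the (averaged, normalized) output of
# the trained SpeechEmbedder over that speaker's enrollment utterances.
enrollment = {
    "alice": F.normalize(torch.randn(256), dim=0),
    "bob": F.normalize(torch.randn(256), dim=0),
}

def identify(test_embedding, enrolled, threshold=0.5):
    """Return the enrolled speaker with the highest cosine similarity
    to the test embedding, or None if no score clears the threshold."""
    test_embedding = F.normalize(test_embedding, dim=0)
    scores = {
        name: F.cosine_similarity(test_embedding, emb, dim=0).item()
        for name, emb in enrolled.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] > threshold:
        return best, scores[best]
    return None, scores[best]

# A probe embedding close to "alice" (her vector plus small noise)
# should be identified as alice.
probe = enrollment["alice"] + 0.01 * torch.randn(256)
name, score = identify(probe, enrollment)
```

Future waveforms are handled the same way: embed them, compare against every stored enrollment embedding, and report the best match above the threshold.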

tonytonyissaissa commented 5 years ago

Thank you @HarryVolek. What value of the cosine similarity threshold should I use for inference? (That is, if cossim(embedding_wav1, embedding_wav2) > threshold, then wav1 and wav2 belong to the same speaker.) Is it EER_thresh from the testing phase?

HarryVolek commented 5 years ago

Pick the threshold which performs best on your data. EER_thresh from the testing phase should be a good indicator of this threshold.

Aurora11111 commented 5 years ago

@tonytonyissaissa have you reproduced this project successfully?