astorfi / 3D-convolutional-speaker-recognition

:speaker: Deep Learning & 3D Convolutional Neural Networks for Speaker Verification
Apache License 2.0

explain how to take a single wav file and extract features #13

Closed alanbekker closed 6 years ago

alanbekker commented 6 years ago

I've read your paper and it's really impressive.

I would like to ask you about the input preprocessing:

Assume I've got a wav file of about 0.8 sec: `fs, signal = wav.read(file_name)`. Then I use `mfec = speechpy.feature.mfe(signal, fs)`; the size of `mfec` is [79, 40], so I changed the input file to be 0.81 sec and then I received [80, 40]...

According to your paper I need [20, 80, 40] to create one training example, so I can create this either by duplicating my original [80, 40] 20 times (this is how you did it at the testing phase) or by concatenating 20 different utterances of 0.81 sec each. Is that correct?

Any clarifications would be appreciated!

Alan

astorfi commented 6 years ago

@alanbekker Thank you so much for your kind words. I believe your understanding is quite accurate. I just said 0.8 seconds in the paper, but as you noted it's really closer to 0.81 seconds. That's exactly how I do it: concatenating different utterances for training.
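The two options discussed above (stack 20 different utterances for training, or duplicate one utterance 20 times for testing) can be sketched with NumPy. This is only an illustration of the shape handling: the `scipy.io.wavfile` and `speechpy.feature.mfe` calls are shown as comments, and `fake_mfe_features` is a hypothetical stand-in for the real feature extraction.

```python
import numpy as np

# In the real pipeline, each utterance's features would come from something like:
#   fs, signal = scipy.io.wavfile.read(file_name)
#   mfec = speechpy.feature.mfe(signal, fs)   # -> roughly [80, 40] for ~0.81 sec
# Here we simulate one utterance's [80, 40] feature matrix with random values.
def fake_mfe_features(num_frames=80, num_filters=40):
    return np.random.randn(num_frames, num_filters).astype(np.float32)

# Option 1 (training): stack 20 *different* utterances into one [20, 80, 40] cube.
utterances = [fake_mfe_features() for _ in range(20)]
train_example = np.stack(utterances, axis=0)
print(train_example.shape)  # (20, 80, 40)

# Option 2 (testing): duplicate a *single* utterance 20 times.
single = fake_mfe_features()
test_example = np.tile(single[np.newaxis, :, :], (20, 1, 1))
print(test_example.shape)  # (20, 80, 40)
```

Either way the network sees one [20, 80, 40] input; the difference is only whether the 20 slices come from distinct utterances (training) or copies of one (testing).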