chiachunfu / speech

TensorFlow on mobile with speech-to-text DL models.
163 stars 63 forks

Can't Find the Training File #1

Open manashmandal opened 6 years ago

manashmandal commented 6 years ago

How did you train your model? Could you please provide the training file as well?

In the Data_Process.ipynb file, when calculating m, v, and s, you used these mfcc parameters:

audio = mfcc(read_audio_from_filename(file, 16000), samplerate=16000,
             winlen=0.025, winstep=0.01, numcep=39, nfilt=40)
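The per-coefficient statistics m, v, and s appear to be accumulated over the MFCC frames of the whole training set. A minimal NumPy sketch under that assumption (the random arrays below are stand-ins for real mfcc() output, one matrix of shape (frames, 39) per file):

```python
import numpy as np

# Stand-ins for mfcc(...) output per training file: (num_frames, 39).
all_feats = [np.random.randn(n, 39) for n in (80, 120, 100)]

stacked = np.vstack(all_feats)   # (total_frames, 39)
m = stacked.mean(axis=0)         # per-coefficient mean, shape (39,)
v = stacked.var(axis=0)          # per-coefficient variance, shape (39,)
s = stacked.std(axis=0)          # per-coefficient std deviation, shape (39,)
```

Any statistics computed this way are tied to numcep=39, which is why inputs produced with different mfcc parameters fail to normalize later.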

But when I run the other cells on my data,

    inputs = convert_wav_mfcc(wav_path, 16000)
    normalize_inputs = (inputs - m)/s

this throws an exception because the shapes don't match, so I changed the convert_wav_mfcc function to this:

samplerate = 16000
winlen = 0.025
winstep = 0.01
numcep = 39
nfilt = 40

def convert_wav_mfcc(file, fs=16000):
    """Turn raw audio data into MFCC features with sample rate fs."""
    return mfcc(read_audio_from_filename(file, fs), samplerate=fs,
                winlen=winlen, winstep=winstep, numcep=numcep, nfilt=nfilt)

Now everything works fine.
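For what it's worth, the exception above is a NumPy broadcasting failure: python_speech_features' mfcc() defaults to numcep=13, so features produced without explicitly passing numcep=39 cannot be normalized by statistics of shape (39,). A minimal sketch with stand-in arrays in place of real MFCC output:

```python
import numpy as np

# Stand-in MFCC matrices, shape (num_frames, numcep); the real ones come
# from python_speech_features' mfcc(), these only reproduce the shapes.
feats_39 = np.random.randn(100, 39)   # numcep=39, matches m and s
feats_13 = np.random.randn(100, 13)   # the library's default numcep=13

m = feats_39.mean(axis=0)             # shape (39,)
s = feats_39.std(axis=0)              # shape (39,)

normalized = (feats_39 - m) / s       # broadcasts fine: (100, 39)

try:
    bad = (feats_13 - m) / s          # (100, 13) vs (39,): cannot broadcast
except ValueError:
    bad = None                        # the "shape doesn't match" error
```

Keeping a single set of mfcc keyword arguments (as in the modified convert_wav_mfcc above) guarantees the feature shape always matches the precomputed statistics.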

chiachunfu commented 6 years ago

@manashmndl If you are looking for the LSTM model, you can find the training script, lstm_ctc.py, in the repo. For the wavenet model, I'm using a pretrained model that I got from here.

manashmandal commented 6 years ago

@chiachunfu Thanks for your answer, but which MFCC implementation did you use? The librosa one or another library?

chiachunfu commented 6 years ago

@manashmndl I used the mfcc function implemented in the python_speech_features module.