dspavankumar / keras-kaldi

Keras Interface for Kaldi ASR
GNU General Public License v3.0

self.inputFeatDim in steps_kt/dataGenerator.py #4

Open yh1008 opened 7 years ago

yh1008 commented 7 years ago

Hello Mr. Kumar,

I noticed that you set

self.inputFeatDim = 429 ## IMPORTANT: HARDCODED. Change if necessary.

I am wondering how I can check the inputFeatDim of my dataset.

Thank you very much!

dspavankumar commented 7 years ago

Use feat-to-dim from Kaldi. This will usually be 13 dimensions (MFCCs). We also add delta and acceleration coefficients (add-deltas) in the script, which makes it 13*3=39. Then we concatenate 11 frames into one (splice-feats), which makes it 39*11=429.
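The arithmetic above can be written out as a small helper. This is a hypothetical function, not part of keras-kaldi. It assumes add-deltas uses order 2 (statics, deltas and accelerations, so x3) and that the 11 spliced frames come from 5 frames of left context and 5 of right context.

```python
def spliced_feat_dim(base_dim: int, delta_order: int = 2,
                     left: int = 5, right: int = 5) -> int:
    """Width of one network input frame after add-deltas and splice-feats.

    base_dim:    per-frame dimension reported by feat-to-dim (e.g. 13 MFCCs)
    delta_order: add-deltas order; order 2 triples the dimension
    left/right:  splice-feats context (5 + 1 + 5 = 11 frames, assumed)
    """
    with_deltas = base_dim * (delta_order + 1)  # 13 * 3 = 39
    num_frames = left + 1 + right               # 11 frames
    return with_deltas * num_frames             # 39 * 11 = 429


print(spliced_feat_dim(13))  # 429
```

If your pipeline uses a different context or skips deltas, pass those values instead and set inputFeatDim to the result.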

Miail commented 7 years ago

So is it expected that all audio files have the same number of frames, or is it possible to make it extract only a certain number of frames?

dspavankumar commented 7 years ago

No, the dimension of a frame is independent of the number of frames in an utterance. Typically no two audio files will have the same number of frames. However, every frame must have the same dimension in order to train models.
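A minimal NumPy sketch of this distinction, using two made-up utterances: frames from utterances of different lengths can be stacked along the first axis because only the frame width (the feature dimension) has to agree.

```python
import numpy as np

feat_dim = 429  # every frame has the same width

# Two hypothetical utterances with different numbers of frames.
utt_a = np.zeros((312, feat_dim), dtype=np.float32)
utt_b = np.zeros((87, feat_dim), dtype=np.float32)

# Frames are stacked along axis 0, so only axis 1 (the width) must match.
stacked = np.concatenate((utt_a, utt_b))
print(stacked.shape)  # (399, 429)
```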

Miail commented 7 years ago

Hmm... wouldn't that only happen if two audio files had different sample rates?

dspavankumar commented 7 years ago

Typically the sampling rate is the same across the dataset. Even if it differs, we can extract the same number of cepstra (or any other features) per frame from each audio file, so the frame dimension is always the same.
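To illustrate, here is a sketch assuming Kaldi's default framing (25 ms window, 10 ms shift, snip-edges behaviour); the function name and the 13-cepstra setting are assumptions for illustration. The number of frames depends on the duration and the frame shift, while the per-frame dimension is set by the number of cepstra and does not depend on the sampling rate.

```python
def num_frames(num_samples: int, sample_rate: int,
               frame_length_ms: float = 25.0,
               frame_shift_ms: float = 10.0) -> int:
    """Frame count for snip-edges style framing (assumed Kaldi defaults)."""
    window = int(sample_rate * frame_length_ms / 1000)  # samples per frame
    shift = int(sample_rate * frame_shift_ms / 1000)    # samples between frames
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // shift


NUM_CEPS = 13  # per-frame feature dimension, fixed by the feature config

for sr in (8000, 16000):
    for seconds in (1, 2):
        n = num_frames(seconds * sr, sr)
        print(f"{sr} Hz, {seconds} s -> {n} frames x {NUM_CEPS} dims")
```

One second of audio gives 98 frames at both 8 kHz and 16 kHz, and every frame still has 13 dimensions; only longer or shorter audio changes the frame count.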

yh1008 commented 7 years ago

Thanks for the explanations!

Just to verify: in my case, after training delta + delta-delta features with steps/train_deltas.sh, I also applied an LDA+MLLT transform with steps/train_lda_mllt.sh --splice-opts "--left-context=3 --right-context=3", where the LDA output dimension defaults to 40. With this setup, is inputFeatDim then 40*11 = 440?

dspavankumar commented 7 years ago

Yes.
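The 440 can be traced stage by stage. This sketch assumes that train_lda_mllt.sh splices the raw 13-dimensional MFCCs (without deltas) before estimating LDA, and that the data generator then splices 11 frames of the 40-dimensional LDA+MLLT output with no deltas added on top. Under those assumptions the --splice-opts context only affects the LDA input, not the final network input width.

```python
# Stage 1 (assumed): LDA input is 13-dim MFCCs spliced with
# --left-context=3 --right-context=3, i.e. 7 frames.
mfcc_dim = 13
lda_input_dim = mfcc_dim * (3 + 1 + 3)  # 91
lda_output_dim = 40                     # LDA default output dimension

# Stage 2 (assumed): 11 frames (5 left + current + 5 right) of the
# 40-dim LDA+MLLT features, no deltas.
input_feat_dim = lda_output_dim * (5 + 1 + 5)

print(lda_input_dim, input_feat_dim)  # 91 440
```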

swang423 commented 6 years ago

It would be better if the script threw a warning or an exception when the feature dimension does not match the hardcoded in_feat_dim, since users may not be aware of it. I ran TIMIT with mismatched features (41-dimensional fbank), and the default script ran to completion and yielded satisfactory results (23% PER).
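A check along these lines could be run on each feature chunk before it is buffered. This is a hypothetical helper, not part of keras-kaldi; where exactly to call it in dataGenerator.py is left as an assumption.

```python
import numpy as np


def check_feat_dim(x: np.ndarray, expected_dim: int) -> None:
    """Raise early if a feature matrix does not have the configured width."""
    if x.ndim != 2 or x.shape[1] != expected_dim:
        raise ValueError(
            f"feature dimension mismatch: expected (num_frames, {expected_dim}), "
            f"got {x.shape}; update inputFeatDim to match your features"
        )


INPUT_FEAT_DIM = 429
fbank = np.zeros((100, 41), dtype=np.float32)  # e.g. a 41-column feature matrix
try:
    check_feat_dim(fbank, INPUT_FEAT_DIM)
except ValueError as err:
    print(err)
```

Failing with an explicit message at load time makes the mismatch visible instead of relying on a later NumPy error.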

dspavankumar commented 6 years ago

I've always gotten an error when the dimensions mismatched, specifically when the received data is appended to the empty array of dimension self.inputFeatDim:

self.x = numpy.concatenate ((self.x[self.batchPointer:], x))

I don't think a dimension mismatch would allow it to progress past this point. Can you recheck your experiment?
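A minimal reproduction of that failure, assuming the buffer starts as an empty array with shape (0, inputFeatDim) and an incoming chunk has a different width (451 here is an arbitrary example value):

```python
import numpy as np

input_feat_dim = 429
# Buffer initialised empty but with the configured width.
x_buf = np.empty((0, input_feat_dim), dtype=np.float32)
batch_pointer = 0

# Hypothetical incoming chunk whose width does not match inputFeatDim.
x = np.zeros((50, 451), dtype=np.float32)

try:
    x_buf = np.concatenate((x_buf[batch_pointer:], x))
except ValueError as err:
    # Even an empty buffer must match on every axis except axis 0.
    print("ValueError:", err)
```

Printing x.shape just before this line is a quick way to see the width your features actually have.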

laleye commented 5 years ago

Hi Mr. Kumar,

I get the same error even though I set self.inputFeatDim to the correct value, 440.

self.x = numpy.concatenate ((self.x[self.batchPointer:], x))
ValueError: all the input array dimensions except for the concatenation axis must match exactly

Do you have any idea how I can fix this error?