HawkAaron / RNN-Transducer

MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
136 stars 31 forks

Dataset and run.sh #8

Closed (yannhan closed this issue 5 years ago)

yannhan commented 5 years ago

Excuse me, could you show how to use the Kaldi-timit scripts in your source code?

HawkAaron commented 5 years ago

Change the TIMIT path to your own location, then run run.sh. You will get the 41-dim log-mel filter-bank features with energy.

feature_transform.sh calculates the global CMVN stats, using a context window of 3 on the 41-dim features with delta and delta-delta.
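The splice-then-normalize step described above can be sketched in plain NumPy. This is only an illustration, not the Kaldi computation or the contents of feature_transform.sh: it assumes the "3 window context" means one neighbouring frame on each side, that the per-frame input is 123-dim (41 features plus delta and delta-delta), and the helper names `splice`, `global_cmvn_stats`, and `apply_cmvn` are hypothetical.

```python
import numpy as np

def splice(feats, context=1):
    """Stack each frame with `context` neighbours on each side (edge frames are repeated)."""
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.concatenate(
        [padded[i:i + num_frames] for i in range(2 * context + 1)], axis=1
    )

def global_cmvn_stats(utterances):
    """Mean and standard deviation per dimension over all frames of all utterances."""
    stacked = np.concatenate(utterances, axis=0)
    return stacked.mean(axis=0), stacked.std(axis=0)

def apply_cmvn(feats, mean, std, eps=1e-8):
    """Normalize features with the global statistics."""
    return (feats - mean) / (std + eps)

# Random stand-ins for two utterances of 123-dim features (assumed: 41 x 3).
rng = np.random.default_rng(0)
utts = [rng.normal(size=(50, 123)), rng.normal(size=(80, 123))]

spliced = [splice(u, context=1) for u in utts]   # each frame becomes 369-dim
mean, std = global_cmvn_stats(spliced)
normed = [apply_cmvn(s, mean, std) for s in spliced]
print(spliced[0].shape, mean.shape)
```

The statistics are computed once over the whole training set and then applied to every utterance, which is what makes them "global" rather than per-utterance.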

yannhan commented 5 years ago

So I need to install Kaldi first?

HawkAaron commented 5 years ago

Yeah, Kaldi is necessary to reproduce the results exactly. But it is only used for feature extraction, so librosa may be easier to use.

yannhan commented 5 years ago

Thanks, HawkAaron. I have extracted the features, but during training I came across this bug: [Screenshot from 2019-04-15 01-07-49]. I used a different dataset, so maybe that is the problem, but I can't use TIMIT since it is not free. I don't know why the dimension is not correct. The batch size is 1, the input feature dimension of one sample is 199x90, and the label dimension is 21x1. Both are of type np.array.

HawkAaron commented 5 years ago

The batch dimension for the input and the label should be the same. For your example, the input should be shaped [1, 199, 90] and the label should be shaped [1, 21]. Both should be MXNet NDArrays.
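A minimal sketch of that reshaping, using the shapes from this thread. The arrays are zero-filled stand-ins and the helper name `add_batch_dim` is hypothetical; the final conversion to MXNet is left as a comment so the snippet runs with NumPy alone.

```python
import numpy as np

# Stand-ins for one sample with the shapes reported above.
feat = np.zeros((199, 90), dtype=np.float32)   # (T, D)
label = np.zeros((21, 1), dtype=np.int32)      # (U, 1)

def add_batch_dim(feat, label):
    """Give input and label a matching leading batch dimension of 1."""
    x = feat[np.newaxis, :, :]   # (1, T, D)
    y = label.reshape(1, -1)     # (1, U)
    return x, y

x, y = add_batch_dim(feat, label)
print(x.shape, y.shape)  # (1, 199, 90) (1, 21)

# With MXNet installed, the arrays can then be converted, e.g.:
#   import mxnet as mx
#   x_nd, y_nd = mx.nd.array(x), mx.nd.array(y)
```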

yannhan commented 5 years ago

I modified the type, but the bug still exists.

[Screenshot]

HawkAaron commented 5 years ago

Could you show me the inputs' shapes before the loss function call?

yannhan commented 5 years ago

Okay, the first figure is the corresponding code, and the second is the output. Is that what you want?

[Two screenshots: code and output]

yannhan commented 5 years ago

BTW, I also found another thing. In the code, f and g should have the same size except for the last dim, right? But I found that f and g have the same size, including the last dim.

[Two screenshots]

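For reference, here is a small NumPy sketch of how f and g are usually combined. It assumes the additive formulation from Graves (2012), where both networks emit a vector over the same output vocabulary, so they share the last dimension and differ along the time and label axes. This is not taken from the repository's code, and the vocabulary size `V` is a made-up value.

```python
import numpy as np

B, T, U, V = 1, 199, 21, 4   # V (vocabulary size incl. blank) is hypothetical

rng = np.random.default_rng(0)
f = rng.normal(size=(B, T, V))       # transcription network output, one vector per frame
g = rng.normal(size=(B, U + 1, V))   # prediction network output, one vector per label position

# Broadcast-add so every (t, u) pair gets its own vector over the vocabulary.
joint = f[:, :, np.newaxis, :] + g[:, np.newaxis, :, :]   # (B, T, U+1, V)

# Log-softmax over the vocabulary axis gives the lattice of log-probabilities.
shifted = joint - joint.max(axis=-1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

print(joint.shape)  # (1, 199, 22, 4)
```

Under this assumption, f and g having identical shapes in every dimension would only be expected when T happens to equal U + 1.
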
HawkAaron commented 5 years ago

I have updated the code; you can try it now.

yannhan commented 5 years ago

Thanks very much!

AtefehSamadi commented 1 year ago

Hi, I'm very new to this field. Could you please tell me how I can download the TIMIT dataset? I tried to download it from https://catalog.ldc.upenn.edu/LDC93S1, but unfortunately I can't find the download link. Thanks in advance, Atefeh