HawkAaron / E2E-ASR

PyTorch Implementations for End-to-End Automatic Speech Recognition

The RNN-T model does not converge on the TIMIT dataset #8

Closed gccyxy closed 5 years ago

gccyxy commented 5 years ago

I trained this model on the TIMIT dataset, but the RNN-T model does not converge; the loss stays around 500+. What could be causing this?

gccyxy commented 5 years ago

I think the problem may be with my data. Can you share the data you used with your model? My email address is 18710899635@163.com. Thanks a lot!

HawkAaron commented 5 years ago

@gccyxy I used run.sh to extract 40-dim fbank features, then feature_transform.sh to get 123-dim features. Do you have any problems with these scripts?
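For reference, the jump from 40 to 123 dimensions is consistent with appending an energy coefficient and then delta and delta-delta features: (40 + 1) × 3 = 123. Below is a minimal NumPy sketch of that delta expansion; this is an assumption about what feature_transform.sh does, and the actual script (and Kaldi's add-deltas) may differ in details:

```python
import numpy as np

def add_deltas(feats, order=2, window=2):
    """Append delta and delta-delta features to (time, dim) static features,
    using simple regression deltas (similar in spirit to Kaldi's add-deltas)."""
    out = [feats]
    cur = feats
    denom = 2 * sum(i * i for i in range(1, window + 1))
    for _ in range(order):
        padded = np.pad(cur, ((window, window), (0, 0)), mode="edge")
        delta = sum(
            i * (padded[window + i:len(cur) + window + i]
                 - padded[window - i:len(cur) + window - i])
            for i in range(1, window + 1)
        ) / denom
        out.append(delta)
        cur = delta
    return np.concatenate(out, axis=1)

# 41-dim static features (e.g. 40 fbank + 1 energy) -> 123-dim after deltas
static = np.random.randn(100, 41).astype(np.float32)
print(add_deltas(static).shape)  # (100, 123)
```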

gccyxy commented 5 years ago

> @gccyxy I used run.sh to extract 40-dim fbank feature, then feature_transform.sh to get 123-dim feature. Do you have any problem with these scripts?

I have run into a problem when running run.sh:

    [root@node02 s5]# ./run.sh
    ============================================================================
                Data & Lexicon & Language Preparation
    ============================================================================
    local/timit_data_prep.sh: line 108: wav-to-duration: command not found

How can I solve it? Is there a problem with my path settings?

HawkAaron commented 5 years ago

@gccyxy No problem. That command is built from kaldi/src/featbin/wav-to-duration.cc, so it is missing because featbin was not built. Please compile Kaldi from source if possible.

gccyxy commented 5 years ago

And I have a question: if I skip feature_transform.sh and just feed your code the 39-dim data, will it work?

HawkAaron commented 5 years ago

You may need to change the network input dim.
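As an illustration of what "change the network input dim" means, here is a hypothetical PyTorch sketch; the class and parameter names (`Encoder`, `input_dim`) are illustrative, not the repo's actual API:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Toy RNN-T-style encoder: the first layer must match the feature dim."""
    def __init__(self, input_dim=123, hidden=320, layers=3):
        super().__init__()
        self.rnn = nn.LSTM(input_dim, hidden, layers,
                           batch_first=True, bidirectional=True)

    def forward(self, x):
        out, _ = self.rnn(x)
        return out

# For 39-dim features, construct the encoder with input_dim=39 instead of 123.
enc = Encoder(input_dim=39)
feats = torch.randn(2, 50, 39)   # (batch, time, feat)
print(enc(feats).shape)          # torch.Size([2, 50, 640])
```

Feeding 39-dim features into a network built for 123-dim input fails with a shape mismatch, which is why the constructor argument has to change.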

gccyxy commented 5 years ago

> You may need to change the network input dim.

I have run into the problem described in the issue "feature_transform makes 69 dimension", and I have listed the problems there. Can you tell me how to handle it?

HawkAaron commented 5 years ago

Please use this fbank.conf to extract acoustic features if you use Kaldi.

If not, please follow other recipes to do feature extraction, and then change the input dimension if necessary.
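A typical Kaldi fbank.conf for this kind of setup might look like the fragment below. This is an assumption for illustration; check the fbank.conf shipped with the repository before copying, since the exact options may differ:

```
--num-mel-bins=40
--use-energy=true
--sample-frequency=16000
```

With `--use-energy=true` this yields 41-dim static features, which become 123-dim after deltas and delta-deltas.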

gccyxy commented 5 years ago

> Please use this fbank.conf to extract acoustic feature if you use kaldi.
>
> If not, please follow other recipes to do feature extraction, and then change the input dimension if necessary.

Thank you very much! Unfortunately, when I use the 69-dim data as input, I run into the following problem:

    File "/root/anaconda3/lib/python3.7/site-packages/warprnnt_pytorch-0.1-py3.7-linux-x86_64.egg/warprnnt_pytorch/__init__.py", line 40, in forward
        grads /= minibatch_size
    RuntimeError: CUDA error: an illegal memory access was encountered

Do you have any advice for solving it?

HawkAaron commented 5 years ago

Please make sure PyTorch >= 1.0, and use gcc >= 4.9 to install the python binding.
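A small helper for checking version strings against these thresholds (PyTorch >= 1.0, gcc >= 4.9). This is a sketch; `version_tuple` is a hypothetical name, and you could just as well compare `torch.__version__` and `gcc -dumpversion` by hand:

```python
import re

def version_tuple(v):
    """'4.8.5' -> (4, 8); '11' -> (11, 0); tolerates suffixes like '1.1.0a0'."""
    parts = (v.split(".") + ["0", "0"])[:2]
    return tuple(int(re.sub(r"[^0-9].*", "", p) or 0) for p in parts)

print(version_tuple("4.8.5") >= (4, 9))  # False: gcc 4.8.5 is too old
print(version_tuple("1.1.0") >= (1, 0))  # True: PyTorch 1.1 is fine
```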

gccyxy commented 5 years ago

> Please make sure PyTorch >= 1.0, and use gcc >= 4.9 to install the python binding.

My gcc version is 4.8.5 and my PyTorch is 1.1. Does that matter? Is there anything I can do other than upgrading gcc?

gccyxy commented 5 years ago

> Please make sure PyTorch >= 1.0, and use gcc >= 4.9 to install the python binding.

Is it related to the gcc version?

gccyxy commented 5 years ago

> Please make sure PyTorch >= 1.0, and use gcc >= 4.9 to install the python binding.

If I just train on CPU for about 100 epochs, how long will it take?

HawkAaron commented 5 years ago

Please ask everything you want in a single comment if possible.

And for some reason, only GCC 4.9 or higher is supported for building PyTorch C++ extensions.

gccyxy commented 5 years ago

> Please ask whatever you want in a single comment if possible.
>
> And for some reason, only GCC higher than 4.9 (include) is supported for PyTorch CPP extension.

All right! The code runs on CPU on my Tesla V100 machine, while the GPU fails. If I just train on CPU, how long will 100 epochs take?

HawkAaron commented 5 years ago

Only about one hour for the TIMIT dataset. You can run the model on GPU but calculate the RNN-T loss on CPU.
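A minimal sketch of that split, using a toy model and a standard loss so it runs anywhere; the same device-transfer pattern would apply with `RNNTLoss` from warprnnt_pytorch in place of `cross_entropy`:

```python
import torch
import torch.nn.functional as F

# Keep the network on the GPU (when available) but compute the loss on CPU.
# Autograd records the .cpu() transfer, so backward() still produces
# gradients on the model's own device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(10, 5).to(device)
x = torch.randn(4, 10, device=device)
target = torch.randint(0, 5, (4,))

out = model(x).cpu()                     # move outputs to CPU for the loss
loss = F.cross_entropy(out, target)
loss.backward()                          # gradients land back on `device`

print(model.weight.grad.device == model.weight.device)  # True
```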

gccyxy commented 5 years ago

> Only one hour for TIMIT dataset. You may run the model on GPU, but calculate RNN-T loss on CPU.

Thanks a lot for your time! My training log is as follows:

    [Epoch 1 Batch 50] loss 4615.07
    [Epoch 1 Batch 100] loss 12606.34
    WARNING: Forward backward likelihood mismatch 0.109375
    WARNING: Forward backward likelihood mismatch 0.613281
    [Epoch 1 Batch 150] loss 9103.89
    [Epoch 1 Batch 200] loss 1834.24
    [Epoch 1 Batch 250] loss 1286.81
    [Epoch 1 Batch 300] loss 1492.13
    WARNING: Forward backward likelihood mismatch 0.167969
    [Epoch 1 Batch 350] loss 12697.51
    WARNING: Forward backward likelihood mismatch 0.253906
    WARNING: Forward backward likelihood mismatch 0.105469
    [Epoch 1 Batch 400] loss 16489.23
    [Epoch 1 Batch 450] loss 16629.37

From the loss, it seems the model does not converge. Is there any problem with the training?

HawkAaron commented 5 years ago

Is your __init__.py the same as https://github.com/HawkAaron/warp-transducer/blob/master/pytorch_binding/warprnnt_pytorch/__init__.py?

gccyxy commented 5 years ago

> Is your __init__.py the same as https://github.com/HawkAaron/warp-transducer/blob/master/pytorch_binding/warprnnt_pytorch/__init__.py?

I have checked it. The problem was that I had changed parts of the code myself; when I run your original code, it works! Many thanks for all your help!
