This is an open-source project (formerly named Listen, Attend and Spell - PyTorch Implementation) for end-to-end ASR, implemented with PyTorch, the well-known deep learning toolkit.
I'm using a custom speech dataset and a modified ASR yaml config file.
I changed the audio feat_type to 'mfcc' and feat_dim to '39'.
After making these changes, I got the error below.
Could you help me with this problem?
Traceback (most recent call last):
  File "main.py", line 79, in <module>
    solver.exec()
  File "/home/ubuntu/End-to-end-ASR-Pytorch_mfcc/bin/train_asr.py", line 106, in exec
    teacher=txt, get_dec_state=self.emb_reg)
  File "/home/ubuntu/anaconda3/envs/hear/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/End-to-end-ASR-Pytorch_mfcc/src/asr.py", line 92, in forward
    encode_feature, encode_len = self.encoder(audio_feature, feature_len)
  File "/home/ubuntu/anaconda3/envs/hear/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/ChangNam/yonseiAI/End-to-end-ASR-Pytorch_mfcc/src/asr.py", line 365, in forward
    input_x, enc_len = layer(input_x, enc_len)
  File "/home/ubuntu/anaconda3/envs/hear/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/End-to-end-ASR-Pytorch_mfcc/src/module.py", line 59, in forward
    feature, feat_len = self.view_input(feature, feat_len)
  File "/home/ubuntu/End-to-end-ASR-Pytorch_mfcc/src/module.py", line 52, in view_input
    feature = feature.view(bs, ts, self.in_channel, self.freq_dim)
RuntimeError: shape '[32, 540, 9, 13]' is invalid for input of size 673920
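The numbers in the error message can be checked directly. The following is a minimal sketch that uses only the values printed in the RuntimeError. The variable names are illustrative and not taken from the repository, and the reading of the four dimensions as (batch, time, in_channel, freq_dim) is an assumption based on the `feature.view(bs, ts, self.in_channel, self.freq_dim)` line in the traceback.

```python
# Values copied from the RuntimeError message.
requested_shape = (32, 540, 9, 13)  # assumed order: (bs, ts, in_channel, freq_dim)
actual_numel = 673920               # total number of elements in the input tensor

batch, time_steps, in_channel, freq_dim = requested_shape

# Number of elements view() needs for the requested shape.
expected_numel = batch * time_steps * in_channel * freq_dim

# Features per frame that the layer expects vs. what the tensor really holds.
expected_feat_dim = in_channel * freq_dim
actual_feat_dim, remainder = divmod(actual_numel, batch * time_steps)

print(f"view() needs {expected_numel} elements, the tensor has {actual_numel}")
print(f"layer expects {expected_feat_dim} features per frame")
print(f"tensor holds {actual_feat_dim} features per frame (remainder {remainder})")
```

So the input tensor carries 39 values per frame, which matches the feat_dim of 39, while the layer tries to reshape each frame into 9 x 13 = 117 values. One possible explanation, not confirmed against the repository code, is that the delta features are counted twice: 117 = 39 x 3 and 39 = 13 x 3.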
Thanks a lot.