NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0

KeyError while training ASR model #508

Open raikarsagar opened 4 years ago

raikarsagar commented 4 years ago

```
Traceback (most recent call last):
  File "/home/users/sagar/nvidia_deepspeech2/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/users/sagar/nvidia_deepspeech2/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/users/sagar/nvidia_deepspeech2/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: KeyError: 'f'
Traceback (most recent call last):

  File "/home/users/sagar/nvidia_deepspeech2/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 207, in __call__
    ret = func(*args)

  File "/home/users/sagar/OpenSeq2Seq/open_seq2seq/data/speech2text/speech2text.py", line 419, in _parse_audio_transcript_element
    target_indices = [self.params['char2idx'][c] for c in transcript]

  File "/home/users/sagar/OpenSeq2Seq/open_seq2seq/data/speech2text/speech2text.py", line 419, in <listcomp>
    target_indices = [self.params['char2idx'][c] for c in transcript]

KeyError: 'f'

 [[{{node PyFuncStateless}}]]
 [[{{node IteratorGetNext}}]]
 [[{{node Loss_Optimization/gradients/ForwardPass/Loss/ctc_loss/CTCLoss_grad/mul}}]]
```

hao-olli-ai commented 4 years ago

Maybe the 'f' character is not defined in your alphabet. Check whether every character in your training transcripts exists in your alphabet. In some cases, someone forgets to add 'f' to the alphabet, or the alphabet contains a character that only looks like 'f' but has a different Unicode code point (so the 'f' in the alphabet and the 'f' in the training data are not the same character). Rather than checking by eye, write a function that does the check, then drop the sentences that contain out-of-alphabet characters. After that, training should run.
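The check suggested above could look roughly like the following. This is a minimal sketch, not part of OpenSeq2Seq: it assumes the alphabet (vocab) file lists exactly one character per line and that the training CSV has a `transcript` column (DeepSpeech-style CSVs); every function name here (`load_alphabet`, `describe`, `out_of_alphabet`, `split_by_alphabet`, `filter_csv`) is hypothetical. It performs the same per-character membership test as the failing `self.params['char2idx'][c]` lookup, but it reports the offending characters with their code points and Unicode names instead of raising `KeyError`.

```python
import csv
import unicodedata
from typing import Dict, Iterable, List, Set, TextIO, Tuple


def load_alphabet(lines: Iterable[str]) -> Set[str]:
    """Build a character set from alphabet lines.

    Assumption (hypothetical format): one character per line. Only the
    trailing newline is stripped, so a line holding a single space maps to ' '.
    Blank lines are ignored.
    """
    return {line.rstrip("\n") for line in lines if line.rstrip("\n")}


def describe(ch: str) -> str:
    """Show a character with its code point and Unicode name, so look-alikes
    (e.g. ASCII 'f' U+0066 vs. full-width 'ｆ' U+FF46) are easy to tell apart."""
    return f"{ch!r} U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


def out_of_alphabet(transcript: str, alphabet: Set[str]) -> Set[str]:
    """Return the characters of `transcript` that are missing from `alphabet`.

    These are exactly the characters that would raise KeyError in a
    char-to-index lookup such as char2idx[c]."""
    return {ch for ch in transcript if ch not in alphabet}


def split_by_alphabet(
    transcripts: Iterable[str], alphabet: Set[str]
) -> Tuple[List[str], Dict[str, Set[str]]]:
    """Keep transcripts fully covered by the alphabet; map the others to
    the characters that made them fail."""
    kept: List[str] = []
    dropped: Dict[str, Set[str]] = {}
    for text in transcripts:
        bad = out_of_alphabet(text, alphabet)
        if bad:
            dropped[text] = bad
        else:
            kept.append(text)
    return kept, dropped


def filter_csv(src: TextIO, dst: TextIO, alphabet: Set[str],
               column: str = "transcript") -> int:
    """Copy CSV rows from `src` to `dst`, skipping rows whose `column`
    (assumed name: 'transcript') contains out-of-alphabet characters.
    Returns the number of skipped rows."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    skipped = 0
    for row in reader:
        if out_of_alphabet(row[column], alphabet):
            skipped += 1
        else:
            writer.writerow(row)
    return skipped


if __name__ == "__main__":
    # Hypothetical alphabet: note that it DOES contain the ASCII 'f'.
    alphabet = load_alphabet([" \n", "a\n", "e\n", "f\n", "i\n", "l\n", "s\n", "t\n"])
    # The second transcript uses FULLWIDTH LATIN SMALL LETTER F (U+FF46).
    transcripts = ["a file", "a \uff46ile"]
    kept, dropped = split_by_alphabet(transcripts, alphabet)
    print("kept:", kept)
    for text, bad in dropped.items():
        print(f"dropped {text!r}:", ", ".join(describe(c) for c in sorted(bad)))
```

Printing the code point and Unicode name is what separates a genuinely missing letter from a look-alike that merely renders like 'f'. Depending on the data, normalizing transcripts (for example with `unicodedata.normalize("NFKC", text)`, which maps full-width letters to their ASCII forms) may be preferable to dropping those sentences.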