The following error occurs when working with Telugu and Tamil text, and presumably with other non-ASCII languages, because of an encoding issue:
Traceback (most recent call last):
  File "/home/sourya4/kaldi/egs/tamil_telugu_proj/s5_r3/../../../tools/pocolm/scripts/prepare_int_data.py", line 168, in <module>
    num_words = GetNumWords(args.vocab)
  File "/home/sourya4/kaldi/egs/tamil_telugu_proj/s5_r3/../../../tools/pocolm/scripts/prepare_int_data.py", line 75, in GetNumWords
    universal_newlines=True)
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 425, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib/python3.6/subprocess.py", line 850, in communicate
    stdout = self.stdout.read()
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)
Fixed by adding encoding='utf-8' to the subprocess.check_output call.
Pull Request #109 submitted with the fix. Not sure if it is comprehensive.
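For anyone who wants to reproduce the behaviour outside of pocolm, here is a minimal sketch. It is not the code from prepare_int_data.py: the helper `run_and_decode` and the child command `CHILD_CMD` are made up for illustration, and `encoding="ascii"` is passed explicitly to stand in for an environment whose locale default is ASCII (which is what `universal_newlines=True` alone appears to pick up in the traceback above).

```python
import subprocess
import sys

# Hypothetical stand-in for the external command run by the real script:
# a child Python process that writes a Telugu word to stdout as raw UTF-8
# bytes. The text is given as \u escapes so the command line stays ASCII.
CHILD_CMD = [
    sys.executable,
    "-c",
    r"import sys; sys.stdout.buffer.write('\u0c24\u0c46\u0c32\u0c41\u0c17\u0c41\n'.encode('utf-8'))",
]


def run_and_decode(cmd, encoding):
    """Run cmd and return its stdout decoded with the given encoding.

    The encoding argument of subprocess.check_output needs Python 3.6+.
    """
    return subprocess.check_output(cmd, encoding=encoding)


if __name__ == "__main__":
    # Decoding as UTF-8 works for non-ASCII output.
    text = run_and_decode(CHILD_CMD, "utf-8")
    print(ascii(text))

    # Decoding as ASCII fails on the first non-ASCII byte.
    try:
        run_and_decode(CHILD_CMD, "ascii")
    except UnicodeDecodeError as err:
        print("ASCII decoding failed:", err)
```

Passing the encoding explicitly makes the result independent of the locale of the machine that runs the script, which is presumably why the one-argument change is enough here. Other places that read files or command output with the locale default would need the same treatment.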