danpovey / pocolm

Small language toolkit for creation, interpolation and pruning of ARPA language models

TypeError: a bytes-like object is required, not 'str' in validate_text_dir.py with a solution #102

Closed: farisalasmary closed this issue 3 years ago

farisalasmary commented 3 years ago

I ran the tedlium recipe in Kaldi on my own dataset and encountered this error:

Traceback (most recent call last):
  File "/opt/kaldi/egs/tedlium/train_s5_r3/../../../tools/pocolm/scripts/validate_text_dir.py", line 102, in <module>
    SpotCheckTextFile(full_path)
  File "/opt/kaldi/egs/tedlium/train_s5_r3/../../../tools/pocolm/scripts/validate_text_dir.py", line 48, in SpotCheckTextFile
    line = f.readline().strip("\n")
TypeError: a bytes-like object is required, not 'str'

I am using Python 3.8.5 from an Anaconda setup (conda 4.9.2).
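
The message itself points at the cause: in Python 3, calling .strip("\n") on a bytes object raises exactly this TypeError, so f.readline() must be returning bytes here, which is what a file object opened in binary mode does. A minimal sketch that reproduces the error outside pocolm follows. Note that io.BytesIO is only a stand-in for however the script opens the file; that part is an assumption, not a quote from the script.

```python
import io

# Stand-in for a file opened in binary mode (assumption: the real script
# ends up with a binary file object for at least some inputs).
f = io.BytesIO(b"hello world\n")

raw = f.readline()           # bytes, not str
try:
    raw.strip("\n")          # str argument passed to a bytes method
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(type(raw).__name__, "->", error_message)
```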

A simple fix is to replace the following line in "validate_text_dir.py": https://github.com/danpovey/pocolm/blob/6ba2b37ffb1ee374331683eac481e065ed76407e/scripts/validate_text_dir.py#L47

with this snippet:

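line = f.readline()
# readline() returns bytes when the file was opened in binary mode;
# decode to str before stripping the newline.
if isinstance(line, bytes):
    line = line.decode('utf-8')
line = line.strip('\n')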
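
To try the fix in isolation, the same logic can be wrapped in a small helper and checked against both binary and text streams. This is only a sketch: read_text_line is a hypothetical name, not something in pocolm, and it assumes the input is UTF-8 encoded.

```python
import io


def read_text_line(f):
    """Read one line from f and return it as str without the trailing newline.

    Hypothetical helper for illustration only; assumes UTF-8 input.
    """
    line = f.readline()
    if isinstance(line, bytes):
        # Binary-mode file objects return bytes; decode before str operations.
        line = line.decode("utf-8")
    return line.strip("\n")


# The helper behaves the same for binary and text streams.
from_bytes = read_text_line(io.BytesIO("caf\u00e9 au lait\n".encode("utf-8")))
from_text = read_text_line(io.StringIO("caf\u00e9 au lait\n"))
print(from_bytes == from_text)
```

If the bytes come from compressed input read with gzip.open, an alternative would be to open it in text mode, e.g. gzip.open(path, "rt", encoding="utf-8"), so that readline() returns str in the first place.
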
danpovey commented 3 years ago

Thanks! Resolving via #103