Hi, when I try to download the German data by setting VOXFORGE_URL_16kHz = 'http://www.repository.voxforge1.org/downloads/de/Trunk/Audio/Main/16kHz_16bit/', I get the following error:
Traceback (most recent call last):
  File "download_voxforge_dataset.py", line 184, in <module>
    prepare_sample(f.replace(".tgz", ""), VOXFORGE_URL_16kHz + f, target_dir)
  File "download_voxforge_dataset.py", line 139, in prepare_sample
    transcriptions = open(tgz_prompt_file).read().strip().split("\n")
  File "/home/jingru/anaconda3/envs/text_ss/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 144: invalid start byte
This occurs for some of the files that contain characters which cannot be decoded as UTF-8, for example:
"ralfherzog-20070819_de2/mfc/de2-02 DIESE SICHERHEITSLüCKEN SIND BISHER UNBEKANNT"
The "ü" here is stored as the single byte 0xfc, which is not a valid UTF-8 sequence. May I know how to solve this?
Thank you!
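
A possible workaround, offered only as a sketch: the byte 0xfc is "ü" in Latin-1 (ISO-8859-1), so the failing prompt files may simply be Latin-1 encoded rather than UTF-8. That is an assumption, not something confirmed for every VoxForge archive. The helper below (`read_transcriptions` is a hypothetical name, not part of download_voxforge_dataset.py) tries UTF-8 first and falls back to Latin-1, then splits the text the same way line 139 of the script does.

```python
import os
import tempfile


def read_transcriptions(path, encodings=("utf-8", "latin-1")):
    """Read a prompt file, trying each encoding in order.

    Assumption: files that fail UTF-8 decoding are Latin-1 encoded.
    Latin-1 maps every byte to a character, so the fallback never raises,
    but it will produce wrong characters if the real encoding differs.
    """
    with open(path, "rb") as fh:
        raw = fh.read()
    for enc in encodings:
        try:
            text = raw.decode(enc)
            break
        except UnicodeDecodeError:
            continue
    else:
        # Only reached if a custom encodings tuple rejects the data.
        text = raw.decode("utf-8", errors="replace")
    # Same post-processing as the original line in prepare_sample.
    return text.strip().split("\n")


# Small demonstration with a Latin-1 encoded line like the one in the report.
sample = (
    "ralfherzog-20070819_de2/mfc/de2-02 "
    "DIESE SICHERHEITSL\u00fcCKEN SIND BISHER UNBEKANNT\n"
).encode("latin-1")
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample)
demo_lines = read_transcriptions(tmp.name)
os.remove(tmp.name)
print(demo_lines[0])
```

In the script, the failing line could then become `transcriptions = read_transcriptions(tgz_prompt_file)`. If all affected files are known to be Latin-1, passing `encoding="latin-1"` directly to `open()` would be a simpler alternative.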