PaddlePaddle / PaddleHub

Awesome pre-trained models toolkit based on PaddlePaddle. (400+ models including Image, Text, Audio, Video and Cross-Modal with Easy Inference & Serving) [Security hardening in progress; interaction is paused, please be patient]
https://www.paddlepaddle.org.cn/hub
Apache License 2.0
12.74k stars 2.07k forks

Text transcription error #1843

Open QL-AR opened 2 years ago

QL-AR commented 2 years ago

'gbk' codec can't decode byte 0x8c in position 2088: illegal multibyte sequence
  File "C:\Users\Administrator\.paddlehub\modules\fastspeech2_baker\module.py", line 73, in __init__
    self.frontend = Frontend(phone_vocab_path=phones_dict)
  File "G:\PY工程\yy.py", line 6, in <module>
    model = hub.Module(
What is going on?

QL-AR commented 2 years ago

[Screenshot: QQ截图20220420202315]

QL-AR commented 2 years ago

How do I fix this error?

imzjy commented 2 years ago

@QL-AR Change the encoding of the file %USERPROFILE%\.paddlehub\modules\fastspeech2_baker\assets\fastspeech2_nosil_baker_ckpt_0.4\phone_id_map.txt from UTF-8 to GBK. [Image: covert_to_gbk]
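The same conversion can be scripted instead of done in an editor. This is a minimal sketch, assuming the file is valid UTF-8 and that every character in it exists in GBK; `reencode_file` is a hypothetical helper written for illustration, not part of PaddleHub.

```python
from pathlib import Path


def reencode_file(path, src="utf-8", dst="gbk"):
    """Rewrite a text file in place, changing only its encoding."""
    p = Path(path)
    # Work on raw bytes so that line endings are preserved exactly.
    text = p.read_bytes().decode(src)  # UnicodeDecodeError if the file is not valid `src`
    p.write_bytes(text.encode(dst))    # UnicodeEncodeError if a character is missing in `dst`
```

Calling `reencode_file(r"...\phone_id_map.txt")` with the full path from the comment above would apply it to the vocabulary file; keeping a backup copy first is a sensible precaution.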

imzjy commented 2 years ago

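Let me explain this problem in detail. It is not Baidu's fault, but it has caused a lot of confusion.

If you call Python's open without specifying encoding, the system default encoding is used. You can run the following command to see what your system default encoding is: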

python -c "import locale; print(locale.getpreferredencoding(False))"

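If your Windows system locale is set to Chinese, you will see cp936, which is the Windows code page for GBK (the extension of GB2312). Here is the problem:

You are opening a UTF-8 encoded file with the GBK codec, so it fails with an error like: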

'gbk' codec can't decode byte 0x8c in position 2088: illegal multibyte sequence

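Changing the file's encoding to GBK as described above only treats the symptom, because PaddlePaddle and many third-party libraries call open without specifying an encoding.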
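To make the mismatch concrete, here is a small sketch. It shows that UTF-8 bytes for a Chinese character are not valid GBK, and that passing `encoding="utf-8"` to `open` sidesteps the locale default. The helper names are hypothetical and exist only for this illustration.

```python
import os
import tempfile

SAMPLE = "\u4e2d"  # one Chinese character; its UTF-8 form is the three bytes E4 B8 AD


def gbk_decode_fails(data: bytes) -> bool:
    """Return True if `data` cannot be decoded with the GBK codec."""
    try:
        data.decode("gbk")
    except UnicodeDecodeError:
        return True
    return False


def read_text_utf8(path: str) -> str:
    """Read a text file with an explicit encoding instead of the locale default."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sample.txt")
    with open(path, "wb") as fh:
        fh.write(SAMPLE.encode("utf-8"))
    print(gbk_decode_fails(SAMPLE.encode("utf-8")))  # the GBK codec rejects these bytes
    print(read_text_utf8(path) == SAMPLE)            # explicit UTF-8 round-trips
```

A library that always passes an explicit `encoding` behaves the same on every Windows locale, which is why the missing argument is the underlying cause here.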

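Fortunately, Windows lets you fix this by enabling UTF-8 support for multilingual programs in the system locale settings.

You can change the system default behavior by running intl.cpl or through the system settings. After that, open decodes with UTF-8 by default. See the screenshot below.

[Image: set_system_locale_to_support_utf8]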

imzjy commented 2 years ago

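More pitfalls, and more solutions.

If you apply the setting above, applications that do not support Unicode will show garbled text, including file names and path names when you extract some archives.

So what now? Let's look at the source code.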

if sys.platform.startswith("win"):
    # On Win32, this will return the ANSI code page
    def getpreferredencoding(do_setlocale = True):
        """Return the charset that the user is likely using."""
        if sys.flags.utf8_mode:
            return 'UTF-8'
        import _bootlocale
        return _bootlocale.getpreferredencoding(False)

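_bootlocale.py relies on Python's C module _localemodule.c, which in turn uses the Win32 NLS API. That can only be configured user-wide and cannot be changed through an environment variable, so this route is a dead end.

Next, look at the documentation for sys.flags.utf8_mode. Reading on, you will find that it depends on whether Python's UTF-8 mode is enabled. Here is the turning point: Python's UTF-8 mode can be controlled through the PYTHONUTF8 environment variable.

So the solution is to add one line to your system environment variables: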

PYTHONUTF8=1

This leaves your user locale settings untouched, while Python's open defaults to UTF-8 when opening files.
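One way to check that the variable takes effect is to start a child interpreter with PYTHONUTF8=1 and inspect what it reports. This is a sketch under the assumption of Python 3.7 or later, where UTF-8 mode exists; `probe_utf8_mode` is a hypothetical helper for this illustration.

```python
import os
import subprocess
import sys


def probe_utf8_mode(enable: bool):
    """Run a child interpreter and return (utf8_mode flag, preferred encoding)."""
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)  # start from a known state
    if enable:
        env["PYTHONUTF8"] = "1"
    code = (
        "import sys, locale; "
        "print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    flag, encoding = result.stdout.split()
    return int(flag), encoding


if __name__ == "__main__":
    print(probe_utf8_mode(True))
```

With the variable set, the flag should be 1 and the reported encoding should be UTF-8 (the exact spelling varies between Python versions). The result without the variable depends on the machine's locale.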

QL-AR commented 2 years ago

[Screenshot: QQ截图20220422182306]

QL-AR commented 2 years ago

Speech recognition reports this error?

imzjy commented 2 years ago

@QL-AR Check the README to see whether the system dependencies or the PaddleHub module failed to install.

https://github.com/PaddlePaddle/PaddleHub/tree/release/v2.2/modules/audio/asr/deepspeech2_librispeech

QL-AR commented 2 years ago

[Screenshot: QQ截图20220423150036]

QL-AR commented 2 years ago

[Screenshot: QQ截图20220423150105]

QL-AR commented 2 years ago

It reports the error because the module cannot be installed.

QL-AR commented 2 years ago

(PaddleDetection) PS C:\Users\Administrator> hub install deepspeech2_librispeech
Download https://paddlehub.bj.bcebos.com/paddlehub_dev/deepspeech2_librispeech_1.0.0.tar.gz
[##################################################] 100.00%
Decompress C:\Users\Administrator\.paddlehub\tmp\tmpeqi3bekt\deepspeech2_librispeech_1.0.0.tar.gz
<traceback object at 0x000001D68BEC4440>

Traceback (most recent call last):
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\utils\xarfile.py", line 228, in unarchive_with_progress
    total_size += file.getxarinfo(filename).size
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\utils\xarfile.py", line 153, in getxarinfo
    return XarInfo(self._archive_fp.getmember(name), self.arctype)
  File "H:\Anaconda3\envs\PaddleDetection\lib\tarfile.py", line 1799, in getmember
    raise KeyError("filename %r not found" % name)
KeyError: "filename 'deepspeech2_librispeech/deepspeech/decoders/swig/build/temp.linux-x86_64-3.7/kenlm/util/double-conversion/' not found"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\utils\utils.py", line 185, in generate_tempdir
    yield _dir
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\module\manager.py", line 258, in _install_from_url
    return self._install_from_archive(file)
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\module\manager.py", line 374, in _install_from_archive
    for path, ds, ts in xarfile.unarchive_with_progress(archive, _tdir):
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\utils\xarfile.py", line 233, in unarchive_with_progress
    yield filename, extract_size, total_size
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\utils\xarfile.py", line 109, in __exit__
    raise exit_exception(exit_value)
KeyError: KeyError("filename 'deepspeech2_librispeech/deepspeech/decoders/swig/build/temp.linux-x86_64-3.7/kenlm/util/double-conversion/' not found")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "H:\Anaconda3\envs\PaddleDetection\lib\shutil.py", line 625, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\Administrator\.paddlehub\tmp\tmpeqi3bekt\deepspeech2_librispeech_1.0.0.tar.gz'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "H:\Anaconda3\envs\PaddleDetection\lib\tempfile.py", line 805, in onerror
    _os.unlink(path)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\Administrator\.paddlehub\tmp\tmpeqi3bekt\deepspeech2_librispeech_1.0.0.tar.gz'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "H:\Anaconda3\envs\PaddleDetection\Scripts\hub-script.py", line 33, in <module>
    sys.exit(load_entry_point('paddlehub==2.2.0', 'console_scripts', 'hub')())
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\commands\utils.py", line 78, in execute
    status = 0 if com['_entry']().execute(sys.argv[idx:]) else 1
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\commands\install.py", line 55, in execute
    manager.install(name=name, version=version, ignore_env_mismatch=args.ignore_env_mismatch)
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\module\manager.py", line 190, in install
    return self._install_from_name(name, version, ignore_env_mismatch)
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\module\manager.py", line 265, in _install_from_name
    return self._install_from_url(item['url'])
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\module\manager.py", line 258, in _install_from_url
    return self._install_from_archive(file)
  File "H:\Anaconda3\envs\PaddleDetection\lib\contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\utils\utils.py", line 185, in generate_tempdir
    yield _dir
  File "H:\Anaconda3\envs\PaddleDetection\lib\tempfile.py", line 830, in __exit__
    self.cleanup()
  File "H:\Anaconda3\envs\PaddleDetection\lib\tempfile.py", line 834, in cleanup
    self._rmtree(self.name)
  File "H:\Anaconda3\envs\PaddleDetection\lib\tempfile.py", line 816, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "H:\Anaconda3\envs\PaddleDetection\lib\shutil.py", line 757, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "H:\Anaconda3\envs\PaddleDetection\lib\shutil.py", line 627, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "H:\Anaconda3\envs\PaddleDetection\lib\tempfile.py", line 808, in onerror
    cls._rmtree(path)
  File "H:\Anaconda3\envs\PaddleDetection\lib\tempfile.py", line 816, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "H:\Anaconda3\envs\PaddleDetection\lib\shutil.py", line 757, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "H:\Anaconda3\envs\PaddleDetection\lib\shutil.py", line 608, in _rmtree_unsafe
    onerror(os.scandir, path, sys.exc_info())
  File "H:\Anaconda3\envs\PaddleDetection\lib\shutil.py", line 605, in _rmtree_unsafe
    with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\Users\Administrator\.paddlehub\tmp\tmpeqi3bekt\deepspeech2_librispeech_1.0.0.tar.gz'

(PaddleDetection) PS C:\Users\Administrator> hub install deepspeech2_librispeech
Download https://paddlehub.bj.bcebos.com/paddlehub_dev/deepspeech2_librispeech_1.0.0.tar.gz
[##################################################] 100.00%
Decompress C:\Users\Administrator\.paddlehub\tmp\tmpg7nl8mwc\deepspeech2_librispeech_1.0.0.tar.gz
<traceback object at 0x0000024C8BEBCF80>

Traceback (most recent call last):
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\utils\xarfile.py", line 228, in unarchive_with_progress
    total_size += file.getxarinfo(filename).size
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\utils\xarfile.py", line 153, in getxarinfo
    return XarInfo(self._archive_fp.getmember(name), self.arctype)
  File "H:\Anaconda3\envs\PaddleDetection\lib\tarfile.py", line 1799, in getmember
    raise KeyError("filename %r not found" % name)
KeyError: "filename 'deepspeech2_librispeech/deepspeech/decoders/swig/build/temp.linux-x86_64-3.7/kenlm/util/double-conversion/' not found"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\utils\utils.py", line 185, in generate_tempdir
    yield _dir
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\module\manager.py", line 258, in _install_from_url
    return self._install_from_archive(file)
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\module\manager.py", line 374, in _install_from_archive
    for path, ds, ts in xarfile.unarchive_with_progress(archive, _tdir):
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\utils\xarfile.py", line 233, in unarchive_with_progress
    yield filename, extract_size, total_size
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\utils\xarfile.py", line 109, in __exit__
    raise exit_exception(exit_value)
KeyError: KeyError("filename 'deepspeech2_librispeech/deepspeech/decoders/swig/build/temp.linux-x86_64-3.7/kenlm/util/double-conversion/' not found")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "H:\Anaconda3\envs\PaddleDetection\lib\shutil.py", line 625, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\Administrator\.paddlehub\tmp\tmpg7nl8mwc\deepspeech2_librispeech_1.0.0.tar.gz'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "H:\Anaconda3\envs\PaddleDetection\lib\tempfile.py", line 805, in onerror
    _os.unlink(path)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\Administrator\.paddlehub\tmp\tmpg7nl8mwc\deepspeech2_librispeech_1.0.0.tar.gz'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "H:\Anaconda3\envs\PaddleDetection\Scripts\hub-script.py", line 33, in <module>
    sys.exit(load_entry_point('paddlehub==2.2.0', 'console_scripts', 'hub')())
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\commands\utils.py", line 78, in execute
    status = 0 if com['_entry']().execute(sys.argv[idx:]) else 1
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\commands\install.py", line 55, in execute
    manager.install(name=name, version=version, ignore_env_mismatch=args.ignore_env_mismatch)
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\module\manager.py", line 190, in install
    return self._install_from_name(name, version, ignore_env_mismatch)
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\module\manager.py", line 265, in _install_from_name
    return self._install_from_url(item['url'])
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\module\manager.py", line 258, in _install_from_url
    return self._install_from_archive(file)
  File "H:\Anaconda3\envs\PaddleDetection\lib\contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "H:\Anaconda3\envs\PaddleDetection\lib\site-packages\paddlehub\utils\utils.py", line 185, in generate_tempdir
    yield _dir
  File "H:\Anaconda3\envs\PaddleDetection\lib\tempfile.py", line 830, in __exit__
    self.cleanup()
  File "H:\Anaconda3\envs\PaddleDetection\lib\tempfile.py", line 834, in cleanup
    self._rmtree(self.name)
  File "H:\Anaconda3\envs\PaddleDetection\lib\tempfile.py", line 816, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "H:\Anaconda3\envs\PaddleDetection\lib\shutil.py", line 757, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "H:\Anaconda3\envs\PaddleDetection\lib\shutil.py", line 627, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "H:\Anaconda3\envs\PaddleDetection\lib\tempfile.py", line 808, in onerror
    cls._rmtree(path)
  File "H:\Anaconda3\envs\PaddleDetection\lib\tempfile.py", line 816, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "H:\Anaconda3\envs\PaddleDetection\lib\shutil.py", line 757, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "H:\Anaconda3\envs\PaddleDetection\lib\shutil.py", line 608, in _rmtree_unsafe
    onerror(os.scandir, path, sys.exc_info())
  File "H:\Anaconda3\envs\PaddleDetection\lib\shutil.py", line 605, in _rmtree_unsafe
    with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] The directory name is invalid: 'C:\Users\Administrator\.paddlehub\tmp\tmpg7nl8mwc\deepspeech2_librispeech_1.0.0.tar.gz'
(PaddleDetection) PS C:\Users\Administrator>

imzjy commented 2 years ago

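@QL-AR The dependencies are not installed. The documentation states clearly that the dependencies below are required. Since you are on Windows, I am not sure whether it is supported.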

[Image: libsndfile_swig]

QL-AR commented 2 years ago

Is there a solution for Windows 10?

QL-AR commented 2 years ago

[Screenshot: QQ截图20220424100532]

QL-AR commented 2 years ago

Why are the boxes not being drawn?