dmMaze / BallonsTranslator

A deep learning-assisted comic translation tool with one-click machine translation and simple image/text editing | Yet another computer-aided comic/manga translation tool powered by deep learning
GNU General Public License v3.0
2.41k stars 162 forks

UnicodeEncodeError: 'cp950' codec can't encode character '\u7eeb' in position 24: illegal multibyte sequence #444

Closed SkywalkerJi closed 4 months ago

SkywalkerJi commented 4 months ago
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\skywa\miniconda3\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
UnicodeEncodeError: 'cp950' codec can't encode character '\u7eeb' in position 24: illegal multibyte sequence
Call stack:
  File "E:\Games\BallonsTranslator\BallonsTranslator\ui\module_manager.py", line 82, in run
    self.job()
  File "E:\Games\BallonsTranslator\BallonsTranslator\ui\module_manager.py", line 232, in _run_translate_pipeline
    self._translate_page(self.imgtrans_proj.pages, page_key, emit_finished=False)
  File "E:\Games\BallonsTranslator\BallonsTranslator\ui\module_manager.py", line 203, in _translate_page
    self.translator.translate_textblk_lst(page)
  File "E:\Games\BallonsTranslator\BallonsTranslator\modules\translators\base.py", line 205, in translate_textblk_lst
    _translations = self.translate(text_list)
  File "E:\Games\BallonsTranslator\BallonsTranslator\modules\translators\base.py", line 149, in translate
    text_trans = self._translate(text_source)
  File "E:\Games\BallonsTranslator\BallonsTranslator\modules\translators\trans_sakura.py", line 512, in _translate
    self.logger.debug('-- Sakura Response --\n' + response + '\n\n')

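I asked ChatGPT, and it said:

This error is raised when text written to the stream contains a character that the stream's encoding cannot represent. In this case, the character '\u7eeb' cannot be encoded by 'cp950'. 'cp950' is the Windows code page for Traditional Chinese, and it does not cover every Unicode character.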
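The failure can be reproduced outside of logging. The snippet below is a minimal sketch; `can_encode` is a hypothetical helper written for this illustration, not part of BallonsTranslator or the standard library.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding` under strict error handling."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


simplified_char = "\u7eeb"  # the character named in the traceback

print(can_encode(simplified_char, "cp950"))  # False: same failure as in the traceback
print(can_encode(simplified_char, "utf-8"))  # True: UTF-8 covers all of Unicode
```

The same check can be run on any string before it is written to a stream whose encoding is known.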

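One way to fix this is to change the encoding used for log output so that it can handle such characters. A StreamHandler has no encoding setting of its own; it uses the encoding of the stream it writes to, so set the encoding on the stream, as shown below:

import logging
import sys

# Assigning stream_handler.encoding has no effect: the stream decides the encoding.
# Switch stdout to UTF-8 first (TextIOWrapper.reconfigure, Python 3.7+).
sys.stdout.reconfigure(encoding='utf-8')

# Create a stream handler that writes to the reconfigured stdout
stream_handler = logging.StreamHandler(stream=sys.stdout)
stream_handler.setFormatter(logging.Formatter('%(message)s'))

# Add the stream handler to the logger
logger = logging.getLogger(__name__)
logger.addHandler(stream_handler)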

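With this, log output can handle a much wider range of characters, including '\u7eeb'.

Another solution is to convert the text into a form that 'cp950' can handle before writing it to the log. You can do this with Python's encode and decode methods: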

text = 'some text with unicode characters'
safe_text = text.encode('cp950', errors='replace').decode('cp950')
logger.debug(safe_text)

Here, any character that 'cp950' cannot encode is replaced with a replacement character ('?').
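Because `errors='replace'` discards the original character, it may be worth comparing it with `errors='backslashreplace'`, which keeps a readable escape sequence in the log. The following is a small sketch of the difference, using the character from the traceback as sample input:

```python
text = "\u7eeb"  # character from the traceback; cp950 cannot encode it

# errors='replace' substitutes '?', so the original character is lost.
lossy = text.encode("cp950", errors="replace").decode("cp950")

# errors='backslashreplace' keeps an escape sequence that can be read back later.
escaped = text.encode("cp950", errors="backslashreplace").decode("cp950")

print(lossy)    # ?
print(escaped)  # \u7eeb
```

Either way the log line no longer raises, but only the second form lets a reader work out which character the translator actually returned.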

SkywalkerJi commented 4 months ago

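The cause: the Windows system language is Traditional Chinese, so when the translator outputs Simplified Chinese, logging cannot convert the characters to the system encoding and reports an error.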
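This situation can be imitated on any machine by logging to an in-memory stream that uses cp950. The sketch below is an illustration under assumptions, not the project's actual fix: `make_console_like_stream` and the logger name `cp950_demo` are hypothetical, and the stream stands in for a Traditional Chinese Windows console.

```python
import io
import logging


def make_console_like_stream(encoding="cp950", errors="backslashreplace"):
    """Return (raw_buffer, text_stream) imitating a console that uses `encoding`."""
    raw = io.BytesIO()
    # newline="" disables newline translation so the bytes are the same on every OS.
    stream = io.TextIOWrapper(
        raw, encoding=encoding, errors=errors, newline="", write_through=True
    )
    return raw, stream


raw, stream = make_console_like_stream()

demo_logger = logging.getLogger("cp950_demo")  # hypothetical logger name
demo_logger.setLevel(logging.DEBUG)
demo_logger.propagate = False
demo_logger.addHandler(logging.StreamHandler(stream))

# Simplified Chinese output, as in the failing logger.debug call.
demo_logger.debug("-- Sakura Response --\n" + "\u7eeb")

logged = raw.getvalue().decode("cp950")
print(logged)
```

With the `backslashreplace` error handler, the unencodable character is written as an escape sequence instead of raising. If the real handler writes to `sys.stdout`, a comparable effect may be achievable with `sys.stdout.reconfigure(errors="backslashreplace")` at startup, though whether that suits BallonsTranslator depends on how its logger is configured.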