PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0
43.59k stars 7.77k forks

StyleText: UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 64: illegal multibyte sequence #5785

Closed monkeycc closed 1 year ago

monkeycc commented 2 years ago

Running python tools/synth_dataset.py -c configs/dataset_config.yml with the default config (nothing modified) fails with the error below:

(PaddleOCR) PS E:\PaddleOCR\StyleText> python tools/synth_dataset.py -c configs/dataset_config.yml
[2022/03/26 00:08:15] srnet INFO: load pretrained model from style_text_models/bg_generator
[2022/03/26 00:08:17] srnet INFO: load pretrained model from style_text_models/text_generator
[2022/03/26 00:08:17] srnet INFO: load pretrained model from style_text_models/fusion_generator
[2022/03/26 00:08:17] srnet INFO: using FileCorpus
Traceback (most recent call last):
  File "tools/synth_dataset.py", line 31, in <module>
    synth_dataset()
  File "tools/synth_dataset.py", line 26, in synth_dataset
    dataset_synthesiser = DatasetSynthesiser()
  File "E:\PaddleOCR\StyleText\engine\synthesisers.py", line 58, in __init__
    self.style_sampler = style_samplers.DatasetSampler(self.config)
  File "E:\PaddleOCR\StyleText\engine\style_samplers.py", line 27, in __init__
    label_raw = f.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 64: illegal multibyte sequence

pyramid20002000 commented 1 year ago

I ran into the same problem. Has anyone else encountered it?

digitalboy commented 1 year ago

Can anyone shed some light on this?

HappyBruce1 commented 1 year ago

I have the same problem. I hope it gets fixed soon.

HappyBruce1 commented 1 year ago

Forcing UTF-8 works, but the generated Chinese text comes out wrong.
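
For anyone hitting the same traceback: it ends in f.read() raising a 'gbk' error, which suggests the label file is opened in text mode without an explicit encoding, so Python on a Chinese-locale Windows machine falls back to the locale codec (gbk). Below is a minimal sketch of reading such a file with an explicit encoding and a fallback. The helper name read_text_with_fallback and the encoding order are my own assumptions, not part of StyleText, and this does not address the incorrect Chinese output mentioned above, whose cause is not established in this thread.

```python
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8-sig", "gbk")):
    """Decode a text file with the first encoding that works.

    Hypothetical helper, not part of PaddleOCR/StyleText.
    Returns a (text, encoding_used) tuple.
    """
    # Read raw bytes so the platform's locale encoding is never involved.
    raw = Path(path).read_bytes()
    for enc in encodings:
        try:
            # "utf-8-sig" also accepts plain UTF-8 and strips a leading BOM.
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"{path}: could not decode with any of {encodings}")
```

If you would rather patch the call site directly, passing encoding="utf-8" to the built-in open() has the same effect for UTF-8 files. The fallback only helps when a file really was saved as GBK; it cannot repair text that was decoded with the wrong codec earlier in the pipeline.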