eihli / image-table-ocr

Turn images of tables into CSV data. Detect tables from images and run OCR on the cells.
MIT License

UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 15: ordinal not in range(128) #13

Open chouroukhelaoui opened 2 years ago

chouroukhelaoui commented 2 years ago

Traceback (most recent call last):
  File "/opt/anaconda3/envs/Hyper-Table-Recognition/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/anaconda3/envs/Hyper-Table-Recognition/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/chouroukhelaoui/PycharmProjects/image-table-ocr/table_ocr/demo/__main__.py", line 51, in <module>
    csv_output = main(sys.argv[1])
  File "/Users/chouroukhelaoui/PycharmProjects/image-table-ocr/table_ocr/demo/__main__.py", line 34, in main
    for cell in cells
  File "/Users/chouroukhelaoui/PycharmProjects/image-table-ocr/table_ocr/demo/__main__.py", line 34, in <listcomp>
    for cell in cells
  File "/Users/chouroukhelaoui/PycharmProjects/image-table-ocr/table_ocr/ocr_image/__init__.py", line 33, in main
    txt_file.write(txt)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 15: ordinal not in range(128)

In some cases we get this error. It can be fixed by adding the following line at line 32 of "/image-table-ocr/table_ocr/ocr_image/__init__.py":

txt = txt.encode('ascii', 'ignore').decode('ascii')
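For anyone weighing this workaround, here is a minimal sketch of its effect. The sample string is made up for illustration and does not come from table_ocr. The line silences the error by dropping every non-ASCII character, so the OCR text written to disk is no longer exactly what was recognized.

```python
# Hypothetical OCR output containing U+2019 (right single quotation mark),
# the character named in the traceback.
txt = "it\u2019s a cell"

# The proposed workaround: characters that ASCII cannot represent are discarded.
ascii_only = txt.encode("ascii", "ignore").decode("ascii")

print(ascii_only)
```

If the dropped characters matter (accents, currency symbols, typographic quotes), writing the file with a Unicode encoding is probably the safer route.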

rucxiaowen commented 2 years ago

Hello! Your email has been received, thank you!

eihli commented 1 year ago

I think this might be because open() falls back to your locale's encoding when you don't pass encoding as a kwarg, and your locale's encoding is ASCII.

https://docs.python.org/3/library/functions.html#open
https://docs.python.org/3/library/locale.html#locale.getencoding
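A minimal sketch of that behaviour, for checking it locally. It passes encoding="ascii" explicitly to stand in for an ASCII locale (an assumption, since the real default comes from the environment). The helper write_text, the sample string, and the file names are hypothetical and are not part of table_ocr.

```python
import os
import tempfile


def write_text(path, text, encoding):
    """Write text to path using an explicit encoding (hypothetical helper)."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


txt = "customer\u2019s total"  # contains U+2019, the character from the traceback
tmp_dir = tempfile.mkdtemp()

# Stand-in for an ASCII locale: the write fails the same way as in the report.
try:
    write_text(os.path.join(tmp_dir, "ascii.txt"), txt, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# With encoding="utf-8" the write no longer depends on the locale.
utf8_path = os.path.join(tmp_dir, "utf8.txt")
write_text(utf8_path, txt, "utf-8")
with open(utf8_path, encoding="utf-8") as f:
    round_trip = f.read()

print(ascii_failed, round_trip == txt)
```

Passing encoding="utf-8" in the open() call would be the code-level alternative to changing the environment. It keeps the OCR text intact rather than stripping characters.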

If I'm right, I don't think this requires a code change, since it's adjustable at the environment level.
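One way to test that theory is to print the encoding Python picks up from the environment. This is a sketch, not something from the repository. It uses locale.getpreferredencoding(False), which exists on Python 3.6 (locale.getencoding, linked above, was only added in 3.11), and the variable names are illustrative.

```python
import codecs
import locale

# Encoding that open() falls back to for text files when encoding= is omitted.
default_file_encoding = locale.getpreferredencoding(False)

# Normalize aliases such as "US-ASCII" or "ANSI_X3.4-1968" to Python's codec name.
codec_name = codecs.lookup(default_file_encoding).name
locale_is_ascii = codec_name == "ascii"

print("open() default encoding:", default_file_encoding)
print("locale is ASCII:", locale_is_ascii)
```

If this reports an ASCII locale, exporting a UTF-8 locale before starting Python (for example LC_ALL=en_US.UTF-8, assuming that locale is installed on the machine) should change the default that open() uses.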