UnicodeDecodeError: 'gbk' codec can't decode byte (problem with Chinese characters)

amyreese / markdown-pp

Preprocessor for Markdown files to generate a table of contents and other documentation needs

MIT License

Issue #76 (open), opened by kmcbest 4 years ago

kmcbest commented 4 years ago

Test example:

example.zip

When I run markdown-pp index.mdpp -o out.md on the two files in this example, markdown-pp throws this error:

UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 10: illegal multibyte sequence

Appending "-e latexrender" to the command does not help in this case.
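The same kind of error can likely be reproduced without markdown-pp, by decoding UTF-8 bytes with the GBK codec. This is a minimal sketch under that assumption; the sample string and the try_decode helper are hypothetical stand-ins, since the contents of example.zip are not shown here.

```python
# Minimal sketch: decode UTF-8 bytes with the GBK codec to reproduce the error.
# SAMPLE_TEXT is hypothetical; it stands in for the contents of a UTF-8 .mdpp file.

SAMPLE_TEXT = "一个例子"  # hypothetical Chinese text ("an example")


def try_decode(data: bytes, encoding: str) -> str:
    """Decode data with the given codec, or return the error message on failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc}"


utf8_bytes = SAMPLE_TEXT.encode("utf-8")  # what a UTF-8 file contains on disk

print(try_decode(utf8_bytes, "utf-8"))  # decodes back to the original text
print(try_decode(utf8_bytes, "gbk"))    # fails: these UTF-8 bytes are not valid GBK
```

The second message should have the same form as the one in the report, although the offending byte and position depend on the text being decoded.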

amyreese commented 4 years ago

What version of Python are you using? markdown-pp only supports Unicode documents in Python 3.

kmcbest commented 4 years ago

I'm using Python 3.8.2, and my example files are encoded as UTF-8 without a BOM.
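One way to double-check the file encoding is to inspect the raw bytes. This is a small sketch using only the standard library; the is_utf8_without_bom function name and the file path in the usage comment are hypothetical.

```python
import codecs


def is_utf8_without_bom(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 and does not start with a UTF-8 BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return False  # file starts with the UTF-8 BOM (EF BB BF)
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not valid UTF-8
    return True


# Usage (path is illustrative):
# with open("index.mdpp", "rb") as f:
#     print(is_utf8_without_bom(f.read()))
```

Opening the file in binary mode ("rb") avoids any dependence on the locale's default encoding while checking.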

amyreese commented 4 years ago

markdown-pp reads files using Python's default encoding, which is determined by the system locale. If your locale specifies an encoding other than UTF-8 (here, GBK), decoding the contents of a UTF-8 file will fail. You can override the system locale by setting the appropriate environment variables, and you can check the default encoding with locale.getpreferredencoding().
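A sketch of the two ideas in the comment above: checking which default encoding Python picks, and sidestepping it by passing the encoding explicitly. The read_utf8 helper is hypothetical and not part of markdown-pp. Assuming Python 3.7 or later, setting the environment variable PYTHONUTF8=1 (or running python -X utf8) enables Python's UTF-8 Mode, which makes open() default to UTF-8 regardless of the locale; whether that fixes markdown-pp depends on it relying on that default, as the comment suggests.

```python
import locale
import sys


def default_encoding_report() -> dict:
    """Report the default text encoding Python uses when open() gets no encoding."""
    return {
        # False: query the current locale without calling setlocale()
        "preferred_encoding": locale.getpreferredencoding(False),
        # True when UTF-8 Mode is on (PYTHONUTF8=1 or -X utf8, Python 3.7+)
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


def read_utf8(path: str) -> str:
    """Hypothetical helper: read a text file as UTF-8 regardless of the locale."""
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    # On a system like the reporter's, this would presumably show a GBK-compatible encoding.
    print(default_encoding_report())
```

Passing encoding="utf-8" explicitly is the more robust choice for a tool that reads UTF-8 documents, since it does not depend on how each user's machine is configured.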