Closed: caomingpei closed this issue 2 months ago
I did a quick test by opening a GBK-encoded text file and forcing the encoding to UTF-8, like so:
>>> f = open('gbk.crlf.txt', 'r', encoding='utf-8')
>>> f
<_io.TextIOWrapper name='gbk.crlf.txt' mode='r' encoding='utf-8'>
>>> f.read()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xab in position 0: invalid start byte
Here is the sample file I used, with characters from the GBK set:
https://raw.githubusercontent.com/x1angli/cvt2utf/master/sample_data/gbk.crlf.txt
There is a proposed fix in PR #228 that forces the encoding to UTF-8 when opening files, the same way as the example snippet above. However, is that fix really viable, given the error raised when reading the GBK-encoded sample file? It seems to be a workaround that happens to work for this particular user and may cause unpredictable issues for others.
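To illustrate the trade-off, here is a minimal sketch (my own, not the PR's actual code) of a middle ground: try UTF-8 first and fall back to a legacy encoding, so a hard-coded UTF-8 default would not break files like the GBK sample above. The `fallback_encoding` parameter is a hypothetical knob, not an option the plugin exposes.

```python
import os
import tempfile

def read_text(path, fallback_encoding="gbk"):
    """Try UTF-8 first; fall back to a configurable legacy encoding.

    Caveat: bytes that happen to be valid UTF-8 but were written in
    another encoding would still be silently mis-decoded.
    """
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except UnicodeDecodeError:
        with open(path, encoding=fallback_encoding) as f:
            return f.read()

# Demo: a GBK-encoded CRLF file (like the linked gbk.crlf.txt) still reads.
demo = os.path.join(tempfile.mkdtemp(), "gbk.crlf.txt")
with open(demo, "w", encoding="gbk", newline="") as f:
    f.write("中文测试\r\n")
print(read_text(demo))  # universal newlines translate \r\n to \n on read
```

Whether such a fallback belongs in the plugin is a design question; it avoids the UnicodeDecodeError above but hides encoding mismatches instead of surfacing them.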
I am using Python 3.12.3.
I am a developer from China, and my development environment is Windows 11. While trying to use mkdocs with this macros plugin, I ran into an issue: the command reports encoding='cp936'. At first I thought this might be a default set by CMD or PowerShell, but changing it did not help. Then I checked the traceback and found the following message (other output omitted for brevity):
The root cause is the 'gbk' encoding (cp936 is the Windows code page for GBK). The first frame of the traceback shows that this happens because mkdocs-macros-plugin opens the file without setting the UTF-8 encoding.
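For reference, a minimal sketch of that failure mode (the file name and content here are hypothetical, not from the plugin): open() without an explicit encoding uses the locale's preferred encoding, which is cp936/GBK on a Chinese-locale Windows, while passing encoding="utf-8" reads the file the same way on every platform.

```python
import locale
import os
import tempfile

# What open() falls back to when no encoding is given — this is where the
# reported encoding='cp936' comes from on a Chinese-locale Windows:
print(locale.getpreferredencoding(False))

path = os.path.join(tempfile.mkdtemp(), "page.md")  # hypothetical docs page
with open(path, "w", encoding="utf-8") as f:
    f.write("# 标题\n")  # non-ASCII content, valid UTF-8

# Locale-dependent: open(path).read() can raise UnicodeDecodeError (or
# produce mojibake) when the preferred encoding is cp936/gbk.
# Explicit, as the proposed fix does:
with open(path, encoding="utf-8") as f:
    content = f.read()
print(content)
```

As a workaround on the user's side, setting the PYTHONUTF8=1 environment variable (Python's UTF-8 mode) makes UTF-8 the default text encoding process-wide on Windows, though that affects every open() call, not just the plugin's.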