fralau / mkdocs-macros-plugin

Create richer and more beautiful pages in MkDocs, by using variables and calls to macros in the markdown code.
https://mkdocs-macros-plugin.readthedocs.io

[Report] default encoding issue for other character set #227

Closed caomingpei closed 2 months ago

caomingpei commented 3 months ago

I am a developer from China, and my development environment is Windows 11. While trying to use mkdocs with this macros plugin, I ran into an issue: the error report shows encoding='cp936'. At first I thought this was the default set by CMD or PowerShell, but changing it there did not help. I then checked the report and found the following (other output is omitted for brevity):

│ C:\anaconda3\envs\mkdoc\Lib\site-packages\mkdocs_macros\plugin.py:352 in _load_yaml              │
│                                                                                                  │
│   349 │   │   │   │   with open(filename) as f:                                                  │
│   350 │   │   │   │   │   # load the yaml file                                                   │
│   351 │   │   │   │   │   # NOTE: for the SafeLoader argument, see: https://github.com/yaml/py   │
│ ❱ 352 │   │   │   │   │   content = yaml.load(f, Loader=yaml.SafeLoader)                         │
│   353 │   │   │   │   │   trace("Loading yaml file:", filename)                                  │
│   354 │   │   │   │   if key is not None:                                                        │
│   355 │   │   │   │   │   content = {key: content}                                               │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │       el = {'external_links': '../en/data/external_links.yml'}                               │ │
│ │        f = <_io.TextIOWrapper name='D:\\fastapi\\docs\\zh\\../en/data/external_links.yml'    │ │
│ │            mode='r' encoding='cp936'>                                                        │ │
│ │ filename = 'D:\\fastapi\\docs\\zh\\../en/data/external_links.yml'                            │ │
│ │      key = 'external_links'                                                                  │ │
│ │     self = <mkdocs_macros.plugin.MacrosPlugin object at 0x00000122F53C20C0>                  │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ C:\anaconda3\envs\mkdoc\Lib\site-packages\yaml\__init__.py:79 in load                            │
│                                                                                                  │
│    76 │   Parse the first YAML document in a stream                                              │
│    77 │   and produce the corresponding Python object.                                           │
│    78 │   """                                                                                    │
│ ❱  79 │   loader = Loader(stream)                                                                │
│    80 │   try:                                                                                   │
│    81 │   │   return loader.get_single_data()                                                    │
│    82 │   finally:                                                                               │
│                                                                                                  │
....................................... OTHER ERROR MESSAGE.........................................
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
UnicodeDecodeError: 'gbk' codec can't decode byte 0x94 in position 1165: illegal multibyte sequence

Traceback (most recent call last):
  File "D:\mkdocs-macros-plugin\setup.py", line 36, in <module>
    long_description=read_file('README.md'),
                     ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\mkdocs-macros-plugin\setup.py", line 29, in read_file
    return open(os.path.join(os.path.dirname(__file__), fname)).read()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa5 in position 667: illegal multibyte sequence

The immediate cause is the 'gbk' codec. The first trace shows, however, that it is selected because mkdocs-macros-plugin opens the file without specifying an encoding, so Python falls back to the locale default (cp936 here) instead of UTF-8.
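A minimal sketch of the behaviour described above, using only the standard library. The sample text and the `read_with` helper are made up for illustration and are not taken from the plugin; passing `cp936` explicitly stands in for what a bare `open(filename)` does on a Windows machine whose locale encoding is cp936.

```python
import os
import tempfile

# Hypothetical sample data: a YAML-like line containing U+2014, whose UTF-8
# encoding is the byte sequence E2 80 94.
text = "title: docs \u2014\n"

path = os.path.join(tempfile.mkdtemp(), "sample.yml")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)


def read_with(encoding):
    """Read the sample file, decoding it with the given codec."""
    with open(path, encoding=encoding) as f:
        return f.read()


# An explicit UTF-8 decode gives the same result on every platform.
utf8_result = read_with("utf-8")

# cp936 (an alias of GBK) cannot decode these UTF-8 bytes.
try:
    read_with("cp936")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print("cp936 decode failed:", exc.reason)
```

Passing `encoding="utf-8"` to `open()` removes the dependency on the console or system locale, which would also explain why changing the CMD or PowerShell code page had no effect.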

github-actions[bot] commented 3 months ago

Welcome to this project, and thank you for your first issue!

kchr commented 2 months ago

I did a quick test by opening a GBK encoded text file and forcing the encoding to UTF-8 like so:

>>> f = open('gbk.crlf.txt', 'r', encoding='utf-8')
>>> f
<_io.TextIOWrapper name='gbk.crlf.txt' mode='r' encoding='utf-8'>
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xab in position 0: invalid start byte

Here is the sample file I used, with characters from the GBK set:

https://raw.githubusercontent.com/x1angli/cvt2utf/master/sample_data/gbk.crlf.txt

PR #228 proposes a fix that forces UTF-8 when opening files, the same way as the example snippet above. But is that fix really a viable solution, given the error raised when reading the GBK-encoded sample file? It looks like a workaround that works for this particular user and may cause unpredictable issues for others.

I am using Python 3.12.3.
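One possible compromise, sketched here purely as an illustration, is to try UTF-8 first and fall back to another codec only when decoding fails. This is not what PR #228 does, and `read_text_with_fallback` is a hypothetical helper, not part of mkdocs-macros-plugin. The sample strings are invented for the demonstration.

```python
import locale
import os
import tempfile


def read_text_with_fallback(path, encodings=None):
    """Try each codec in order and return the first successful decode."""
    if encodings is None:
        # Assumption: UTF-8 first, then whatever the locale prefers.
        encodings = ("utf-8", locale.getpreferredencoding(False))
    with open(path, "rb") as f:
        raw = f.read()
    last_error = None
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError as exc:
            last_error = exc
    # Every codec failed: surface the last decoding error.
    raise last_error


# Demonstration with the same text stored in two encodings.
sample = "\u4e2d\u6587\n"  # two CJK characters
tmp = tempfile.mkdtemp()
gbk_path = os.path.join(tmp, "gbk.txt")
utf8_path = os.path.join(tmp, "utf8.txt")
with open(gbk_path, "wb") as f:
    f.write(sample.encode("gbk"))
with open(utf8_path, "wb") as f:
    f.write(sample.encode("utf-8"))

from_gbk = read_text_with_fallback(gbk_path, ("utf-8", "gbk"))
from_utf8 = read_text_with_fallback(utf8_path, ("utf-8", "gbk"))
```

The trade-off is that a fallback can hide a genuinely broken file, and the result for non-UTF-8 input still depends on which codecs are tried, so an explicit, documented encoding may remain the more predictable choice.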