boolangery / py-lua-parser

A Lua parser and AST builder written in Python.
MIT License
124 stars, 39 forks

Parsing fails when the source code contains UTF-8 encoded strings #61

Open finn-cz opened 1 day ago

finn-cz commented 1 day ago

When the source code contains a string with a UTF-8 encoded character, such as FULL BLOCK (U+2588, bytes 0xE2 0x96 0x88), Python opens the file using the operating system's default encoding, which might be CP1250 or some other code page. In such cases, reading the file fails because of the encoding mismatch.
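A minimal sketch of the mismatch, independent of the parser itself: the same bytes are decoded once as UTF-8 and once as CP1250. The helper name `try_decode` and the sample Lua line are illustrative assumptions, not taken from the project. Depending on the code page, the wrong decoder either raises an error or returns garbled text; in both cases the original string is not recovered.

```python
# Lua source containing U+2588 FULL BLOCK, as it would be stored on disk in UTF-8.
LUA_SOURCE = 'print("\u2588")'
RAW_BYTES = LUA_SOURCE.encode("utf-8")


def try_decode(raw: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in `encoding`.

    Hypothetical helper, used here only to compare decoders.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# The character occupies exactly the three bytes quoted in the report.
assert "\u2588".encode("utf-8") == b"\xe2\x96\x88"

for enc in ("utf-8", "cp1250"):
    text = try_decode(RAW_BYTES, enc)
    # Only the matching decoder gives back the original source text.
    print(enc, "round-trips:", text == LUA_SOURCE)
```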

There should either be an option to specify the Lua source code encoding explicitly, or the file should be opened with encoding='utf-8', errors='ignore' and the output XML written with encoding='utf-8'.
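One way the request could look in plain Python, as a sketch only: `read_lua_source` and `write_output` are hypothetical helpers, not part of py-lua-parser, and the `encoding`/`errors` parameters simply forward to the built-in `open()`. The text returned by the reader would then be handed to the parser as a string.

```python
import os
import tempfile


def read_lua_source(path, encoding="utf-8", errors="strict"):
    """Hypothetical helper: read Lua source with an explicit encoding
    instead of relying on the platform default."""
    with open(path, "r", encoding=encoding, errors=errors) as f:
        return f.read()


def write_output(path, text, encoding="utf-8"):
    """Hypothetical helper: write the generated output (e.g. XML) as UTF-8."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


# Small demonstration using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    lua_path = os.path.join(tmp, "sample.lua")
    out_path = os.path.join(tmp, "sample.xml")

    # Write the Lua file as raw UTF-8 bytes, as an editor would.
    with open(lua_path, "wb") as f:
        f.write(b'print("\xe2\x96\x88")')

    source = read_lua_source(lua_path)  # decoded as UTF-8 on any platform
    write_output(out_path, "<source>" + source + "</source>")

    # Reading the output back as UTF-8 preserves the FULL BLOCK character.
    assert "\u2588" in read_lua_source(out_path)
```

The sketch defaults to `errors="strict"` so that a wrong encoding is reported rather than hidden; passing `errors="ignore"`, as suggested above, is possible through the same parameter but silently drops any bytes that cannot be decoded.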