TheVeryDarkness opened this issue 1 year ago
Sorry, the error log was hard to read. Here it is, properly formatted:
Traceback (most recent call last):
  File "D:\scoop\apps\python39\current\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\scoop\apps\python39\current\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\scoop\apps\python39\current\Scripts\antlr4-parse.exe\__main__.py", line 7, in <module>
  File "D:\scoop\apps\python39\current\lib\site-packages\antlr4_tool_runner.py", line 153, in interp
    err = err.decode("UTF-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 198: invalid continuation byte
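The last line of the traceback already says a lot: 0xd5 is a valid UTF-8 lead byte (it announces a two-byte sequence), but the byte after it is not a continuation byte in the range 0x80 to 0xBF. In GBK, by contrast, 0xd5 is an ordinary first byte of a double-byte character. The minimal sketch below reproduces the same error; the two-byte sample `b"\xd5\xd2"` is a hypothetical stand-in, not the actual bytes from position 198 of the real output.

```python
def explain_utf8_failure(data: bytes) -> str:
    """Decode as UTF-8, or describe why strict decoding failed."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the offending byte, exc.reason the cause.
        return f"byte 0x{data[exc.start]:02x} at position {exc.start}: {exc.reason}"


# Hypothetical sample: 0xD5 starts a two-byte UTF-8 sequence, but 0xD2 is not
# a continuation byte (0x80-0xBF), so UTF-8 decoding fails at the first byte.
sample = b"\xd5\xd2"
print(explain_utf8_failure(sample))
# byte 0xd5 at position 0: invalid continuation byte

# The same two bytes decode cleanly as one double-byte GBK character.
print(len(sample.decode("gbk")))
# 1
```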
Have you tried just reading the file directly in Python, specifying UTF-8 encoding? I suspect that your input file is not actually UTF-8.
(In reply to TheVeryDarkness's comment of Tue, May 23, 2023 at 4:20 PM: https://github.com/antlr/antlr4/issues/4282#issuecomment-1558769205)
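Following that suggestion, one quick way to test the input file is to read its raw bytes and decode them strictly as UTF-8, reporting the first offending byte if decoding fails. This is a minimal sketch; `check_utf8` is a hypothetical helper name, and `test.txt` is the file name used later in this thread.

```python
from pathlib import Path


def check_utf8(path: str) -> str:
    """Report whether a file's raw bytes are valid UTF-8 (hypothetical helper)."""
    data = Path(path).read_bytes()  # raw bytes, no implicit decoding
    try:
        data.decode("utf-8")  # strict mode: any invalid byte raises
    except UnicodeDecodeError as exc:
        return (f"not UTF-8: byte 0x{data[exc.start]:02x} "
                f"at offset {exc.start} ({exc.reason})")
    return f"valid UTF-8 ({len(data)} bytes)"


# Usage: print(check_utf8("test.txt"))
```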
Thanks for your reply. Give me a minute; I'll post the result of the decoding test you suggested. Note, though, that the error occurs when decoding the output of Popen(), not when reading my input. It seems the subprocess reads UTF-8 but writes GBK.
The file can be read successfully, as shown below:
>>> open("test.txt", encoding="UTF-8").read()
'任意的 Unicode 字符'
The error comes from the last of the lines below (part of the function interp in antlr4_tool_runner.py); the traceback points at line 153, err = err.decode("UTF-8"):
p = subprocess.Popen([java, '-cp', jar, 'org.antlr.v4.gui.Interpreter']+args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
out = out.decode("UTF-8")
err = err.decode("UTF-8")
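Until the subprocess's output encoding is pinned down, the decoding step itself can be made tolerant. A minimal sketch, assuming a hypothetical helper `decode_output` that would replace the two `.decode("UTF-8")` calls above; this is not what antlr4_tool_runner.py actually does. It tries UTF-8 first, then the locale's preferred encoding (on a Simplified Chinese Windows install this is typically `cp936`, i.e. GBK), and finally falls back to UTF-8 with replacement characters so the error text is never lost.

```python
import locale


def decode_output(data: bytes) -> str:
    """Hypothetical replacement for out.decode("UTF-8") / err.decode("UTF-8")."""
    # 1. Prefer UTF-8, which is what the current code assumes.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # 2. On Windows a child process may write in the ANSI code page instead,
    #    which locale.getpreferredencoding(False) reports (e.g. "cp936").
    fallback = locale.getpreferredencoding(False)
    try:
        return data.decode(fallback)
    except (UnicodeDecodeError, LookupError):
        pass
    # 3. Last resort: keep the message readable, marking bad bytes with U+FFFD.
    return data.decode("utf-8", errors="replace")


# Usage inside interp:
#   out, err = p.communicate()
#   out = decode_output(out)
#   err = decode_output(err)
```

One caveat of this design: a single-byte fallback such as `cp1252` never raises, so on such a system step 2 can yield mojibake rather than an error.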
Since I've already tried chcp 65001, I'm wondering why Popen() keeps returning output in GBK.
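A likely explanation, offered with some hedging: chcp only changes the console's code page, and when Popen() redirects stdout and stderr to pipes the JVM is not writing to a console at all. On JDK 17 and earlier, Java then encodes piped standard streams with its default charset, which on Windows follows the system locale (GBK on a Chinese install) rather than the console code page. One possible workaround is to ask the JVM for UTF-8 explicitly. The sketch below is a hypothetical variant of the Popen call from antlr4_tool_runner.py, not the project's actual fix; `file.encoding` sets the default charset on JDK 17 and earlier, while `stdout.encoding` and `stderr.encoding` are the properties JDK 19 and later consult. Please verify the behavior against the JDK you actually run.

```python
import subprocess
from typing import List, Tuple


def build_interpreter_command(java: str, jar: str, args: List[str]) -> List[str]:
    """Hypothetical variant of the command list passed to Popen() in interp."""
    return [
        java,
        # Default charset; on JDK 17 and earlier it also governs piped stdout/stderr.
        "-Dfile.encoding=UTF-8",
        # Standard-stream encodings consulted by JDK 19 and later
        # (older JVMs just store them as ordinary, unused system properties).
        "-Dstdout.encoding=UTF-8",
        "-Dstderr.encoding=UTF-8",
        "-cp", jar, "org.antlr.v4.gui.Interpreter",
    ] + args


def run_interpreter(java: str, jar: str, args: List[str]) -> Tuple[str, str]:
    p = subprocess.Popen(build_interpreter_command(java, jar, args),
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    # With the JVM asked for UTF-8, decoding as UTF-8 should now match.
    return out.decode("utf-8"), err.decode("utf-8")
```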
When using antlr4-parse to parse a UTF-8 file with the encoding set to UTF-8, the following error occurred:
It seems the output is actually encoded in GBK, while it is decoded as UTF-8. I've tried chcp (I'm using Windows), but the error remains.
An example grammar file:
An example input file:
The command: