dbcli / mycli

A Terminal Client for MySQL with AutoCompletion and Syntax Highlighting.
http://mycli.net

'gbk' codec can't decode byte 0xa3 in position 191: illegal multibyte sequence #953

Open starrysky9959 opened 3 years ago

starrysky9959 commented 3 years ago

When I use mycli on Windows 10 to load a SQL script, for example with source XXX.sql, an error occurs: 'gbk' codec can't decode byte 0xa3 in position 191: illegal multibyte sequence. I think this is a problem with how the file is opened. Following the log, I modified line 255 of main.py, and then it works. [image]
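For readers who want to reproduce this class of error outside mycli, here is a minimal sketch. It assumes the failure comes from reading a UTF-8 file with a GBK locale default, and that the fix amounts to passing an explicit encoding to open(). The script content and the read_script helper are hypothetical, not taken from mycli's main.py.

```python
import os
import tempfile

# Hypothetical script content: a UTF-8 comment followed by a statement.
SCRIPT = "-- 注释，\nSELECT 1;\n"


def read_script(path, encoding):
    """Read a SQL script with an explicit encoding instead of the locale default."""
    with open(path, encoding=encoding) as f:
        return f.read()


def demo():
    """Write a UTF-8 script, then try to read it back as GBK and as UTF-8."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.sql")
        with open(path, "w", encoding="utf-8") as f:
            f.write(SCRIPT)
        try:
            # Simulates a Windows locale whose default encoding is GBK.
            read_script(path, "gbk")
            gbk_failed = False
        except UnicodeDecodeError:
            gbk_failed = True
        # An explicit UTF-8 read returns the original text.
        text = read_script(path, "utf-8")
    return gbk_failed, text


if __name__ == "__main__":
    print(demo())
```

When no encoding is given, open() falls back to the locale's preferred encoding, which would explain why the error depends on the machine.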

gfrlv commented 3 years ago

Hi, thanks for pointing that out. I have not been able to reproduce it on Windows 10, but it is probably down to the locale or some other setting, and I don't think it's worth digging into. The better question is whether we should simply add the utf-8 default or try to detect the file encoding. For example, if the MySQL server or database encoding is latin1 and the script is in Unicode, should we silently run it, or at least warn beforehand?

starrysky9959 commented 3 years ago

Thank you for your reply. The change just solved the problem on my PC, and I had not considered it comprehensively enough. As you said, trying to detect the file encoding is a better solution.

rolandwalker commented 3 years ago

@pasenor while it is possible to automatically detect the encoding of file contents, detection only works within limits. I'd also say it is out of scope for mycli, and that too much magic makes a tool hard to predict.

We default to a utf8 connection type, and hopefully soon utf8mb4, so I vote for a UTF-8 default for reading the file.

Whether we should change that default based on the database encoding is a different and interesting question. If the file is in UTF-8, and the database connection type is set to latin1, mycli+mysql should already do the right thing. We could test some scenarios.
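One way to see the limits of automatic detection mentioned in this comment: the same bytes can decode without error under several encodings, so a clean decode does not prove the encoding is right. The sketch below is illustrative only; the decodes_cleanly helper and the candidate list are assumptions, not part of mycli.

```python
# UTF-8 bytes for a short Chinese string.
data = "你好".encode("utf-8")


def decodes_cleanly(raw, encoding):
    """Return True if raw decodes under encoding without raising."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical candidate list a detector might try.
candidates = ["utf-8", "gbk", "latin-1"]
plausible = [enc for enc in candidates if decodes_cleanly(data, enc)]

if __name__ == "__main__":
    # Every candidate accepts the bytes, but only one yields the intended text.
    for enc in plausible:
        print(enc, repr(data.decode(enc)))
```

All three candidates accept these bytes, yet only the UTF-8 decode gives back the original text. A detector has to fall back on heuristics to choose among them, which is the kind of magic the comment argues against.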

gfrlv commented 3 years ago

Yes, I tend to agree that we should not attempt to magically detect the encoding. But I don't know what the "right thing" should be here, for example when the connection type is utf-8 but the database encoding isn't. Perhaps we could try to read the file with the encoding specified in the connection and warn the user if that fails. If the user insists, we then proceed with the utf-8 default.

Worth testing in any case.
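The proposal above could be sketched roughly as follows. This is a sketch under stated assumptions, not mycli's implementation: the MYSQL_TO_PYTHON mapping, the decode_script function, and the confirm callback are all hypothetical names introduced for illustration.

```python
# Hypothetical mapping from MySQL charset names to Python codec names.
# MySQL's latin1 corresponds to Windows-1252 rather than ISO 8859-1.
MYSQL_TO_PYTHON = {
    "utf8": "utf-8",
    "utf8mb4": "utf-8",
    "latin1": "cp1252",
    "gbk": "gbk",
}


def decode_script(raw, connection_charset, confirm, fallback="utf-8"):
    """Decode script bytes with the connection charset, falling back on request.

    confirm is a hypothetical callback that shows a warning message to the
    user and returns True if they want to proceed with the fallback.
    """
    codec = MYSQL_TO_PYTHON.get(connection_charset, fallback)
    try:
        return raw.decode(codec), codec
    except UnicodeDecodeError as exc:
        message = f"Could not read script as {codec} ({exc}); retry as {fallback}?"
        if not confirm(message):
            raise
        return raw.decode(fallback), fallback


if __name__ == "__main__":
    raw = "-- 注释，\nSELECT 1;\n".encode("utf-8")
    text, used = decode_script(raw, "gbk", confirm=lambda msg: True)
    print(used)
```

One caveat worth testing: single-byte charsets such as latin1 accept almost any byte sequence, so the first attempt may succeed and silently produce mojibake instead of triggering the warning.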

aleimu commented 3 years ago

mark