dbcli / mycli

A Terminal Client for MySQL with AutoCompletion and Syntax Highlighting.
http://mycli.net
Other
11.5k stars 664 forks source link

Unicode error in Chinook sample database #670

Open thorstenkampe opened 6 years ago

thorstenkampe commented 6 years ago

Mycli 1.18.0, Python 2.7.15

The following query from the Chinook sample database (https://github.com/lerocha/chinook-database) for SQL Server, Oracle, MySQL, PostgreSQL generates an error:

select FirstName, LastName, Company, Address from `Customer` LIMIT 5
'charmap' codec can't encode character u'\u0161' in position 813: character maps to <undefined>

This is the data in question:

FirstName,LastName,Company,Address
Luís,Gonçalves,Embraer - Empresa Brasileira de Aeronáutica S.A.,"Av. Brigadeiro Faria Lima, 2170"
Leonie,Köhler,,Theodor-Heuss-Straße 34
François,Tremblay,,1498 rue Bélanger
Bjørn,Hansen,,Ullevålsveien 14
František,Wichterlová,JetBrains s.r.o.,Klanova 9/506
tsroten commented 6 years ago

@thorstenkampe Hello! It looks like it might be an issue with the character set your server is running in as I am able to run the same query on a database:

localhost:test_db> select * from Customer limit 5;

+-----------+-------------+--------------------------------------------------+---------------------------------+
| FirstName | LastName    | Company                                          | Address                         |
+-----------+-------------+--------------------------------------------------+---------------------------------+
| Luís      | Gonçalves   | Embraer - Empresa Brasileira de Aeronáutica S.A. | Av. Brigadeiro Faria Lima, 2170 |
| Leonie    | Köhler      |                                                  | Theodor-Heuss-Straße 34         |
| François  | Tremblay    |                                                  | 1498 rue Bélanger               |
| Bjørn     | Hansen      |                                                  | Ullevålsveien 14                |
| František | Wichterlová | JetBrains s.r.o.                                 | Klanova 9/506                   |
+-----------+-------------+--------------------------------------------------+---------------------------------+

Can you run the command status in mycli? Here is an example output:

localhost:test_db> status
--------------
mycli 1.18.1, running on CPython 3.6.4

+----------------------+----------------------------------+
| Connection id:       | 2773                             |
| Current database:    | test_db                          |
| Current user:        | root@localhost                   |
| Current pager:       | System default                   |
| Server version:      | 5.7.21-0ubuntu0.17.10.1 (Ubuntu) |
| Protocol version:    | 10                               |
| Connection:          | Localhost via UNIX socket        |
| Server characterset: | latin1                           |
| Db characterset:     | latin1                           |
| Client characterset: | utf8mb4                          |
| Conn. characterset:  | utf8mb4                          |
| UNIX socket:         | /var/run/mysqld/mysqld.sock      |
| Uptime:              | 161 days 6 hours 31 min 3 sec    |
+----------------------+----------------------------------+
thorstenkampe commented 6 years ago

@tsroten the issue is with Python2. The same query works with mycli from Python3. Unfortunately I had to switch to Python2 because PromptToolkit was upgraded to version 2 on Python3.

Python2 is mostly for mssql-cli, litecli and mycli which haven't been upgraded to Python3 yet.

By the way: mssql-cli - which is closer to pgcli - running on Python2 has no problem with the customer select.

thorstenkampe commented 6 years ago
mycli 1.18.0, running on CPython 2.7.15

+----------------------+-------------------------------------+
| Connection id:       | 32                                  |
| Current database:    | Chinook                             |
| Current user:        | root@thorsten                       |
| Current pager:       | pspg -Ff                            |
| Server version:      | 8.0.13 MySQL Community Server - GPL |
| Protocol version:    | 10                                  |
| Connection:          | dbdemo via TCP/IP                   |
| Server characterset: | utf8mb4                             |
| Db characterset:     | utf8mb4                             |
| Client characterset: | utf8                                |
| Conn. characterset:  | utf8                                |
| TCP port:            | 3306                                |
| Uptime:              | 31 min 30 sec                       |
+----------------------+-------------------------------------+
tsroten commented 6 years ago

@thorstenkampe So, I'm still able to successfully select from this table on Python 2.7 (although on Linux, not Windows). Do you think you could provide some additional information from your mycli log file to help diagnose the issue?

You can increase the logging verbosity to DEBUG via the log_level config. Seeing the stack trace for the error would be helpful to identify in which library/file the error is occuring.

thorstenkampe commented 6 years ago

Please find attached the debug log dbcli [1].log

tsroten commented 6 years ago

@thorstenkampe I finally installed Python 2 on my Windows computer and reproduced the error with a minimal example. I also have a workaround for you.

Here is the example that breaks:

PS D:\Programs\Python2.7> ./python.exe
>>> print(u'Frant\u0161fek')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Programs\Python2.7\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0161' in position 5: character maps to <undefined>

Here is a fix:

PS D:\Programs\Python2.7> chcp 65001
Active code page: 65001
PS D:\Programs\Python2.7> $env:PYTHONIOENCODING = "UTF-8"
PS D:\Programs\Python2.7> ./python.exe
>>> print(u'Frant\u0161fek')
Frantšfek

So, after setting your code page to 65001, you'll want to set the environment variable (Powershell: $env:PYTHONIOENCODING = "UTF-8", Non-powershell: set PYTHONIOENCODING=utf-8)

This all seems related to a bug in Python with Windows: https://bugs.python.org/issue1602

thorstenkampe commented 6 years ago

@tsroten , I'm aware of the chcp and PYTHONIOENCODING fixes. mssql-cli sets the latter in its batch wrapper. Setting the code page permanently is not possible and changing PYTHONIOENCODING globally may introduce side effects.

Anyway, the new version of mycli is compatible with prompt-toolkit 2 so I switched back to Python3.

Please feel free to close this ticket.

Axolord commented 3 years ago

I am having the same problem, but it gets a little weirder. So I am on Windows 10, have Python 3.5.4 (work computer, cannot update) and mycli 1.21.1.

With the above mentioned environment variable set, printing unicode works just fine:

PS C:\Users\Lehrstuhl_Inderst> python
Python 3.5.4 |Anaconda 4.2.0 (64-bit)| (default, Nov  8 2017, 14:34:30) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print("u'Frant\u2019'")
u'Frant’'

But in mycli, when trying to query my database I get:

mysql root@localhost:supermarkets> select link from products where source="Spar" order by timestamp desc limit 100
'charmap' codec can't encode character '\u2019' in position 3126: character maps to <undefined>

The same query works fine in the mysql cli client.

If needed I can further investigate the issue and try to find the exact line that causes trouble, so people can try to reproduce the issue.