apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0

Chinese characters within SQL Lab get converted to question marks #2996

Closed timfeirg closed 6 years ago

timfeirg commented 7 years ago

Environment

superset 18.4 and superset 18.5-alpha.3, python 3.6.1

SQLLab setup

I'm using PyHive; the SQLAlchemy URI is hive://xxx.xxx.xxx.xxx:10000/?auth=NONE&username=hive

Expected results

This SQL should execute correctly:

select * from rhllor_ods.subproduct where short_name like '%起泡%'

Actual results

The Superset web server logs the correct SQL:

2017-06-20 13:29:11,872:INFO:root:Parsing with sqlparse statement select * from rhllor_ods.subproduct where short_name like '%起泡%'

However, the SQL that actually gets passed to the Superset worker is:

[2017-06-20 13:29:12,139: INFO/Worker-1] Running query:
select * from rhllor_ods.subproduct where short_name like '%??%'
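
One general way a literal can end up as ?? is a lossy encode to a charset that cannot represent the characters, combined with a "replace" error handler. The sketch below only illustrates that mechanism in plain Python. It is an assumption, not a confirmed diagnosis of where Superset, the worker, or PyHive loses the characters, and the helper name lossy_roundtrip is hypothetical.

```python
def lossy_roundtrip(sql: str, charset: str = "ascii") -> str:
    """Encode to `charset`, replacing unencodable characters with '?', then decode."""
    return sql.encode(charset, errors="replace").decode(charset)


query = "select * from rhllor_ods.subproduct where short_name like '%起泡%'"

# With a narrow charset, each Chinese character becomes a single '?'.
mangled = lossy_roundtrip(query)
print(mangled)

# A UTF-8 round trip keeps the characters intact.
assert lossy_roundtrip(query, "utf-8") == query
```

If a step between the web server and the worker behaves like the ASCII round trip here, the logged query would show the same two question marks as in the worker log.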

Steps to reproduce

Run the above SQL in SQL Lab.

timfeirg commented 7 years ago

I've read some related issues, including https://github.com/airbnb/superset/issues/485 and https://github.com/airbnb/superset/issues/2473, but these seem to relate only to MySQL backends, not PyHive.

xrmx commented 7 years ago

If it's MySQL related, please show your connection string. Are you using a proper encoding?
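
For MySQL drivers used through SQLAlchemy, the client encoding is commonly set with a charset query parameter on the URI. A minimal sketch for checking whether a connection string carries one is below. The helper name uri_charset and the example URIs are hypothetical, and the check says nothing about whether PyHive honors such a parameter.

```python
from urllib.parse import parse_qs, urlsplit


def uri_charset(uri: str):
    """Return the value of the `charset` query parameter, or None if absent."""
    params = parse_qs(urlsplit(uri).query)
    values = params.get("charset")
    return values[0] if values else None


# Hypothetical URIs, for illustration only.
print(uri_charset("mysql://user:pw@localhost/superset?charset=utf8"))
print(uri_charset("hive://localhost:10000/?auth=NONE&username=hive"))
```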

timfeirg commented 7 years ago

I'm using PyHive, @xrmx.

mistercrunch commented 7 years ago

I'm planning on testing whether using py3 helps with that.

timfeirg commented 7 years ago

I'm using Python 3; the description has been updated. @mistercrunch

fufjvnvnf commented 7 years ago

I have a similar issue on my remote server, which runs Ubuntu, but my local macOS machine does not have it. Both machines use Python 3.6. I suspect this is a MySQL-related problem. I've already converted both machines' MySQL tables to the utf8 charset. For me, the question mark issue also occurs in columns of type Text: in the description columns for datasources and columns, all Chinese characters are converted to ???. VARCHAR columns work with UTF-8 characters correctly, though.
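
When the same Python version behaves differently on two machines, comparing the encodings each interpreter reports can help narrow things down. The sketch below is a generic diagnostic, assuming the difference might come from the process locale rather than from MySQL itself; that is a guess, not something established in this thread. The function name encoding_report is hypothetical.

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the encodings the current Python process will use by default."""
    return {
        "default": sys.getdefaultencoding(),
        "filesystem": sys.getfilesystemencoding(),
        "locale_preferred": locale.getpreferredencoding(False),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


# Run this on both machines (ideally inside the worker's environment) and compare.
for name, value in encoding_report().items():
    print(f"{name}: {value}")
```

A locale_preferred value that is not UTF-8 on the Ubuntu server but is UTF-8 on macOS would be one difference worth ruling out.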

mistercrunch commented 6 years ago

Notice: this issue has been closed because it has been inactive for 283 days. Feel free to comment and request that this issue be reopened.