JuliaDatabases / ODBC.jl

An ODBC interface for the Julia programming language
https://odbc.juliadatabases.org/stable

Unicode issues with Impala #291

Closed Arkoniak closed 4 years ago

Arkoniak commented 4 years ago

I am trying to execute a simple query in Impala and get a strange error.

using ODBC
ODBC.adddriver("Impala", "/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so")
cstr = "DRIVER=Impala;HOST=<host>;PORT=<port>;UID=<uid>;PWD=<password>;AuthMech=3;SSL=0;"
conn = ODBC.Connection(cstr)
DBInterface.execute(conn, "SELECT 42")

ERROR: 42000: [Cloudera][ImpalaODBC] (360) Syntax error occurred during query execution: [HY000] : AnalysisException: Syntax error in line 1:
�����罝
^
Encountered: Unexpected character
Expected: ALTER, COMPUTE, CREATE, DELETE, DESCRIBE, DROP, EXPLAIN, GRANT, INSERT, INVALIDATE, LOAD, REFRESH, REVOKE, SELECT, SET, SHOW, TRUNCATE, UPDATE, UPSERT, USE, VALUES, WITH

CAUSED BY: Exception: Syntax error
P, EXP

I am using the same connection string in pyodbc and it works just fine, so it seems that Python and Julia strings differ in some way that affects how the driver interprets them. I was thinking about using StringEncodings.jl, but its encode function returns a Vector{UInt8} and I wasn't able to figure out what to do next.

OS: Ubuntu 18.04.4 LTS Impala driver: 2.6.4

quinnj commented 4 years ago

Looking into this. I don't see any issues on OSX; going to try on Linux now.

quinnj commented 4 years ago

Ok, I tracked down the issue (for me at least) on Linux:


The docs mention that the driver is configurable between UTF-32 and UTF-16, with the default being UTF-32 for some reason (even though unixODBC defaults to UTF-16). After editing /opt/cloudera/impalaodbc/lib/64/cloudera.impalaodbc.ini and changing the line to DriverManagerEncoding=UTF-16, everything works fine.
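To illustrate why such a mismatch garbles the query, here is a minimal sketch in plain Julia (no ODBC calls). It assumes the query text reaches the driver as UTF-16 code units and that the driver reads the same bytes as UTF-32 on a little-endian machine; the variable names are only illustrative, and this is not what ODBC.jl or the driver literally executes.

```julia
# Encode the query as UTF-16 code units, as a UTF-16 driver manager would pass it.
query = "SELECT 42"
utf16 = transcode(UInt16, query)

# Pad to an even number of code units so the buffer splits cleanly into 32-bit units.
isodd(length(utf16)) && push!(utf16, 0x0000)

# Reinterpret the same bytes as 32-bit units, as a driver expecting UTF-32 would.
utf32 = collect(reinterpret(UInt32, utf16))

# Each 32-bit unit now fuses two UTF-16 code units. Check which of them are
# valid Unicode code points (at most 0x10FFFF and not a surrogate).
valid = [u <= 0x10FFFF && !(0xD800 <= u <= 0xDFFF) for u in utf32]
```

Under those assumptions the first unit is 0x00450053 ('S' and 'E' fused together), which lies far outside the Unicode range, so a parser on the other side sees unexpected characters instead of SELECT.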

These kinds of driver-specific .ini files are annoying because they act as hidden configuration that we can't quite control and often aren't aware of. I'm going to add a "troubleshooting" section to the docs where we can collect general strategies for debugging these kinds of issues and list specific cases and what to do.

Arkoniak commented 4 years ago

Thank you very much! This is amazing work, I can't even guess how you were able to figure it out.

Only one question: is it possible to override this behaviour without changing the "cloudera.impalaodbc.ini" file? I tried adding DriverManagerEncoding to the connection string, but it had no effect. I ask because it is not always possible to edit files under "/opt", and it is also something that one can easily forget after a driver upgrade.

quinnj commented 4 years ago

Hmmmm, I tried all the tricks I know, but there doesn't seem to be a way to override it without directly editing the file. Sorry. Some of these drivers can be such a pain.
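Since the setting apparently has to live in the file, one way to avoid forgetting it after a driver upgrade is a small check before connecting. The following is only a sketch: driver_manager_encoding is a hypothetical helper, not part of ODBC.jl, and it assumes the .ini file uses plain key=value lines, with the path being the one mentioned earlier in this thread.

```julia
# Hypothetical helper (not part of ODBC.jl): return the value of the
# DriverManagerEncoding key in a driver .ini file, or `nothing` if absent.
function driver_manager_encoding(inipath::AbstractString)
    for line in eachline(inipath)
        s = strip(line)
        # Skip blank lines, comments, and section headers.
        (isempty(s) || startswith(s, "#") || startswith(s, ";") || startswith(s, "[")) && continue
        parts = split(s, "="; limit=2)
        length(parts) == 2 || continue
        if strip(parts[1]) == "DriverManagerEncoding"
            return String(strip(parts[2]))
        end
    end
    return nothing
end

# Path taken from this thread; adjust it for your installation.
inipath = "/opt/cloudera/impalaodbc/lib/64/cloudera.impalaodbc.ini"
if isfile(inipath)
    enc = driver_manager_encoding(inipath)
    # Warn if the setting is missing or was reset, e.g. by a driver upgrade.
    enc == "UTF-16" || @warn "DriverManagerEncoding is not UTF-16" inipath enc
end
```

This does not override anything; it only reports when the file no longer contains the value that worked above.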