Closed GoogleCodeExporter closed 9 years ago
The string u'\u4141\u4141' indicates that a UTF-8 string is being received when
UTF-16 is expected:
>>> u'AAAA'.encode('UTF-8').decode('UTF-16')
u'\u4141\u4141'
This would happen in the following scenario:
1) Netezza ODBC driver reports the column's SQL type to be "wvarchar"
(represented by the ODBC constant SQL_WVARCHAR)
2) pyodbc would then request to read the column data as the C language type
"wchar_t*" (represented by the ODBC constant SQL_C_WCHAR)
3) Netezza ODBC driver then returns UTF-8 data (wrong! wchar_t* in C is UTF-16
or UCS4 depending on compiler/platform, never UTF-8)
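For anyone on Python 3, here is a minimal sketch of the scenario above. It assumes the client reads the buffer as little-endian UTF-16 (an assumption about the platform, not something confirmed for the Netezza driver). It also shows that the wrong decode can sometimes be reversed in application code. The names sent_by_driver, seen_by_client and recovered are just for illustration.

```python
# Step 3 of the scenario: the driver hands back UTF-8 bytes ...
sent_by_driver = "AAAA".encode("utf-8")            # b'AAAA'

# ... and the client, having asked for SQL_C_WCHAR, decodes them as UTF-16.
# Little-endian byte order is assumed here.
seen_by_client = sent_by_driver.decode("utf-16-le")
print(ascii(seen_by_client))                       # '\u4141\u4141'

# Reversing the wrong decode gets the original text back.
recovered = seen_by_client.encode("utf-16-le").decode("utf-8")
print(recovered)                                   # AAAA
```

The round trip only works when the UTF-8 byte count is even and every byte pair happens to be valid UTF-16, so treat it as a diagnostic aid rather than a fix.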
While step 3 should be regarded as a bug in the Netezza ODBC driver, I wonder
if this is configurable behaviour: looking at Netezza's sample odbcinst.ini
shown in this forum post:
http://www.netezzacommunity.com/message/5801
... I see the following option:
UnicodeTranslationOption = utf8
Do you have a setting like this? I'd suggest changing this to "utf16".
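If you are not sure which value your odbcinst.ini currently uses, a rough way to check is to parse it with Python's configparser. This is only a sketch: the section name "NetezzaSQL" and the Driver path below are placeholders I made up, and only the UnicodeTranslationOption key comes from the forum post. The helper name unicode_translation_options is likewise hypothetical.

```python
import configparser

# Hypothetical odbcinst.ini contents. The section name and Driver path are
# placeholders; only UnicodeTranslationOption is taken from the forum post.
SAMPLE = """
[NetezzaSQL]
Driver = /path/to/driver.so
UnicodeTranslationOption = utf8
"""

def unicode_translation_options(ini_text):
    """Return {section: value} for every section that sets the option."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {
        name: parser.get(name, "UnicodeTranslationOption")
        for name in parser.sections()
        if parser.has_option(name, "UnicodeTranslationOption")
    }

print(unicode_translation_options(SAMPLE))  # {'NetezzaSQL': 'utf8'}
```

For a real file, read its text with open(path).read() and pass that in. The location of odbcinst.ini varies by installation.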
Original comment by lukedell...@gmail.com
on 22 Mar 2011 at 1:20
Hey, you got it exactly right. After I changed the setting
UnicodeTranslationOption = utf16, nvarchar data does show up correctly.
However, would this cause any other issues with other type of data that might
actually need to be translated using utf8, instead of utf16?
Thanks
Original comment by jbao...@gmail.com
on 22 Mar 2011 at 1:58
I do not expect you'd have problems with other data types. My guess is that
this option attempts to cater for non-unicode-aware clients, but if you're only
using pyodbc as a client then setting this to utf16 should be fine.
Original comment by lukedell...@gmail.com
on 22 Mar 2011 at 3:58
Original comment by mkleehammer
on 20 May 2011 at 7:47
I just looked at this issue more closely.
What the Netezza ODBC driver is doing is returning SQL_WVARCHAR regardless of
whether it is using UTF-8 or UTF-16 encoding, and regardless of whether the
result is unicode.
In pyodbc, there actually is support for receiving Unicode through UTF-8;
however, the switch statement in GetDataString() in getdata.cpp that does this
only checks the flag 'unicode_results' when the driver has reported SQL_CHAR,
SQL_VARCHAR, SQL_LONGVARCHAR, SQL_GUID, or SQL_SS_XML. If I move SQL_WVARCHAR
up into that section of the switch statement, then this bug goes away.
However, the whole logic section only seems to apply for Python < 3.0 (not sure
why that is; does Python 3.0 somehow figure out Unicode on its own?)
Anyway, attached is my patched getdata.cpp which fixes this problem. The most
important question, however, is does this work for all ODBC drivers?
Original comment by cpar...@gmail.com
on 19 Jan 2012 at 9:10
The fundamental problem is that ODBC does *not* support UTF8 or UTF16. The
specification only allows for 1-byte ASCII/latin1 and 2-byte UCS2. The UTF
encodings are actually variable width with a minimum size of 8 bits or 16 bits.
A single character can take up to 4 bytes in either encoding.
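To make the variable-width point concrete, here is a small Python 3 check of how many bytes a few sample characters take in each encoding. The sample characters and the name widths are my own choice for illustration.

```python
# Bytes needed per character in UTF-8 and UTF-16 (little-endian, no BOM).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

widths = {
    ch: (len(ch.encode("utf-8")), len(ch.encode("utf-16-le")))
    for ch in samples
}

for ch, (utf8_len, utf16_len) in widths.items():
    print(ascii(ch), "UTF-8:", utf8_len, "UTF-16:", utf16_len)
```

The ASCII letter takes 1 byte in UTF-8 and 2 in UTF-16, while the character outside the Basic Multilingual Plane takes 4 bytes in both. A fixed 2-byte buffer per character therefore cannot hold every character in either encoding.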
I don't know why driver writers can't understand this. Their job is to convert
whatever the database sends into ASCII or UCS2. Client software, such as
pyodbc, simply cannot guess what encoding is being used -- instead the CHAR vs
WCHAR is how the driver tells the client what to expect.
The unicode_results flag is used to have pyodbc convert actual ANSI results to
Unicode before returning them to the user, not to expect Unicode when the
driver says it is ANSI.
I have considered adding a different setting for an encoding such as UTF8, in
particular so that MySQL users can also tell pyodbc that their driver is broken
and will lie about the result type. That may be the fix you need.
In the meantime, if you can configure the driver to use UCS2, it should work.
Is that possible?
Original comment by mkleehammer
on 20 Jan 2012 at 3:11
"Change UnicodeTranslationOption to "utf16" in odbcinst.ini" is actionable.
"Configure the driver to use UCS2" is more of a vision statement. I'm a simple
programmer!
I'll see if we can get IBM to fix the driver, but is pyodbc fixing it? Has a
fix been released?
Original comment by ionFree...@gmail.com
on 27 Jun 2014 at 7:07
Original issue reported on code.google.com by
jbao...@gmail.com
on 21 Mar 2011 at 9:06