Closed philip4096 closed 8 years ago
Hmm, yes, I'm seeing this stuff, too, with these packages and Julia 0.3.1 (x86_64-w64-mingw32) on Windows 7 64-bit:
julia> Pkg.add("ODBC")
INFO: Installing ArrayViews v0.4.6
INFO: Installing DataArrays v0.2.5
INFO: Installing DataFrames v0.5.11
INFO: Installing Dates v0.3.2
INFO: Installing GZip v0.2.13
INFO: Installing ODBC v0.3.10
INFO: Installing Reexport v0.0.2
INFO: Installing SortingAlgorithms v0.0.2
INFO: Installing StatsBase v0.6.8
INFO: Package database updated
Including, for example, errors like this one:
[ODBC] 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near '%'.
ERROR: [ODBC]: SQLExecDirect failed; Return Code: SQL_DEFAULT_ERROR
in error at error.jl:21
in query at C:\Users\kevin.cantu\.julia\v0.3\ODBC\src\userfacing.jl:46
while loading C:\code\neon\src-jl\experimental.jl, in expression starting on line 6
When there is no %
in my UTF-8 source flie at all:
using DataFrames
using ODBC
ODBC.connect("sample")
resultset = ODBC.query("SELECT job.id FROM job")
And my ODBC data source is something like this:
Microsoft SQL Server ODBC Driver Version 06.01.7601
Data Source Name: sample
Data Source Description: DTM jobs database
Server: sample.local
Database: DTMJobs
Language: (Default)
Translate Character Data: Yes
Log Long Running Queries: No
Log Driver Statistics: No
Use Regional Settings: No
Prepared Statements Option: Drop temporary procedures on disconnect
Use Failover Server: No
Use ANSI Quoted Identifiers: Yes
Use ANSI Null, Paddings and Warnings: Yes
Data Encryption: No
Good, I am not the only one. I am also using Windows 7 64 bit, but 32 bit Julia. Looks like the query gets messed up somehow?
Yeah, like it's getting lost in 8-bit <--> 16-bit character conversion, or something.
I'm also seeing this and trying to debug it. My suspicion is that there's a disconnect between the ODBC driver expecting to get UTF-16 and the calling code sending UTF-8. Unfortunately, the Microsoft ODBC documentation (example) doesn't seem to even mention Unicode or clarify what the SQLCHAR
type actually is and I somehow doubt that "Length of *StatementText in characters" is actually accurate – you generally want to give the length in bytes, not characters, since Unicode characters are variable width.
In any case, this code is currently wrong since it sends very different data for strings with different encodings – and only one of them can be correct based on whatever the ODBC API expects.
This Unicode/ODBC is an unholy mess:
How can anyone deal with this and not lose their mind?
More info: https://blogs.oracle.com/mysqlconnectors-ODBC/entry/common_problems_and_mistakes_when
tl;dr: I'm pretty sure that some mismatch in terms of what ODBC.jl is sending and what the driver is expecting is causing memory corruption and/or misinterpretation of queries as junk due to the wrong encoding. This may be driver-specific – i.e. we may need to handle drivers that expect UTF-8 as well as those that expect UTF-16.
Really sorry guys that I haven't been more responsive here (the last 2 months in particular have been crazy, but only two more finals to go!). In terms of the UTF-8 and UTF-16, I know this has been a headache before, because there are crazy interactions between OS driver (iodbc, unixODBC, windows odbc32), DBMS driver, and the package. Some lines of code to check out are here. It'd be great to know everyone's OS and DB they're working with.
Unfortunately, I'm a little slammed until next Tuesday when my last final is over and then packing up and moving across the country, but I'll try to make this a priority to help out with.
Hey, no problem, good luck with finals, in the meantime!
@quinnj, I can handle fixing this if you can give me some guidance about how you decided what arguments should be passed as ASCII vs UTF-16.
I'm by no means an expert on this, but I believe as of now, all arguments are being passed as ASCII. There are UTF-16 functions, defined here, but they aren't wrapped in the ODBC_API.jl file, so I believe it defaults to whatever option was specified in compiling the ODBC driver (see line 771 of the linked file above).
Here are the other ODBC header files for reference: -http://www.opensource.apple.com/source/iodbc/iodbc-36/iodbc/include/sqlext.h -http://www.opensource.apple.com/source/iodbc/iodbc-42.2/iodbc/include/sqltypes.h -http://www.opensource.apple.com/source/iodbc/iodbc-34/iodbc/include/sqlucode.h -http://www.opensource.apple.com/source/iodbc/iodbc-34/iodbc/include/sql.h
Again, I may be way off here since I never got around to really digging into the interactions here.
As far as UTF-16, I believe the only thing currently supported in ODBC is returning a UTF-16 string type from a DBMS as the result of a query.
This line converts to UTF-16 and then passes the code to this function, which seems to use it as ASCII. I'm not sure how that could cause memory corruption, however, since the byte length is still correct.
This line converts to UTF-16
Ah! I wasn't even aware we did that. I believe that came in on one of @jakebolewski's codebase reworkings. To be fair, I believe it still works on most systems, but perhaps there's a mismatch for certain configurations.
Unfortunately I tried containing all the API functions to only accept UTF-8 strings and removing this transcoding call but that didn't fix the problem. I guess I could try harder but it seems like I'm just failing around – I would really like to have a better understanding of which things should be in which encodings.
This is not necessarily a problem, but it amused me just now that the iODBC.org website notes the latest version of Python as 2.2.3...
@StefanKarpinski this MSDN doc describes the C Data Types better, I think. And there's a lot more in the rest of this appendix. (Found that via this summary of ODBC 3.8, for Windows 7+.)
There seem to be lots of dark corners (e.g., this or this) which might be relevant.
Ahh, I discovered that with odbcad32.exe
I have a tab to trace ODBC. So it's really plain to see that on my system, running this Julia code with a 22-character query:
using DataFrames
using ODBC
ODBC.connect("dtm")
resultset = ODBC.query("SELECT job.id FROM job")
Causes this to read twice the characters needed:
julia experime 1368-11e0 ENTER SQLExecDirect
HSTMT 0x0000000006EDBAF0
UCHAR * 0x00000000203FCFD0 [ 44] "SELECT job.id FROM job\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0@Ð? \ 0\ 0\ 0\ 0 Ð? "
SDWORD 44
julia experime 1368-11e0 EXIT SQLExecDirect with return code -1 (SQL_ERROR)
HSTMT 0x0000000006EDBAF0
UCHAR * 0x00000000203FCFD0 [ 44] "SELECT job.id FROM job\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0@Ð? \ 0\ 0\ 0\ 0 Ð? "
SDWORD 44
DIAG [07002] [Microsoft][ODBC SQL Server Driver]COUNT field incorrect or syntax error (0)
(At least with commit 65ed481c112a of ODBC.jl.)
So, we give SQLExecDirect
a SQLCHAR *
(ODBC) or unsigned char *
(C) which is OK, but specify 2x length?
And that call is this:
function ODBCQueryExecute(stmt::Ptr{Void}, querystring::String)
if @FAILED SQLExecDirect(stmt, utf16(querystring))
ODBCError(SQL_HANDLE_STMT,stmt)
error("[ODBC]: SQLExecDirect failed; Return Code: $ret")
end
end
Which calls this?
function SQLExecDirect(stmt::Ptr{Void},query::String)
@windows_only ret = ccall( (:SQLExecDirect, odbc_dm), stdcall,
Int16, (Ptr{Void},Ptr{Uint8},Int),
stmt,query,sizeof(query))
@unix_only ret = ...
return ret
end
So, to fix that, at least, we just have to remove the utf16(...)
from ODBCQueryExecute
and now it works!
But I've not handled the returned values or anything more complicated yet.
Hmm. I thought I'd tried that and was still seeing random memory corruption. But I can try again.
Yeah, it ends up looking OK in the trace:
julia experime 1a4c-1aac ENTER SQLExecDirect
HSTMT 0x00000000070AF790
UCHAR * 0x000000001EE79EE0 [ 22] "SELECT job.id FROM job"
SDWORD 22
julia experime 1a4c-1aac EXIT SQLExecDirect with return code 0 (SQL_SUCCESS)
HSTMT 0x00000000070AF790
UCHAR * 0x000000001EE79EE0 [ 22] "SELECT job.id FROM job"
SDWORD 22
Why, though, is intriguing. Does utf16(...)
malfunction? Does ccall(...)
cast the query back into ASCII/UTF-8? Why does sizeof(...)
give 2x length? I'll look into more later and do a pull req.
Length gives the number of characters whereas sizeof gives the number of bytes.
Does removing the utf16(...) from ODBCQueryExecute make it work?
I'm seeing similar errors trying to query a MS SQL server from windows. It never works, but the error message changes. (tables and fields anonymised, but are all ascii)
julia> sql"SELECT TOP 1000 [gggg]
,[iiiii]
,[dddd]
FROM [Xp].[dbo].[ssss]"
ERROR: invalid UTF-8 sequence
julia> sql"SELECT TOP 1000 [gggg]
,[iiiii]
,[dddd]
FROM [Xp].[dbo].[ssss]"
ERROR: invalid UTF-8 sequence
julia> sql"SELECT TOP 1000 [ggg]
,[iiii]
,[dddd]
FROM [Xp].[dbo].[ssss]"
[ODBC] 42000: [Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near 'P'.
ERROR: [ODBC]: SQLExecDirect failed; Return Code: SQL_DEFAULT_ERROR
I'm having the same issues. I'm on 64-bit Windows 7, 64-bit Julia, and my ODBC connection to the SQL Server database was set up using a 64-bit DSN. The connection tests OK, and I get correct results from trivial queries such as "SELECT 1" and "SELECT GETDATE()" but as soon as I reference a table in the database I get "invalid UTF-8 sequence in convert at utf8.jl:163". If I run the query repeatedly it seems to switch arbitrarily between this error and an error reference a syntax issue near a character not in my query (for example, [ODBC] 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near '>'.).
I'm also having the same error as tristanmarkwell and with same configuration. The easy queries do work but I'm not getting any real data.
Any ideas?
https://github.com/quinnj/ODBC.jl/pull/71 fixed it for me
I tried to implement #71 (unfortunately due to my lack of understanding of how to use a specific branch of a package I just replaced the text of the 5 files in the ODBC folder with Stefan's text). However, I'm now getting the opposite error: invalid ASCII sequence in convert at ascii.jl:101. I don't get the syntax errors any longer. Nothing in my query uses symbols not on the keyboard. I hope it's not part of the issue but I'm running this through Juno since I'm struggling with the Julia-using-a-network-drive-to-store-packages issue with the REPL.
I can confirm that removing the utf16(...) from ODBCQueryExecute in backend.jl line 44 makes it work. I am on Windows 10, 64 bit., Julia 0.4.1
(I did not have this problem when I wrote "Getting started with Julia Programming", with Julia v 0.3)
See https://groups.google.com/forum/#!topic/julia-users/BOeyuSlzCXE where much of this situation is hashed out.
should be resolved by the overhaul 7ae7ebfac60ed3ffb1ccefd552993a3073f8b77b; please re-open if you're still seeing issues
I'm struggling with the same issue on Linux (ODBC.jl v. 1.0.4). Firebird ODBC driver accepts only UTF-8 strings but seems like ODBC.jl provides UTF-16. Is there any workaround presently?
gives me random errors such as:
or
Sometimes it gives the expected result. Any ideas? I am running julia v0.3.1.