JuliaDatabases / ODBC.jl

An ODBC interface for the Julia programming language
https://odbc.juliadatabases.org/stable
Other
106 stars 63 forks source link

sporadic errors #67

Closed philip4096 closed 8 years ago

philip4096 commented 9 years ago
con = ODBC.connect("test")
res = query("select 1",con)

gives me random errors such as:

invalid UTF-8 sequence
while loading In[24], in expression starting on line 2

 in convert at utf8.jl:162
 in ODBCMetadata at  ..... \.julia\v0.3\ODBC\src\backend.jl:70

or

[ODBC] 42000: [Microsoft][SQL Server Native Client 10.0][SQL Server]Incorrect syntax near '}'.
[ODBC] 42000: [Microsoft][SQL Server Native Client 10.0][SQL Server]'...' is an invalid name because it contains a NULL character or an invalid unicode character.
[ODBC] 42000: [Microsoft][SQL Server Native Client 10.0][SQL Server]Unclosed quotation mark after the character string '...'.
[ODBC]: SQLExecDirect failed; Return Code: SQL_DEFAULT_ERROR
while loading In[36], in expression starting on line 2

Sometimes it gives the expected result. Any ideas? I am running julia v0.3.1.

killerswan commented 9 years ago

Hmm, yes, I'm seeing this stuff, too, with these packages and Julia 0.3.1 (x86_64-w64-mingw32) on Windows 7 64-bit:

julia> Pkg.add("ODBC")
INFO: Installing ArrayViews v0.4.6
INFO: Installing DataArrays v0.2.5
INFO: Installing DataFrames v0.5.11
INFO: Installing Dates v0.3.2
INFO: Installing GZip v0.2.13
INFO: Installing ODBC v0.3.10
INFO: Installing Reexport v0.0.2
INFO: Installing SortingAlgorithms v0.0.2
INFO: Installing StatsBase v0.6.8
INFO: Package database updated

Including, for example, errors like this one:

[ODBC] 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near '%'.
ERROR: [ODBC]: SQLExecDirect failed; Return Code: SQL_DEFAULT_ERROR
 in error at error.jl:21
 in query at C:\Users\kevin.cantu\.julia\v0.3\ODBC\src\userfacing.jl:46
while loading C:\code\neon\src-jl\experimental.jl, in expression starting on line 6

When there is no % in my UTF-8 source flie at all:

using DataFrames
using ODBC

ODBC.connect("sample")
resultset = ODBC.query("SELECT job.id FROM job")

And my ODBC data source is something like this:

Microsoft SQL Server ODBC Driver Version 06.01.7601

Data Source Name: sample
Data Source Description: DTM jobs database
Server: sample.local
Database: DTMJobs
Language: (Default)
Translate Character Data: Yes
Log Long Running Queries: No
Log Driver Statistics: No
Use Regional Settings: No
Prepared Statements Option: Drop temporary procedures on disconnect
Use Failover Server: No
Use ANSI Quoted Identifiers: Yes
Use ANSI Null, Paddings and Warnings: Yes
Data Encryption: No
philip4096 commented 9 years ago

Good, I am not the only one. I am also using Windows 7 64 bit, but 32 bit Julia. Looks like the query gets messed up somehow?

killerswan commented 9 years ago

Yeah, like it's getting lost in 8-bit <--> 16-bit character conversion, or something.

StefanKarpinski commented 9 years ago

I'm also seeing this and trying to debug it. My suspicion is that there's a disconnect between the ODBC driver expecting to get UTF-16 and the calling code sending UTF-8. Unfortunately, the Microsoft ODBC documentation (example) doesn't seem to even mention Unicode or clarify what the SQLCHAR type actually is and I somehow doubt that "Length of *StatementText in characters" is actually accurate – you generally want to give the length in bytes, not characters, since Unicode characters are variable width.

StefanKarpinski commented 9 years ago

In any case, this code is currently wrong since it sends very different data for strings with different encodings – and only one of them can be correct based on whatever the ODBC API expects.

StefanKarpinski commented 9 years ago

This Unicode/ODBC is an unholy mess:

https://www.progress.com/products/datadirect-connect/odbc-drivers/odbc-developer-center/odbc-tutorials/understanding-unicode-and-odbc-data-access/unicode-and-non-unicode-odbc-drivers

How can anyone deal with this and not lose their mind?

StefanKarpinski commented 9 years ago

More info: https://blogs.oracle.com/mysqlconnectors-ODBC/entry/common_problems_and_mistakes_when

tl;dr: I'm pretty sure that some mismatch in terms of what ODBC.jl is sending and what the driver is expecting is causing memory corruption and/or misinterpretation of queries as junk due to the wrong encoding. This may be driver-specific – i.e. we may need to handle drivers that expect UTF-8 as well as those that expect UTF-16.

quinnj commented 9 years ago

Really sorry guys that I haven't been more responsive here (the last 2 months in particular have been crazy, but only two more finals to go!). In terms of the UTF-8 and UTF-16, I know this has been a headache before, because there are crazy interactions between OS driver (iodbc, unixODBC, windows odbc32), DBMS driver, and the package. Some lines of code to check out are here. It'd be great to know everyone's OS and DB they're working with.

Unfortunately, I'm a little slammed until next Tuesday when my last final is over and then packing up and moving across the country, but I'll try to make this a priority to help out with.

killerswan commented 9 years ago

Hey, no problem, good luck with finals, in the meantime!

StefanKarpinski commented 9 years ago

@quinnj, I can handle fixing this if you can give me some guidance about how you decided what arguments should be passed as ASCII vs UTF-16.

quinnj commented 9 years ago

I'm by no means an expert on this, but I believe as of now, all arguments are being passed as ASCII. There are UTF-16 functions, defined here, but they aren't wrapped in the ODBC_API.jl file, so I believe it defaults to whatever option was specified in compiling the ODBC driver (see line 771 of the linked file above).

Here are the other ODBC header files for reference: -http://www.opensource.apple.com/source/iodbc/iodbc-36/iodbc/include/sqlext.h -http://www.opensource.apple.com/source/iodbc/iodbc-42.2/iodbc/include/sqltypes.h -http://www.opensource.apple.com/source/iodbc/iodbc-34/iodbc/include/sqlucode.h -http://www.opensource.apple.com/source/iodbc/iodbc-34/iodbc/include/sql.h

Again, I may be way off here since I never got around to really digging into the interactions here.

As far as UTF-16, I believe the only thing currently supported in ODBC is returning a UTF-16 string type from a DBMS as the result of a query.

StefanKarpinski commented 9 years ago

This line converts to UTF-16 and then passes the code to this function, which seems to use it as ASCII. I'm not sure how that could cause memory corruption, however, since the byte length is still correct.

quinnj commented 9 years ago

This line converts to UTF-16

Ah! I wasn't even aware we did that. I believe that came in on one of @jakebolewski's codebase reworkings. To be fair, I believe it still works on most systems, but perhaps there's a mismatch for certain configurations.

StefanKarpinski commented 9 years ago

Unfortunately I tried containing all the API functions to only accept UTF-8 strings and removing this transcoding call but that didn't fix the problem. I guess I could try harder but it seems like I'm just failing around – I would really like to have a better understanding of which things should be in which encodings.

killerswan commented 9 years ago

This is not necessarily a problem, but it amused me just now that the iODBC.org website notes the latest version of Python as 2.2.3...

killerswan commented 9 years ago

@StefanKarpinski this MSDN doc describes the C Data Types better, I think. And there's a lot more in the rest of this appendix. (Found that via this summary of ODBC 3.8, for Windows 7+.)

There seem to be lots of dark corners (e.g., this or this) which might be relevant.

killerswan commented 9 years ago

Ahh, I discovered that with odbcad32.exe I have a tab to trace ODBC. So it's really plain to see that on my system, running this Julia code with a 22-character query:

using DataFrames
using ODBC
ODBC.connect("dtm")
resultset = ODBC.query("SELECT job.id FROM job")

Causes this to read twice the characters needed:

julia  experime 1368-11e0   ENTER SQLExecDirect 
        HSTMT               0x0000000006EDBAF0
        UCHAR *             0x00000000203FCFD0 [      44] "SELECT job.id FROM job\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0@Ð? \ 0\ 0\ 0\ 0 Ð? "
        SDWORD                    44

julia  experime 1368-11e0   EXIT  SQLExecDirect  with return code -1 (SQL_ERROR)
        HSTMT               0x0000000006EDBAF0
        UCHAR *             0x00000000203FCFD0 [      44] "SELECT job.id FROM job\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0@Ð? \ 0\ 0\ 0\ 0 Ð? "
        SDWORD                    44

        DIAG [07002] [Microsoft][ODBC SQL Server Driver]COUNT field incorrect or syntax error (0) 

(At least with commit 65ed481c112a of ODBC.jl.)

So, we give SQLExecDirect a SQLCHAR * (ODBC) or unsigned char * (C) which is OK, but specify 2x length?

killerswan commented 9 years ago

And that call is this:

function ODBCQueryExecute(stmt::Ptr{Void}, querystring::String)
    if @FAILED SQLExecDirect(stmt, utf16(querystring))
        ODBCError(SQL_HANDLE_STMT,stmt)
        error("[ODBC]: SQLExecDirect failed; Return Code: $ret")
    end
end

Which calls this?

function SQLExecDirect(stmt::Ptr{Void},query::String)
    @windows_only ret = ccall( (:SQLExecDirect, odbc_dm), stdcall, 
        Int16, (Ptr{Void},Ptr{Uint8},Int),
        stmt,query,sizeof(query)) 
    @unix_only ret = ...
    return ret
end
killerswan commented 9 years ago

So, to fix that, at least, we just have to remove the utf16(...) from ODBCQueryExecute and now it works!

But I've not handled the returned values or anything more complicated yet.

StefanKarpinski commented 9 years ago

Hmm. I thought I'd tried that and was still seeing random memory corruption. But I can try again.

killerswan commented 9 years ago

Yeah, it ends up looking OK in the trace:

julia  experime 1a4c-1aac   ENTER SQLExecDirect 
        HSTMT               0x00000000070AF790
        UCHAR *             0x000000001EE79EE0 [      22] "SELECT job.id FROM job"
        SDWORD                    22

julia  experime 1a4c-1aac   EXIT  SQLExecDirect  with return code 0 (SQL_SUCCESS)
        HSTMT               0x00000000070AF790
        UCHAR *             0x000000001EE79EE0 [      22] "SELECT job.id FROM job"
        SDWORD                    22

Why, though, is intriguing. Does utf16(...) malfunction? Does ccall(...) cast the query back into ASCII/UTF-8? Why does sizeof(...) give 2x length? I'll look into more later and do a pull req.

StefanKarpinski commented 9 years ago

Length gives the number of characters whereas sizeof gives the number of bytes.

philip4096 commented 9 years ago

Does removing the utf16(...) from ODBCQueryExecute make it work?

aviks commented 9 years ago

I'm seeing similar errors trying to query a MS SQL server from windows. It never works, but the error message changes. (tables and fields anonymised, but are all ascii)

julia> sql"SELECT TOP 1000 [gggg]
             ,[iiiii]
             ,[dddd]
         FROM [Xp].[dbo].[ssss]"
ERROR: invalid UTF-8 sequence

julia> sql"SELECT TOP 1000 [gggg]
             ,[iiiii]
             ,[dddd]
         FROM [Xp].[dbo].[ssss]"
ERROR: invalid UTF-8 sequence

julia> sql"SELECT TOP 1000 [ggg]
             ,[iiii]
             ,[dddd]
         FROM [Xp].[dbo].[ssss]"
[ODBC] 42000: [Microsoft][SQL Server Native Client 11.0][SQL Server]Incorrect syntax near 'P'.
ERROR: [ODBC]: SQLExecDirect failed; Return Code: SQL_DEFAULT_ERROR
tristanmarkwell commented 9 years ago

I'm having the same issues. I'm on 64-bit Windows 7, 64-bit Julia, and my ODBC connection to the SQL Server database was set up using a 64-bit DSN. The connection tests OK, and I get correct results from trivial queries such as "SELECT 1" and "SELECT GETDATE()" but as soon as I reference a table in the database I get "invalid UTF-8 sequence in convert at utf8.jl:163". If I run the query repeatedly it seems to switch arbitrarily between this error and an error reference a syntax issue near a character not in my query (for example, [ODBC] 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near '>'.).

haavardhvarnes commented 9 years ago

I'm also having the same error as tristanmarkwell and with same configuration. The easy queries do work but I'm not getting any real data.

Any ideas?

philip4096 commented 9 years ago

https://github.com/quinnj/ODBC.jl/pull/71 fixed it for me

tristanmarkwell commented 9 years ago

I tried to implement #71 (unfortunately due to my lack of understanding of how to use a specific branch of a package I just replaced the text of the 5 files in the ODBC folder with Stefan's text). However, I'm now getting the opposite error: invalid ASCII sequence in convert at ascii.jl:101. I don't get the syntax errors any longer. Nothing in my query uses symbols not on the keyboard. I hope it's not part of the issue but I'm running this through Juno since I'm struggling with the Julia-using-a-network-drive-to-store-packages issue with the REPL.

Ivo-Balbaert commented 8 years ago

I can confirm that removing the utf16(...) from ODBCQueryExecute in backend.jl line 44 makes it work. I am on Windows 10, 64 bit., Julia 0.4.1

(I did not have this problem when I wrote "Getting started with Julia Programming", with Julia v 0.3)

StefanKarpinski commented 8 years ago

See https://groups.google.com/forum/#!topic/julia-users/BOeyuSlzCXE where much of this situation is hashed out.

quinnj commented 8 years ago

should be resolved by the overhaul 7ae7ebfac60ed3ffb1ccefd552993a3073f8b77b; please re-open if you're still seeing issues

ax1ine commented 3 years ago

I'm struggling with the same issue on Linux (ODBC.jl v. 1.0.4). Firebird ODBC driver accepts only UTF-8 strings but seems like ODBC.jl provides UTF-16. Is there any workaround presently?