Closed hawkaa closed 2 years ago
I actually fixed it with a small hack: https://github.com/hawkaa/odbc/commit/ae0055c5dfcbf3084f708acd7d6676085826a0db
But I am unsure whether this is a good idea and whether this is something we can make generic.
Hello @hawkaa
When binding columns, a call to [
C.SQLDescribeColW
](is made. This inputs a statement handle, and a column number, and outputs data about the column, including its length. The length returned for the
data
column is 256. This length is then passed on toNewVariableWidthColumn
which rightfully creates a buffer for that column of size 256. When callingrows.Next()
, it reads the first results and tries to fit my 668 length character string into the buffer which is only 256 long which makes the program panic.
Your explanation sounds reasonable. Your ODBC driver uses variable-size buffers for SQL_VARCHAR columns, so the SQLBindCol ODBC API cannot be used for such columns. The SQLBindCol API is memory efficient because it does not require allocating buffers every time you read a column result.
Your change https://github.com/hawkaa/odbc/commit/ae0055c5dfcbf3084f708acd7d6676085826a0db looks reasonable in your situation. Your approach is sound; it just requires more ODBC calls and more memory allocation / deallocation.
I don't want to accept your change into the main repo because I believe your driver's SQLDescribeCol API implementation is incorrect: it returns 256 as the column size, and then returns more than 256 bytes.
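To illustrate the trade-off in plain Go (a sketch of the two strategies, not this library's internals; fetchBound and fetchUnbound are made-up names): with a bound column the buffer is allocated once, sized from the described length, and reused for every row, while the per-cell approach allocates a fresh buffer for each value and therefore never has to trust the described size.

```go
package main

import "fmt"

// fetchBound simulates SQLBindCol-style reading: one buffer, sized from
// the length SQLDescribeCol reported, is reused for every row. It fails
// loudly if a value exceeds the reported size instead of overrunning.
func fetchBound(rows []string, reportedSize int) ([]string, error) {
	buf := make([]byte, reportedSize) // allocated once
	out := make([]string, 0, len(rows))
	for _, v := range rows {
		if len(v) > len(buf) {
			return nil, fmt.Errorf("value of %d bytes exceeds described column size %d", len(v), len(buf))
		}
		n := copy(buf, v)
		out = append(out, string(buf[:n]))
	}
	return out, nil
}

// fetchUnbound simulates the per-cell workaround: a buffer is allocated
// for each value, so any length is handled at the cost of extra allocations.
func fetchUnbound(rows []string) []string {
	out := make([]string, 0, len(rows))
	for _, v := range rows {
		buf := make([]byte, len(v)) // allocated per row
		copy(buf, v)
		out = append(out, string(buf))
	}
	return out
}

func main() {
	rows := []string{"short", string(make([]byte, 668))}
	if _, err := fetchBound(rows, 256); err != nil {
		fmt.Println("bound read failed:", err) // the 668-byte value does not fit
	}
	fmt.Println(len(fetchUnbound(rows))) // 2: per-row allocation handles any length
}
```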
Alex
Thank you for a detailed answer, @alexbrainman !
I don't want to accept your change into the main repo because I believe your driver's SQLDescribeCol API implementation is incorrect: it returns 256 as the column size, and then returns more than 256 bytes.
Of course that change does not belong in this repository. I just wanted to show you how I worked around the problem. Technically, we could have expanded this library's API with a bindVarcharColumns setting or similar, but as you say, there's a bug in the underlying driver. I will make sure to report that.
Thank you again!
Dear @alexbrainman ,
Thank you for an excellent library! I can really see how much hard work has been put into making ODBC available to Go users. I was wondering if you could help us out a little with reading variable-length strings from Spark/Databricks.
We currently run our backend services in Go. Where we previously used PostgreSQL and pq, we now need to connect to a Spark SQL endpoint (via Databricks). They offer both JDBC and ODBC drivers, and the latter is indeed what we need. We have been successful in connecting to and querying our Spark SQL endpoints with unixodbc, the mentioned Simba ODBC driver, and this package.

However, I have run into an issue reading columns with the STRING type: the program panics when reading the results. The default.transactions table contains a data column of type string, which is what describe default.transactions returns. This is the only valid data type for strings in Spark.

I have seen other issues mention this in the past. Particularly #98 is relevant, where @joshuasprow mentions that VARCHAR columns are assigned 256 bytes by default, while for some databases the length can be much greater than that. I believe that is the issue we are running into as well.
I think I have nailed the problem down a little.
When binding columns, a call to C.SQLDescribeColW (https://github.com/alexbrainman/odbc/blob/39f8520b0d5f7ee720424b441e026a1892f96f5e/api/zapi_unix.go#L44) is made. This takes a statement handle and a column number, and outputs data about the column, including its length. The length returned for the data column is 256. This length is then passed on to NewVariableWidthColumn, which rightfully creates a buffer of size 256 for that column. When calling rows.Next(), it reads the first result and tries to fit my 668-character string into the buffer, which is only 256 bytes long, making the program panic.

I have tried this on both OSX and Linux (with unixodbc) and got the same issue. Using unixodbc and the same driver with pyodbc works as expected.

Now my question to you: do you have some pointers on how to solve my problem?
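For what it's worth, the failure mode described above is easy to mimic in plain Go without any ODBC involved: size a buffer from a reported length of 256 and then write a 668-byte value into it by index. This is only a sketch of the symptom, under the assumption that the crash is an out-of-range write; it is not the library's actual code path.

```go
package main

import (
	"fmt"
	"strings"
)

// writeInto copies value into buf byte by byte, panicking with an
// index-out-of-range runtime error as soon as value is longer than
// buf -- the same class of failure as fitting a 668-byte string into
// a buffer sized from a described column length of 256.
func writeInto(buf []byte, value string) {
	for i := 0; i < len(value); i++ {
		buf[i] = value[i]
	}
}

func main() {
	buf := make([]byte, 256)          // sized from the described column length
	value := strings.Repeat("x", 668) // the actual cell is longer

	defer func() {
		if r := recover(); r != nil {
			fmt.Println("panicked:", r) // runtime error: index out of range
		}
	}()
	writeInto(buf, value)
}
```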
Thank you!
Håkon