anthony-tuininga / ceODBC

Python module for accessing databases using the ODBC API.
https://anthony-tuininga.github.io/ceODBC/
12 stars 8 forks source link

[Microsoft][ODBC Driver 17 for SQL Server]String data, right truncation #10

Closed adfontesmedia closed 3 years ago

adfontesmedia commented 3 years ago

This might be an encoding issues, but I didn't have this problem with pyodbc, and I was hoping that it was something that you knew how to deal with.

The data stream is a little tortured. The original data sits on an Oracle database, and is exported to CSV files, which are then pulled down from an FTP server. The CSV files have an accompanying JSON that describes the table it's exported from.

On this side, we use the JSON file to create an equivalent table on an SQL Server database, then use ecODBC to read the data into it. We were using pyODBC, but ecODBC has a huge performance boost.

The problem we run into is sporadic. After writing a few hundred rows, one of the rows will refuse to load, insisting that it's running into a right truncation. The row it chokes on is consistent within a file, but the content of that file doesn't actually have any rows that should cause overflow any more than the previous rows.

For instance, in one record the only field that comes within 50% of the row length is a GUID in the form [A-F0-9-]{36,36} . I don't know why this would be ok for 150 rows and then suddenly choke. Even more perplexing, I've had files that choke when I pass multiple rows through executemany, but then work ok when I pass a single row through executemany. Yes, a single row through the many version.

We're running Python 3.7, and our database is set to use UTF-8 encoding. Any advice on further debugging and/or providing you with more information would be helpful.

Thanks,

Robert Rapplean

adfontesmedia commented 3 years ago

I made a work-around by iterating through the row and printing the column number and value that is causing the problem. I think I'm going to have to default my target table to using NVARCHAR and equivalent types.

def find_bad_column(self, sql_insert_statement, failed_row):
    print(f"Trying to figure out which column is bad for {failed_row}")
    for itr in range(0, len(failed_row)):
        holdval = failed_row[itr]
        failed_row[itr] = ''
        try:
            self.dbmanager.execute(sql_insert_statement, self.format(failed_row))
            self.conf['logger'].warning(f"Successful insert. Bad value is column {itr}, value {holdval}")
                return True
        except ceODBC.Error:
            failed_row[itr] = holdval
        return False
anthony-tuininga commented 3 years ago

I'm glad you found a workaround. If you are able to provide a test case that demonstrates the problem I'd also be happy to solve it properly!

adfontesmedia commented 3 years ago

A possible clue, when I enter the equivalent query through PyCharm's interface, it tells me "Unicode strings should have N prefix", but doesn't cause the insert problem. I'll keep trying to find a good example case.

Marwan-Amin commented 2 years ago

Is there is any update or fix for this issue ??

anthony-tuininga commented 2 years ago

Do you have an example that fails? If so, I'd be happy to look into it. Right now it isn't obvious to me what the issue is, so nothing has been done.

antonlahti commented 1 year ago

I've had the same issue.

EDIT: I'm running:

EDIT: Added ODBC trace: SQL.LOG

Given this table declaration:

CREATE TABLE strings (
    id int NOT NULL,
    [data] nvarchar(400) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    CONSTRAINT PK__strings__3213E83FDE156BB0 PRIMARY KEY (id),
    CONSTRAINT UQ__strings__D9DE21E1EE0369BF UNIQUE ([data])
);

And the following Python code::

import ceODBC

dsn = "DRIVER={ODBC Driver 13 for SQL Server};SERVER=%s;DATABASE=%s;UID=%s;PWD=%s" % (
    server,
    database,
    uid,
    pwd,
)
db_connection = ceODBC.connect(dsn)
cursor = db_connection.cursor()
cursor.execute("INSERT INTO strings VALUES (1, ?)", "²")

I get the following error:

Traceback (most recent call last): File "mini.py", line 13, in cursor.execute("INSERT INTO strings VALUES (1, ?)", "²") File "src\ceODBC\cursor.pyx", line 423, in ceODBC.driver.Cursor.execute File "src\ceODBC\cursor.pyx", line 268, in ceODBC.driver.Cursor._execute File "src\ceODBC\errors.pyx", line 31, in ceODBC.driver._check_stmt_error File "src\ceODBC\errors.pyx", line 23, in ceODBC.driver._check_error File "src\ceODBC\errors.pyx", line 67, in ceODBC.driver._raise_from_odbc File "src\ceODBC\errors.pyx", line 83, in ceODBC.driver._raise_from_string ceODBC.exceptions.DatabaseError: [Microsoft][ODBC Driver 13 for SQL Server]String data, right truncation