anthony-tuininga / ceODBC

Python module for accessing databases using the ODBC API.
https://anthony-tuininga.github.io/ceODBC/

UnicodeDecodeError UTF-8 #13

Open mschubert90 opened 1 year ago

mschubert90 commented 1 year ago

We're getting a UnicodeDecodeError when trying to fetch data from our MSSQL database. German umlauts seem to cause it; in our case, the "ü" in "Baden-Württemberg". Can ceODBC only parse UTF-8 strings? We tried to handle the error with an "outputtypehandler", but without success: the error occurs before the out-conversion is performed.

import ceODBC

SOURCE_TABLE = {SOURCE_TABLE }
chunksize = 25000

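def utf8_decoder(value):
    # Decode bytes as UTF-8; return the value unchanged if it cannot be decoded.
    try:
        return value.decode("utf-8")
    except (AttributeError, UnicodeDecodeError):
        return value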

def utf8_handler(cursor, type, length, unknown):
    if type == ceODBC.DB_TYPE_STRING:
        var = cursor.var(ceODBC.DB_TYPE_STRING, size=length)
        var.outconverter = utf8_decoder
        return var

connection_string = {CONNECTION_STRING}
connection = ceODBC.connect(connection_string, autocommit=True)

cursor = connection.cursor()
cursor.outputtypehandler = utf8_handler

rows = cursor.execute(f"SELECT * FROM {SOURCE_TABLE}").fetchmany(chunksize)

Traceback (most recent call last):
    rows = cursor.execute(f"SELECT * FROM {SOURCE_TABLE}").fetchmany(chunksize)
  File "src\ceODBC\cursor.pyx", line 463, in ceODBC.driver.Cursor.fetchmany
  File "src\ceODBC\cursor.pyx", line 134, in ceODBC.driver.Cursor._create_row
  File "src\ceODBC\var.pyx", line 50, in ceODBC.driver.Var._get_value
  File "src\ceODBC\var.pyx", line 64, in ceODBC.driver.Var._get_value_helper
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 7: invalid start byte
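The failure in the traceback can be reproduced without a database. The sketch below assumes the driver returns the column as single-byte cp1252 (or Latin-1) data, which the report does not confirm. Under that assumption, "ü" is the single byte 0xfc at index 7 of "Baden-Württemberg", and 0xfc is never a valid UTF-8 start byte. The helper name `try_utf8` is made up for illustration and is not part of ceODBC.

```python
# Reproduce the decode failure from the traceback using plain Python.
text = "Baden-Württemberg"

# Assumption: the bytes arrive in a single-byte code page such as cp1252.
raw = text.encode("cp1252")


def try_utf8(data: bytes):
    """Return (decoded_text, None) on success or (None, error) on failure."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, exc


decoded, error = try_utf8(raw)

print(hex(raw[7]))           # the byte that encodes "ü" in cp1252
print(error)                 # the UTF-8 decode error, if any
print(raw.decode("cp1252"))  # decoding with the matching code page succeeds
```

If the data really is cp1252, the offending byte and position match the error message above, which suggests an encoding mismatch rather than corrupt data.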

anthony-tuininga commented 6 months ago

I can add a capability similar to the one in python-oracledb: a bypass_decode option that skips the decode and simply returns the bytes exactly as retrieved from the database. I'll let you know once I have that implemented.
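Once raw bytes are available, an out-converter could choose the encoding itself. The following is a minimal sketch of such a converter, not ceODBC's API: the function name `decode_with_fallback` and the encoding list are assumptions made for illustration, and the option proposed above does not exist in ceODBC at the time of this comment.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Decode raw column bytes, trying each encoding in order.

    Hypothetical helper: it assumes the driver hands over undecoded bytes.
    Values that are not bytes (None, str, numbers) are returned unchanged.
    """
    if not isinstance(raw, (bytes, bytearray)):
        return raw
    data = bytes(raw)
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep the row readable and mark undecodable bytes.
    return data.decode(encodings[-1], errors="replace")


print(decode_with_fallback("Baden-Württemberg".encode("cp1252")))
print(decode_with_fallback("Baden-Württemberg".encode("utf-8")))
```

Trying UTF-8 first is a heuristic: a cp1252 byte sequence can occasionally also be valid UTF-8, so if the column's code page is known, decoding with that single encoding is the safer choice.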