Teradata / PyTd

A Python Module to make it easy to script powerful interactions with Teradata Database in a DevOps friendly way.
MIT License

Unexpected Unicode Conversion Error #73

Open jamesmf opened 7 years ago

jamesmf commented 7 years ago

I've been plagued by hard-to-predict Unicode Conversion Errors while trying to insert non-Unicode data into existing TD tables.

The data being inserted is 2 integer columns and 20 floats, stored in a list of lists and passed as a parameter to cursor.executemany(insertString, data, batch=True).

The session explicitly specifies charSet = 'ASCII' (though that is also the ODBC default).

The table being inserted into also has a TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) column that isn't being inserted from python.

The insertString looks like: INSERT INTO db_name.table_name(comma_dlm_list_of_columns_to_insert) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)

I get back: teradata.api.DatabaseError: (1, '[HY000] [Teradata][ODBC Teradata Driver][Teradata Database] Unicode Conversion Error')
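For reference, a minimal sketch of the insert pattern described above. The table, column names, and sample values here are hypothetical stand-ins, not the reporter's actual schema; the executemany call is shown commented out since it needs a live session.

```python
# Hypothetical column layout matching the description: 2 ints + 20 floats.
cols = ["id_a", "id_b"] + ["score_%02d" % i for i in range(20)]
placeholders = ",".join("?" * len(cols))
insert_string = "INSERT INTO db_name.table_name (%s) VALUES (%s)" % (
    ",".join(cols), placeholders)

# data is a list of lists, one inner list per row.
data = [[1, 2] + [0.5] * 20]

# With a live PyTd session this would be executed roughly as:
# session.executemany(insert_string, data, batch=True)
```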

escheie commented 7 years ago

Is the problem intermittent? When it fails, can you print out the input to the function and isolate the row it's failing for?

jamesmf commented 7 years ago

The problem is intermittent and I have a hard time reproducing it. So far I've noticed it only when the input I'm trying to insert contains floats. It also seems like it's happening only when I'm inserting into a table that I didn't create using Python.

I can see that my ODBC drivers have been downgraded to TD 14.10 though the db is on 15.10, so that seems like a possible culprit. Is that a known issue?

jamesmf commented 7 years ago

Okay confirmed that the TD ODBC drivers are still 15.10. It seems to reliably work when I create a table and immediately insert into it, but when I insert into existing tables, I sometimes get this error. Again it seems to only happen when I'm inserting floats (though it's possible that's a coincidence because I can't reliably recreate the error on new test tables).

jamesmf commented 7 years ago

It appears to be an issue relating to the size of the input to .executemany(). When I insert line by line it succeeds, as it does when I insert 5000 rows at a time. When I insert ~7500 rows (a number I arrived at using a very rough heuristic), I get the error on the first insert. This surprises me because I have certainly hit the 'too many rows at once' error before in multiple forms: 1) when the batch is above the explicit maximum (16,383 rows, I believe), and 2) when the SQL request exceeds 1 MB.

Any advice or thoughts on this? I'm hesitant to catch that exception, as I assume it's the same exception thrown in the case of an actual Unicode Conversion Error...
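The two limits mentioned above suggest a crude batch-size heuristic. This is only a sketch based on the values reported in this thread (they are not presented here as official driver constants), using repr() length as a rough proxy for bound-data size:

```python
# Values as reported in this thread, not official constants.
MAX_ROWS_PER_BATCH = 16383        # reported hard cap on rows per batch
MAX_REQUEST_BYTES = 1000000       # reported ~1 MB SQL request limit

def safe_batch_size(sample_row, safety=0.8):
    """Estimate how many rows fit in one batch, with a safety margin."""
    # Crude estimate: the length of repr() approximates the bound-data size.
    row_bytes = max(1, len(repr(sample_row)))
    by_size = int(MAX_REQUEST_BYTES * safety // row_bytes)
    return max(1, min(MAX_ROWS_PER_BATCH, by_size))
```

For the 22-column rows described in this issue, this lands well below the ~7500-row point where the error appeared.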

escheie commented 7 years ago

That's great that you can reproduce it at will now. If you have a simple script that produces the problem, I can see if I can reproduce it in my environment and generate an ODBC trace file to send to the ODBC support team to have a look.

jamesmf commented 7 years ago

I'm having trouble recreating it with arbitrarily generated data. With data on which it's already failed, I can reliably recreate the issue (including on new tables with the same definition). Also perhaps worth noting, there's an insert_dt field that is defined as TIMESTAMP(6) FORMAT 'YYYY-MM-DDBHH:MI:SSBT' NOT NULL DEFAULT CURRENT_TIMESTAMP(6).

The behavior is baffling:

The data is a list of lists looking like the following:

When I create new, random data meeting those specifications, I can't get the same error. I've inspected everything I can about those first 6000 rows and can't identify anything that occurs in them that doesn't occur elsewhere. They're all a mixture of int/float/None, there are no values outside the range 0.05 to 2000, and I can insert every row individually.
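A generator for synthetic rows matching that description might look like the following. The column count, None probability, and value range are assumptions inferred from this thread (and, as noted above, data of this shape did not reproduce the error):

```python
import random

def random_row(rng=random):
    """Build one synthetic row: 2 ints plus 20 float-or-None values
    within the 0.05..2000 range described in the thread."""
    vals = [rng.randint(0, 10**6), rng.randint(0, 10**6)]
    for _ in range(20):
        # ~10% None is an arbitrary guess at the mix of missing values.
        vals.append(None if rng.random() < 0.1
                    else round(rng.uniform(0.05, 2000.0), 6))
    return vals
```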

escheie commented 7 years ago

Can you try reproducing the issue with ODBC trace enabled and e-mail the file to me?

jamesmf commented 7 years ago

Emailed.

andreapiso commented 6 years ago

Any progress in solving this issue? We are encountering the same problem feeding many records to executemany.

jamesmf commented 6 years ago

I ended up lumping it in with this error https://github.com/Teradata/PyTd/issues/76 and catching either of these exceptions. My best solution was to wrap my inserts in a try: block and, in the except: block, retry with fewer rows at a time.
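The workaround described above can be sketched as follows. `execute_batch` is a hypothetical callable wrapping cursor.executemany(insert_string, rows, batch=True); in practice the except clause would catch teradata.api.DatabaseError rather than Exception:

```python
def insert_with_fallback(execute_batch, rows, batch_size=5000, min_size=1):
    """Insert rows in batches; on failure, retry with smaller chunks."""
    i = 0
    while i < len(rows):
        size = batch_size
        while True:
            chunk = rows[i:i + size]
            try:
                execute_batch(chunk)
                break
            except Exception:           # in practice: teradata.api.DatabaseError
                if size <= min_size:
                    raise               # a single row still fails: give up
                size //= 2              # retry with fewer rows at a time
        i += len(chunk)
```

This is deliberately blunt: it halves the chunk size until the insert succeeds, so a batch that trips the size-related error still goes through, while a genuine per-row conversion error eventually surfaces at min_size.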

The only hard-and-fast rules I could figure out were: 1) never insert more than 16,383 rows at once (I think), and 2) never insert more than 1 MB of bound data per insert.