Teradata / PyTd

A Python Module to make it easy to script powerful interactions with Teradata Database in a DevOps friendly way.
MIT License
109 stars 43 forks source link

Infinity in row being fetched causes exception #3

Closed ShaunCurrier closed 8 years ago

ShaunCurrier commented 8 years ago

By executing: cursor.execute('SELECT * from data_set_dim_values where FKeyDataSetDimDef=12 and "Index"=36')

...to get the following row...

FKeyDataSetDimDef | Index | Elem2Vlaue
12 | 36 | Infinity

(where "Infinity" is data type float) ....I was able to produce the following error consistently. Perhaps handling of infinity needs improvement?

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\PythonCAT\x64\2.7\lib\site-packages\teradata\udaexec.py", line 628, in fetchall
    return self.cursor.fetchall()
  File "C:\PythonCAT\x64\2.7\lib\site-packages\teradata\util.py", line 110, in fetchall
    for row in self:
  File "C:\PythonCAT\x64\2.7\lib\site-packages\teradata\util.py", line 142, in next
    return self.__next__()
  File "C:\PythonCAT\x64\2.7\lib\site-packages\teradata\util.py", line 135, in __next__
    values[i] = self.converter.convertValue(self.dbType, self.types[i][0], self.types[i][1], values[i])
  File "C:\PythonCAT\x64\2.7\lib\site-packages\teradata\datatypes.py", line 182, in convertValue
    return NUMBER(value)
  File "C:\PythonCAT\x64\2.7\lib\decimal.py", line 548, in __new__
    "Invalid literal for Decimal: %r" % value)
  File "C:\PythonCAT\x64\2.7\lib\decimal.py", line 3872, in _raise_error
    raise error(explanation)
InvalidOperation: Invalid literal for Decimal: u'1.#INF'
ShaunCurrier commented 8 years ago

Whoops, I forgot to show that I was doing .fetchone() on the command. cursor.execute('SELECT * from data_set_dim_values where FKeyDataSetDimDef=12 and "Index"=36').fetchone()

ShaunCurrier commented 8 years ago

I am using Python 2.7 x64 on Windows 7 x64. :)

escheie commented 8 years ago

Thanks for reporting. I'm trying to write a unit test that inserts a value of Infinity for a float column for testing purposes, but I can't figure out how to do it. Would you happen to know?

ShaunCurrier commented 8 years ago

I think that the way I wrote inserted this infinity originally was by simply writing np.inf using pyodbc. ...or it might have been float("inf") in base python, can't remember. Thanks for your attention to this bug! Eagerly awaiting the fix! Thanks for a great module I was looking for something like this a month ago and other modules have some drawbacks to them so it's good to get an official module. ceODBC did funny things with timestamps and pyodbc had god-awful insert performance, apparently it didn't prepare the query properly for executemany or something.

ShaunCurrier commented 8 years ago

Close button is a little too close to the comment button...

ShaunCurrier commented 8 years ago

Is this infinity issue accepted as a bug? It's not labelled and milestoned like the run number issue is. I'm new to this type of collaboration so I'm not sure what to expect.

escheie commented 8 years ago

I haven't been able to reproduce since I can't figure out how to get an NaN or Infinity column into a test table so that I can test the fix. Does Infinity hold significance for your application? One thing I could change is that for any float value that isn't valid, just map it to decimal.Decmial('NaN'). Its not clear if the database officially supports these special numbers, for example how would you query for values = Infinity?

ShaunCurrier commented 8 years ago

I not an expert on data types, but I have some evidence that infinity should be supported.

This page says that Teradata's float is stored and manipulated as the IEEE floating point format, which I believe refers to IEEE 754. The IEEE 754 standard supports infinity. http://www.info.teradata.com/htmlpubs/DB_TTU_15_00/index.html#page/SQL_Reference/B035_1143_015K/Numeric_Types.031.45.html#ww15511031

In a similar vain, the link on this page about "potential problems associated with floating point values in comparisons and computations" mentions nothing about infinity issues. http://www.info.teradata.com/htmlpubs/DB_TTU_15_00/index.html#page/SQL_Reference/B035_1143_015K/Numeric_Types.031.48.html#ww681780

Here is a result from Teradata Studio where infinity is being handled fine as the result of a select query, even printing "Infinity", which I imagine requires explicit interpretation: image

Infinity is very significant for my application, which is a scientific one.

May I ask why Teradata FLOATs are being cast to Python decimal data types? Why not a Python float? The problem seems to occur outside of Teradata, while trying to convert Teradata's result value to decimal. Anecdotally, to my knowledge, pyodbc and ceODBC do not cast to decimal in this scenario. Maybe there are good reasons to differ, but I don't know them. Isn't the conversion from Teradata FLOAT to python decimal by definition introducing rounding errors, even with non-special IEEE 754 numbers?

I could change Teradata data types for this column if there is a data type that is more appropriate for representing infinity, but I don't think there is one.

I hope I'm not sounding too negative. I really like the module.

escheie commented 8 years ago

Thanks for the additional details. The reason why I questioned support for these values was because there does not appear to be a SQL literal representation for them. It appears they can only be inserted and used in where clauses when passed as parameters. But I understand their importance now and will investigate the fix.

Decimal was chosen because it has many advantages over float type that are enumerated in the API docs. Even the Python 2.0 DB API specification includes an open issue about using Decimal instead of float. Decimal was chosen as a matter of convenience for these reasons. It is a lossless number representation so its does not introduce any rounding errors.

Thanks, -Eric

ShaunCurrier commented 8 years ago

Thanks for investigating this.

Is if your goal is to allow for dealing with floats going in both directions? I need to do both (1) insert -inf and +inf floats to the teradata module (via passing them as parameters to columns of data type FLOAT) and (2) retrieve them.

For example, would the following code work?

with pyodbc.connect(conn_str) as conn:
    curs = conn.cursor()
    curs.execute('CREATE TABLE testfloats (myfloat FLOAT)')
    conn.commit()
    curs.execute('INSERT INTO testfloats VALUES (?)', float('inf'))
    my_float_value = curs.execute('SELECT * FROM testfloats').fetchone()[0]
    print type(my_float_value)
    print my_float_value
    curs.execute('DROP TABLE testfloats')

prints:

<type 'float'>
inf

This works with both pyodbc and ceODBC.

escheie commented 8 years ago

Yes, I will make sure it works for both inserts and selects.

ShaunCurrier commented 8 years ago

Great! Thanks.