Teradata / PyTd

A Python Module to make it easy to script powerful interactions with Teradata Database in a DevOps friendly way.
MIT License
108 stars 43 forks source link

Python decimal type being returned when fetching from Teradata float type columns #13

Closed ShaunCurrier closed 9 years ago

ShaunCurrier commented 9 years ago

I'm using the fix now for this issue: https://github.com/Teradata/PyTd/issues/3

I've found that, as intended, the fix does indeed solve the problem of retrieving the IEEE 754 float infinity values from float columns. The problem I am now facing is that python float values that I insert to Teradata float columns do not compare as equal to those same values when retrieved. This is due to the python decimal object being returned for Teradata float columns instead of the actual float value that Teradata is storing. It is not due to well-known numerical inaccuracy at the far right digits of the value. It seems that the exact float value that was stored is not accessible because it is being cast to decimal upon retrieval. So I'm trying to understand what you were telling me in the thread from the above issue after I brought up the question of why a conversion was happening:

Decimal was chosen because it has many advantages over float type that are enumerated in the API docs. Even the Python 2.0 DB API specification includes an open issue about using Decimal instead of float. Decimal was chosen as a matter of convenience for these reasons. It is a lossless number representation so its does not introduce any rounding errors.

By the open issue, are you referring to this? https://www.python.org/dev/peps/pep-0249/#open-issues If so, I do not interpret that as suggesting that values stored as IEEE 754 floats in a database should be cast to decimals upon retrieval. Is there another issue list that you're looking at that more thoroughly explains this?

I understand that, fundamentally, decimal types provide advantages over float types for certain uses. But float types serve their purpose and provide advantages over decimal types in other uses, no?

Can you help me understand why are they being mixed here?

Should not the python float type be used exclusively with the Teradata float type (since the two float types are exactly the same IEEE 754 floats on virtually all machines) and the python decimal type be used exclusively with the Teradata decimal type (since the two decimal types have the exact same intent)?

I hope opening a new issue was the right way to address this. I thought it was because it really is a separate issue from the above linked issue. Please let me know if not.

ShaunCurrier commented 9 years ago

Note that when I say "float" above I mean "binary float."

escheie commented 9 years ago

Do you think the Decimal object is specifically the issue here? If you take your float and wrap it in a Decimal object, does f == decimal.Decimal(f)?

I actually think the problem is in how the Python module is getting the float from the ODBC driver. Its getting it in string format instead of a binary format before converting it to a Decimal (this was done to keep the data converter consistent between REST and ODBC but I realize now that approach is flawed). I will work on a fix to get it in binary format instead and add an option to return floats instead Decimals if that is your preference.

Thanks, -Eric

escheie commented 9 years ago

I confirmed that f == decimal.Decimal(f) == float(decimal.Decimal(f)) when f is a float, so use of Decimal by default should not pose a problem. I've corrected the precision problem of floats losing digits of precision. If you would prefer to work with floats directly, then you can pass in a dataTypeConverter argument to the udaExec.connect method like so:

import teradata
from teradata import datatypes

udaExec.connect("${dataSource}", dataTypeConverter=datatypes.DefaultDataTypeConverter(
                    useFloat=True))