cloudera / impyla

Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)
Apache License 2.0

Impyla returns extra digits from Kudu #506

Closed hyh1618 closed 1 year ago

hyh1618 commented 1 year ago

I have a FLOAT column in a Kudu table. For a value of 1.512345, a SELECT run from a Python program connected through impyla returns 1.51234467023. How can I get back the same float/double value that I inserted? Thanks.

csringhofer commented 1 year ago

Fractions that look simple in decimal, such as 1.512345, often cannot be represented as floats without loss of precision, because the base of the exponent is 2, not 10: https://en.wikipedia.org/wiki/Single-precision_floating-point_format

In Impala shell:

+-------------------------+
| cast(1.512345 as float) |
+-------------------------+
| 1.5123449564            |
+-------------------------+

So there is no float that is exactly 1.512345, only an approximation of it. Written out in decimal, that approximation has many more digits, so when the float is formatted as a string, something has to decide how many of those digits to keep.
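To make the point concrete, here is a small Python sketch. It assumes that casting 1.512345 to FLOAT yields the nearest IEEE 754 single-precision value, and approximates that by packing the number into 4 bytes with the standard `struct` module; `to_float32` is a hypothetical helper, not part of impyla. `Decimal(stored)` prints the exact value the float holds, and the two f-strings show that the visible digit count is purely a formatting choice.

```python
import struct
from decimal import Decimal


def to_float32(value: float) -> float:
    """Hypothetical helper: round a Python float (a 64-bit double) to the
    nearest 32-bit float, approximating cast(... as float)."""
    # Pack as IEEE 754 single precision, then unpack back into a Python float.
    return struct.unpack("<f", struct.pack("<f", value))[0]


stored = to_float32(1.512345)

# The exact value held by the float: finite, but much longer than 1.512345.
print(Decimal(stored))    # 1.51234495639801025390625

# The number of digits shown depends only on how the value is formatted.
print(f"{stored:.6f}")    # 1.512345
print(f"{stored:.10f}")   # 1.5123449564 (same as the Impala shell output)
print(stored == 1.512345) # False
```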

Impyla returns floats in binary form, so the number of digits printed is up to the user. My guess is that the client you used to read the value from Kudu simply uses fewer digits when formatting.
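If you want a FLOAT value fetched through impyla to print the way such a client might show it, one option is to format it on the Python side with the shortest decimal string that still maps back to the same 32-bit float. The sketch below does that with the standard `struct` module; `shortest_float32_repr` is a hypothetical helper, and it is only a guess at what other clients do, not a description of any specific client.

```python
import struct


def _float32_bits(value: float) -> bytes:
    # IEEE 754 single-precision encoding of value (little-endian).
    return struct.pack("<f", value)


def shortest_float32_repr(value: float) -> str:
    """Hypothetical helper: shortest decimal string that converts back to the
    same 32-bit float as `value`. Assumes `value` is finite and within float32
    range (struct.pack raises OverflowError otherwise)."""
    target = _float32_bits(value)
    # Work from the float32 value itself, not the wider double.
    value = struct.unpack("<f", target)[0]
    # 9 significant digits are always enough to round-trip a float32.
    for digits in range(1, 10):
        text = f"{value:.{digits}g}"
        if _float32_bits(float(text)) == target:
            return text
    return repr(value)  # not reached for finite float32 values


# Usage: a FLOAT value as it might arrive from a query (a Python float).
value = struct.unpack("<f", struct.pack("<f", 1.512345))[0]
print(value)                         # many digits, as seen through impyla
print(shortest_float32_repr(value))  # 1.512345
```

Searching upward from one digit keeps the output as short as possible, and comparing packed bytes (rather than floats) checks the round trip at single precision, which is the precision the column actually stores.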

If you want to represent decimal values without loss, you have to use the DECIMAL data type instead of FLOAT/DOUBLE. DECIMAL can also be used to control the number of digits returned:

+-----------------------------------------------+
| cast(cast(1.512345 as float) as decimal(7,6)) |
+-----------------------------------------------+
| 1.512345                                      |
+-----------------------------------------------+
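The same idea can be sketched on the Python side with the standard `decimal` module: quantizing the stored float to 6 digits after the point mirrors `decimal(7,6)` (7 digits in total, 6 of them after the point). The half-up rounding mode here is an assumption for illustration, not taken from Impala's documentation.

```python
import struct
from decimal import ROUND_HALF_UP, Decimal

# A FLOAT value as stored: the nearest 32-bit float to 1.512345.
stored = struct.unpack("<f", struct.pack("<f", 1.512345))[0]

# Mirror of cast(... as decimal(7,6)): keep 6 digits after the decimal point.
# ROUND_HALF_UP is an assumed rounding mode, chosen for illustration.
as_decimal = Decimal(stored).quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)
print(as_decimal)                # 1.512345

# Built from a string, a Decimal holds 1.512345 exactly, with no binary rounding.
exact = Decimal("1.512345")
print(exact == as_decimal)       # True
print(exact == Decimal(stored))  # False: the float is only an approximation
```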

csringhofer commented 1 year ago

Closing this as I don't think that it is an Impyla issue.