JuliaDatabases / LibPQ.jl

A Julia wrapper for libpq
MIT License
217 stars 51 forks source link

All rational numbers are parsed as Decimals.Decimal instead of Float #254

Closed svilupp closed 2 years ago

svilupp commented 2 years ago

I have noticed unexpected behaviour. It might be intended but I haven't found any reference in the documentation.

Expected behaviour: rational numbers like 1.0 and 0.1 will be represented as Float

Actual behaviour: all non-integer numbers are represented as Decimals.Decimal

MWE:

using LibPQ
# mocking up a download with rational numbers
sql="""
select avg(1.0) as average, 1.0 as float_number, 0.0000000001 as really_small_number
from bap_aed.aed_metrics
"""
result = execute(conn,sql);

# you can see that LibPQ creates Decimals.Decimal types
result.column_types
# 3-element Vector{Type}:
#  Decimals.Decimal
#  Decimals.Decimal
#  Decimals.Decimal

Package version (but I've seen the same behaviour on 0.9) [194296ae] LibPQ v1.14.0

versioninfo() (but it was the same on 1.7) Julia Version 1.8.0 Commit 5544a0fab76 (2022-08-17 13:38 UTC) Platform Info: OS: macOS (arm64-apple-darwin21.3.0) CPU: 8 × Apple M1 Pro WORD_SIZE: 64 LIBM: libopenlibm LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1) Threads: 6 on 6 virtual cores

iamed2 commented 2 years ago

Only columns with type numeric/decimal will appear this way; float4/real and float8/double precision will not.

The numeric type is an arbitrary-precision number that cannot be represented fully in a Float64. However, Decimals.Decimal is not a great type, and a replacement (or improvements to that package) would be welcome.

You can either force the type in Julia:

julia> execute(conn, sql; type_map=Dict(:numeric => Float64)).column_types
3-element Vector{Type}:
 Float64
 Float64
 Float64

or in SQL:

julia> sql="""
       select avg(1.0)::float8 as average, 1.0::float8 as float_number, 0.0000000001::float8 as really_small_number;
       """
"select avg(1.0)::float8 as average, 1.0::float8 as float_number, 0.0000000001::float8 as really_small_number;\n"

julia> execute(conn, sql).column_types
3-element Vector{Type}:
 Float64
 Float64
 Float64
iamed2 commented 2 years ago

psycopg2 in Python does the same thing, using the decimal module in Python's standard library.

https://www.psycopg.org/docs/faq.html#faq-float

svilupp commented 2 years ago

Thank you for the explanation, @iamed2!

I've been eyeing type_map in the docs to force the conversion, so the example is much appreciated.

I'll close the issue as LibPQ works as expected.


Some additional background: I've used a standard pattern of download SQL -> DataFrame -> save to .arrow for persistence/version control. Upon opening, my saved data was rubbish. It turns out that Arrow.write(...;compress=:lz4) didn't know what to do with Decimals type and saved it however without any warning (an issue opened here).

As for the python side, you're right. I've just checked that both psycopg2 and sqlalchemy (using the former as the driver) return decimals. The challenge is that I (and perhaps others) use Pandas to download data (pd.read_sql_query), which forces the Float type, so I will have never seen the decimal type come out of a SQL / I didn't know it's the expected behaviour.

Good learning!