MagicStack / asyncpg

A fast PostgreSQL Database Client Library for Python/asyncio.
Apache License 2.0

bug: Incorrect serialization of numeric values #1113

Open Ragueel opened 10 months ago

Ragueel commented 10 months ago

This happens when a query selects a numeric column. Numbers like 10000, 100000, etc. get serialized in scientific notation. For example, 10000 becomes 1E+4.

To reproduce the error, you can do the following:

Inside tests/test_codecs/test_numeric, if '10000' is added to cases, the test fails due to incorrect serialization logic:

    async def test_numeric(self):
        # Test that we handle dscale correctly.
        cases = [
            '0.001',
            '0.001000',
            '1',
            '1.00000',
            '10000', # new test case
        ]

Failure output:

line 614, in test_numeric
    self.assertEqual(str(res), case)
AssertionError: '1E+4' != '10000'
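
The same behaviour can be reproduced outside the test suite. A minimal standalone sketch (the DSN is illustrative, adjust for your setup):

import asyncio
import asyncpg

async def main():
    # Connect to any database; the DSN below is only an example.
    conn = await asyncpg.connect('postgresql://localhost/postgres')
    try:
        value = await conn.fetchval('SELECT 10000::numeric')
        # Per this report, the decoded value prints as Decimal('1E+4')
        # rather than Decimal('10000').
        print(repr(value))
    finally:
        await conn.close()

asyncio.run(main())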
elprans commented 10 months ago

Technically there is no asyncpg bug here, because 1E+4 == 10000; it's just a different representation of the same decimal.Decimal value. If you need a different string representation, use explicit formatting, e.g. print(f'{value:f}').
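
For reference, a quick illustration of the distinction using only the decimal module (nothing asyncpg-specific):

from decimal import Decimal

value = Decimal('1E+4')
print(value == Decimal('10000'))  # True: the values compare equal
print(str(value))                 # 1E+4  -- default string form keeps the exponent
print(f'{value:f}')               # 10000 -- explicit formatting, as suggested above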

Ragueel commented 10 months ago

> Technically there is no asyncpg bug here, because 1E+4 == 10000; it's just a different representation of the same decimal.Decimal value. If you need a different string representation, use explicit formatting, e.g. print(f'{value:f}').

I don't think using scientific notation by default is correct behaviour. Consider the following code:

a = Decimal('100000')
b = Decimal('1E+5')

str(a) == str(b)  # False

This might cause some strange errors when the value is used later. I encountered this when I was dumping data to JSON and was surprised to see scientific notation being used for numbers that are powers of 10.

It is also not documented anywhere that this behaviour applies to numeric values and that users are expected to handle it themselves.
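
As an illustration of the JSON case mentioned above, a small sketch (the default= handlers here are illustrative, not part of asyncpg):

import json
from decimal import Decimal

row_value = Decimal('1E+4')  # what the driver returned for 10000 in this report

# A str-based fallback reproduces the surprise:
print(json.dumps(row_value, default=str))                       # "1E+4"

# Formatting with ':f' keeps plain notation:
print(json.dumps(row_value, default=lambda d: format(d, 'f')))  # "10000"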

20-buck-spin commented 2 months ago

I am also facing this issue. I do understand that it is a different representation of the same value. However, I believe most users would prefer the plain numeric representation (to display prices, for example). In my case, the scientific notation kicks in only in certain cases: when a sqlmodel field is defined in a specific way, when the value exceeds 10k, and only in my Alpine Docker image (it works as expected on my local Linux Mint). I will post again when I figure out exactly which cases trigger this behaviour.

20-buck-spin commented 2 months ago

After spending too much time mixing and matching configs, I found that scientific notation is used when serializing multiples of 10,000 if the precision and scale are not configured for the NUMERIC column in Postgres.

After running ALTER TABLE table_name ALTER COLUMN col_name TYPE NUMERIC(10,2); the values are serialized in plain notation. I'm not sure if this is documented anywhere. If anyone runs into this issue, I hope this comment helps.
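
A sketch of why the declared scale appears to matter, using only the decimal module; the wire-format detail (a single significant digit plus an exponent when no scale is declared) is my reading of the binary NUMERIC format, not something verified against asyncpg's codec:

from decimal import Decimal

# Decimal((sign, digits, exponent)): a single significant digit with
# exponent 4 is a plausible decoding of 10000 when no scale is declared.
print(Decimal((0, (1,), 4)))                            # 1E+4

# A column scale of 2 pins two fractional digits, so the value prints plainly.
print(Decimal((0, (1,), 4)).quantize(Decimal('0.01')))  # 10000.00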

EdgyEdgemond commented 2 months ago

We are also facing this issue. It's extremely odd behavior to have only a specific subset of numbers, in a specific scenario (described above), behave differently. Having to format all decimals to get the desired representation, just in case this bug bites, doesn't feel like a legitimate solution.

EdgyEdgemond commented 2 months ago

await connection.set_type_codec("numeric", encoder=str, decoder=Decimal, schema="pg_catalog")

@20-buck-spin We found this helped out, if you wanna give it a shot.
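
For anyone wanting to apply this workaround pool-wide, a sketch assuming a pool init callback fits your setup (the function names here are illustrative):

import asyncpg
from decimal import Decimal

async def _init_connection(conn):
    # set_type_codec defaults to the text format, so the decoder receives
    # the text Postgres renders (e.g. '10000') and Decimal() keeps the
    # plain notation.
    await conn.set_type_codec(
        'numeric', encoder=str, decoder=Decimal, schema='pg_catalog',
    )

async def get_pool(dsn):
    # init= runs for every new connection created by the pool.
    return await asyncpg.create_pool(dsn, init=_init_connection)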

20-buck-spin commented 2 months ago

@EdgyEdgemond, as far as I know, the default decoder is Decimal in asyncpg, so I'm not sure setting it again would help. In my case, making sure the column definition has precision and scale (for NUMERIC; I haven't tested other data types) solved the issue.

EdgyEdgemond commented 2 months ago

You'd expect that, but setting this stopped scientific-notation decimals from being returned, without mutating the database.