encode / databases

Async database support for Python. 🗄
https://www.encode.io/databases/
BSD 3-Clause "New" or "Revised" License
3.84k stars 261 forks source link

Simple query that works with sqlalchemy fails with databases #128

Open trvrmcs opened 5 years ago

trvrmcs commented 5 years ago

A simple COUNT(*) works using sqlalchemy.Engine.execute(), but fails using databases.Database.fetch_all(), raising an IndeterminateDatatypeError exception.

import asyncio
import databases
import sqlalchemy
from sqlalchemy import Column, MetaData, Text, Table, select, func

url = "postgresql://test:password@localhost/example"
engine = sqlalchemy.create_engine(url)

metadata = MetaData()
People = Table("people", metadata, Column("name", Text()))
QUERY = select([func.count("*")]).select_from(People)

def setup():
    engine.execute("CREATE TABLE IF NOT EXISTS people(name text)")

def test_sync():
    "this works"
    engine.execute(QUERY).fetchall()

async def test_async():
    "this fails with 'asyncpg.exceptions.IndeterminateDatatypeError: could not determine data type of parameter $1'"
    database = databases.Database(url)
    await database.connect()
    await database.fetch_all(QUERY)
    await database.disconnect()

if __name__ == "__main__":
    setup()
    test_sync()

    asyncio.get_event_loop().run_until_complete(test_async())
tomchristie commented 5 years ago

Steps to progress this issue would include:

The exception itself is from the asyncpg library, so it's so either an issue there, or it's to do with the query compilation in databases.

gvbgduh commented 5 years ago

It looks like it's due to a subtle difference in the underlying driver. await database.fetch_all(QUERY) will render the query and perform the command as

raw_connection.fetch('select count($1) from notes;', '*')

that will raise an exception at the asyncpg level

asyncpg.exceptions.IndeterminateDatatypeError: could not determine data type of parameter $1

and to resolve the confusion asyncpg expects it as

raw_connection.fetch('select count($1::text) from notes;', '*')

Another way to resolve it would be to define the query as select([func.count()]).select_from(notes) that'll be rendered as

'SELECT count(*) AS count_1 \nFROM notes'

that doesn't confuse asyncpg.

For engine.execute(QUERY).fetchall() the underlying driver is psycopg2, that receives the query as

engine.execute('select count(%s) from notes;', '*').fetchall()

that it can execute.

tomchristie commented 5 years ago

Is this resolvable for us? Is there anything we can set in the dialect that'd ensure the query gets compiled to 'select count($1::text) from notes;'?

gvbgduh commented 5 years ago

@tomchristie I'm not sure, it's getting tricky as sqlalchemy "supports" psycopg2 (it'll cover aiopg well tho) as the postgres driver and has no support for asyncpg. Those two have subtle differences in general and in the query syntax details particularly and the sqlalchemy's query render isn't aware of it.

In this particular case 'select count(%s) from notes;', '*' would be the best as it suits both drivers.

It's a quite confusing point to me as I'm afraid it might get pretty complicated to control the query rendering might require to go quite deep in adjusting the sqlalchemy logic, where it might have more sense to do at the ORM level. However, on the other hand, we have integration with sqlalchemy core already.

tomchristie commented 5 years ago

May be worth looking into if asyncpgsa has had any similar issues raised, and if they've resolved them or not.

waydegg commented 3 years ago

Any updates on this issue?

waydegg commented 3 years ago

Another solution: compile the query as a string with your database's dialect and use the "literal_binds" kwarg when compiling. This substitutes the parameters into raw SQL which can be dangerous if the parameters are from an untrusted input.

Here's an example for PostgreSQL:

from sqlalchemy.dialects import postgresql

# other imports, database connection, etc. ...

C_QUERY = str(QUERY.compile(compile_kwargs={"literal_binds": True}, dialect=postgresql.dialect()))
res = await database.fetch_all(C_QUERY)

Annoying since you'd have to do this for every query, but I'm using a custom subclass of databases.Database that handles this automatically.