MagicStack / asyncpg

A fast PostgreSQL Database Client Library for Python/asyncio.
Apache License 2.0
6.78k stars 395 forks source link

Support pyformat and keyword args #9

Closed nhumrich closed 4 years ago

nhumrich commented 8 years ago

It looks like for prepared statements, statements have to be in the form 'select $1 from $2 where baz=$3 with the arguments ('foo', 'bar', 'baz')

It is common in python database api's to support the pyformat int PEP 249. http://legacy.python.org/dev/peps/pep-0249/#paramstyle

Can this support at least one of the param styles in the pep? Preferably pyformat.

Ideally, the parameters could then be passed in as a dictionary instead of a ordered list.

conn.fetch(myquery, {bar='bar', foo='foo'})

ludovic-gasc commented 8 years ago

+1 for pyformat: we use massively this format for our queries, it's the most explicit, at least to me.

1st1 commented 8 years ago

I'll let this issue stay open for a while to see if a lot of people is interested in this feature. Adding named parameters will require us to tokenize SQL, which is something I don't want to do unless it's absolutely necessary.

gigatexal commented 7 years ago

Will tokenizing hurt performance? I, too, prefer the dictionary vs ordered list approach.

sethmlarson commented 7 years ago

I can say that if this feature were added I would use it over ordered list.

gglanzani commented 7 years ago

@1st1 Interested here as well. It makes everything so much easier

treycucco commented 7 years ago

@1st1 I'd like to see this implemented as well. It would make maintaining queries easier. With just ordered parameters you have to be very careful when updating queries, and it's easy to get it wrong, especially if you have lots of parameters.

bobcolner commented 7 years ago

named keywords have proven more reliable for me as well. On brand with the python-zen 'explicit > implicit' credo as well.

imbolc commented 7 years ago

Why don't you just use little wrappers like this?

def insert(conn, table, fields=None, **kwargs):                                                                               
    fields = fields or {}                                                                                                     
    fields.update(kwargs)                                                                                                     
    items = list(fields.items())                                                                                              
    keys, values = [x[0] for x in items], [x[1] for x in items]                                                               
    placeholders = ['${}'.format(i) for i, _ in enumerate(values, 1)]                                                         
    query = 'INSERT INTO {} ({}) VALUES ({}) RETURNING id'.format(                                                            
        table, ', '.join(keys), ', '.join(placeholders))                                                                      
    return conn.fetchval(query, *values)

id = await insert(conn, 'users', {'email': 'foo@gmail.com', 'name': 'foo'})                                               
id = await insert(conn, 'users', email='bar@gmail.com', name='bar') 
Daenyth commented 6 years ago

I'd very much like to see named parameters as well - ordered ones have proven error prone to the point where we've made it a code style across the company to use only named parameters unless there's some reason it's impossible.

thehesiod commented 6 years ago

If tokenizing I'd suggest re-using any available code in libpq/psycopg2

elprans commented 6 years ago

If tokenizing I'd suggest re-using any available code in libpq/psycopg2

psycopg2 is LGPL, we can't reuse any code from there.

thehesiod commented 6 years ago

meant as a library :)

samuelcolvin commented 6 years ago

is there any progress on this?

Both #196 and #152 seem to be cold. For me this is by far the biggest missing feature from asyncpg.

Or would it actually be simpler to leave this out of asyncpg and implement it in a wrapper?

elprans commented 6 years ago

There is a number of wrappers for asyncpg and SQLAlchemy: asyncpgsa, metapensiero.sqlalchemy.asyncpg, GINO, etc. All of those use SQLAlchemy to generate queries with the correct paramstyle.

196 has been on a backburner for a while, but it's not dead. We need to figure out how to do the query "middleware" interface properly.

juandiegopalomino commented 5 years ago

@1st1 hi there,

This is an awesome library, but this is one of the features which is sorely missing. We've been thinking about making a wrapper which transforms the sql statement based on the kwargs positions, and then calling the vanilla execution, but I feel like we're missing something (the solution seems too simple to have been overlooked). What do you think?

samuelcolvin commented 5 years ago

Hi @juandiegopalomino I've built buildpg for this purpose.

It's well tested and I'm using it in production now, however I haven't yet got round to writing to docs, however, the unit tests should demonstrate what it can do.

Basic usage is

import asyncio
from buildpg import Var, asyncpg, render, select_fields

print(render('SELECT :x', x=4))
#> ('SELECT $1', [4])

async def main():
    conn = await asyncpg.connect_b('postgresql://postgres:waffle@localhost/test')
    # Execute a statement to create a new table.
    await conn.fetch_b('SELECT * FROM foobar WHERE a=:a AND b=:b', a='foo', b='bar')
    # executes 'SELECT * FROM foobar WHERE a=$1 AND b=$2' with params ['foo', 'bar']

    await conn.fetchval_b(
        'SELECT :fields FROM foobar WHERE :where',
        fields=select_fields('first_name', 'last_name'),
        where=(Var('foo') > 2) & ((Var('bar') == 4) | (Var('spam') != Var('whatever') ** 2))
    )
    # executes 'SELECT first_name, last_name FROM foobar WHERE foo > $1 AND (bar = $2 OR spam != whatever ^ $3)'
    # with params [2, 4, 2]

    await conn.close()

asyncio.get_event_loop().run_until_complete(main())

In summary:

If anyone's keen to use this, let me know and I'll write proper docs.

I used colon variable references eg. :foo rather than {foo}) to avoid confusion with sql braces and since that approach seems to be more common, but it's possible to create your own render which uses different regexes.

juandiegopalomino commented 5 years ago

@samuelcolvin hey, thanks for writing that but for now we're writing a wrapper that handles named & indexed parameters and multiple statements via sqlparse and the default string formatter. Thanks for the input though!

bersace commented 5 years ago

Having server side query formatting is good for performance and security. I wish we could always rely on this in asyncpg.

pvanliefland commented 5 years ago

I'd love this feature as well...

bersace commented 5 years ago

Would it be possible to transform a pyformat query to a native Postgres parameterized query ?

samuelcolvin commented 5 years ago

"pyformat query" is a little ambiguous, but I think I understand what you mean.

That's what buildpg does, it takes "sql + keyword arguments" and modifies that into "modified sql + positional arguments", then passes that to the standard asyncpg methods to in turn be passed to postgresql's prepared statements. See this class for details.

I think it's the opinion of @1st1 that this should be done in a separate library, rather than as part of asyncpg, to avoid bloat.

tomchristie commented 5 years ago

There is a number of wrappers for asyncpg and SQLAlchemy: asyncpgsa, metapensiero.sqlalchemy.asyncpg, GINO, etc. All of those use SQLAlchemy to generate queries with the correct paramstyle.

Similarly: https://github.com/encode/databases

arlobryer commented 4 years ago

@1st1 - I'd also really like this feature.

mikhaildubov commented 4 years ago

@1st1 - same here, would be great to have this feature. Has a decision been made it will not be implemented? From what I can see here it looks like it's been three years with no progress

elprans commented 4 years ago

We have made a decision to not support keyword args directly in asyncpg. The goal of asyncpg is to be a minimal interface around native Postgres features. Query tokenization and rendering fall outside of this scope, and, as several commenters mentioned above, can be implemented in a wrapper.

We are still open to the concept of middleware, as proposed in #152, which would make plugging in support for custom query syntaxes easier to accomplish. If someone is willing to champion that effort, I'd be happy to mentor and review.

bitmodeler commented 4 years ago

Testing it today: https://gitlab.com/osfda/asyncpg_utility

Let me know what you think.

(a standalone utility module, as you had preferred...)

eirnym commented 4 years ago

@bitmodeler Good start. I wish to see tests, documentation improvement and less repetitions in code. and absence of sys.exit, obviously.

bitmodeler commented 4 years ago

Unit test: can add a jacket that furnishes a query of one's choosing, and an expected result on one's choosing.

More Documentation: coming, once I'm pretty sure all the functionality is stable (didn't want to have to be revising it in lockstep, as I thought up better ways to implement it...)

Removed the code redundancies in the raising of errors (did not want to complicate the nesting of the traceback further, but that's alright...)

What would you want in place of sys.exit?? (I deemed it a critical error, so I presumed you would want the the code shut down...)

bitmodeler commented 4 years ago

asyncio.get_event_loop().close() ??

eirnym commented 4 years ago

Dependencies: why does the library depend on asyncpg and asyncio while it do just string formatting and nothing else?

Unit tests are good even on the beginning. They show how should I use and how I shouldn't use your library, and also show where are possible usage obstacles and how to use them easier. This part doesn't replace documentation, though

Documentation: the current documentation is really not enough even to try your code.

Application exit: throwing an exception is more than enough to say something is going wrong, it's not libraries' work to decide when and how application should be shut down.

bitmodeler commented 4 years ago

asyncpg dependency: it prepares the statement, and passes through values to your fetch routines; rather than two calls for each, it's done with just one (no repetition; right?)

As for insufficient documentation (keeping in mind more to come...): it shows a COMPLETE working example! If a person can''t get it from that, I doubt they will ever...

eirnym commented 4 years ago

@bitmodeler,

Libraries' work is just to prepare statement, but I'd prefer to control every step on my own, because layer which prepares the statement and arguments should be separated from each other, just because of security and architecture reasons (e.g. log statements, set connection parameters before an execution, execute many statements on the same conn and more). But this library now works like format function will print out data to /dev/stdout.

I see scope of such library is: 1) re-format statement with names to statement with numbers 2) provide a dictionary name to list of indices and a simple wrapper around (I'd prefer to have an ability of my own implementations for an optimization) 3) no additional functions included (they are could be nice for a demo, but not required or dangerous for a production code)

I see things which are out of scope are done in this library: 1) creating a pool and a wrapper around run_until_complete which are not required at all even for demo purposes 2) _exit (exit_code) static method which is dangerous and should be avoided (it's programmer's task to decide when to stop a program. This also doesn't required even for demo purposes (and will break tests) 3) _key_err static method which can be inlined as it just a wrapper above KeyError(message) 4) raising SyntaxError which is reserved for syntax errors for python code. You should use ValueError or make a subclass of it 5) some reasons to replace ReferenceError with something else 6) calling asyncpg. A programmer can do it himself. 7) absence of ability to just format values without a asyncpg call.

docs: 1) I see, there's just no syntax highlighting. 1) No comments in an example code highlighting features 1) There should be an example with custom types 1) why did you choose {{name}}, but not {name} which is more logical and pyformat compatible?

BTW: I'm happy that you've written this anyway and want to shape it to a really usable code in production with tests and good documentation.

bitmodeler commented 4 years ago

"creating a pool and a wrapper around run_until_complete which are not required at all even for demo purposes"

Those are just convenience helpers I added; one can opt to disregard them. I use them often (less wordy!) So you'll note it's a general purpose asyncpg "utility" module; not just the "prepared statement" module.

"_exit (exit_code) static method which is dangerous and should be avoided (it's programmer's task to decide when to stop a program. This also doesn't required even for demo purposes (and will break tests)"

Presently it's a pseudo-exit, added to allow a programmer to override it with the handling one chooses. Perhaps I call it "_ab_end" instead?

"_key_err static method which can be inlined as it just a wrapper above KeyError(message)"

A lot of controversy on methods for inlining in python; seeing as how this is not called often, I'll leave it be. Could use the "inline" module, but articles say it messes up cython (which is getting greater adoption now...) Or maybe I go back to introducing the minor repetition?

"raising SyntaxError which is reserved for syntax errors for python code. You should use ValueError or make a subclass of it"

SOLD! Will replace it. Open to a suggested replacement for ReferenceError. [I would rather shy away from making custom exceptions, because that will just be more idiosyncratic stuff a person has to know to use it...]

"why did you choose {{name}}, but not {name} which is more logical and pyformat compatible?"

It's a big django standard for template variables, and was less likely to be accidentally used for other purposes in one's fabrication of SQL queries.

Adding unit testing today; will make a temporary table as part of the unit test...

samuelcolvin commented 4 years ago

I feel this conversation has gone off topic.

bitmodeler commented 4 years ago

?

We're providing a solution to the problem that the original topic was all about...

samuelcolvin commented 4 years ago

Providing a solution involves a single comment with a link to a library that can be used as a workaround.

Any further discussion about that workaround belongs in my opinion on that libraries issue tracker.

bitmodeler commented 4 years ago

OK; signing off this thread. Anybody needing the named parameter solution can find it at the git link posted above. It's an add-on class that leverages off of the existing asyncpg codebase.

[As I say on the git: I think named parameters are much safer than positional ones, and in my opinion it should be the rule and not the exception...]

bitmodeler commented 4 years ago

PS Thanks for the constructive suggestions, Arseny...

eirnym commented 4 years ago

@samuelcolvin thank you for a notification

@bitmodeler Thank you for a discussion

bitmodeler commented 4 years ago

By the way, Arseny: you were right! The query parsing needed to be decoupled from the statement generation, for reasons of thread efficiency (otherwise, the threads using prepared statements might get bottlenecked on one connection; accomplished that by using two classes...)

[Just wanted to credit Arseny for the good advice in this thread, Sam...]

cordalace commented 4 years ago

pyformat could be simply converted to psql-format using defaultdict:

import collections
import itertools
from typing import Any, Dict, Tuple, List

def pyformat2psql(query: str, named_args: Dict[str, Any]) -> Tuple[str, List[Any]]:
    positional_generator = itertools.count(1)
    positional_map = collections.defaultdict(lambda: '${}'.format(next(positional_generator)))
    formatted_query = query % positional_map
    positional_items = sorted(
        positional_map.items(),
        key=lambda item: int(item[1].replace('$', '')),
    )
    positional_args = [named_args[named_arg] for named_arg, _ in positional_items]
    return formatted_query, positional_args

Example:

query = "SELECT * FROM user WHERE name = %(name)s, email = %(email)s"
named_args = {'name': 'John Cena', 'email': 'cena@example.org'}
new_query, positional_args = pyformat2psql(query, named_args)
print(new_query)  # prints SELECT * FROM user WHERE name = $1, email = $2
print(positional_args)  # prints ['John Cena', 'cena@example.org']
sany2k8 commented 3 years ago

Is there any sql alchemy core implementation for asyncpg? I don't want to use raw queries

statement = (
                insert(table_obj).
                    values(named_args)
            )

Working code:

 async def insert_named_parameter(self, table, fields=None, **kwargs):
        fields = fields or {}
        fields.update(kwargs)
        items = list(fields.items())
        keys, values = [x[0] for x in items], [x[1] for x in items]
        placeholders = ['${}'.format(i) for i, _ in enumerate(values, 1)]
        query = 'INSERT INTO {} ({}) VALUES ({})'.format(table, ', '.join(keys), ', '.join(placeholders))
        return await self.connection.execute(query, *values)

    async def insert_keyword(self, keyword_data):

        if keyword_data:
            keyword_data['update_timestamp'] = datetime.datetime.now()
            row = await self.insert_named_parameter('keywords', keyword_data)
            return row
        else:
            return None

But I want sqlalchemy version without raw query and table name.

elprans commented 3 years ago

SQLAlchemy 1.4 supports asyncpg natively: https://docs.sqlalchemy.org/en/14/changelog/migration_14.html#asynchronous-io-support-for-core-and-orm

nhumrich commented 3 years ago

@sany2k8 there is also asyncpgsa : https://github.com/CanopyTax/asyncpgsa

gnat commented 3 years ago

Thank you @cordalace for the polished solution for developers who want optional named parameters! :+1: Should seriously be considered for inclusion as a core utility function, considering how popular this story is!

Also to be fair many of us do not want to introduce ORM bloat. Named parameters are a highly valuable (for many), yet tiny, feature.

That said, thank you for asyncpg. I'm astounded at the quality of this project! @elprans @1st1

singlecheeze commented 3 years ago

@cordalace built upon your code above for a "many" variant, list of dicts... seems to work, but curious how you would implement!? Thank you for your code too!

def pyformat2psql_many(query: str, named_args: List[Dict[str, Any]]) -> Tuple[str, List[Any]]:
    arg_items = []
    positional_generator = itertools.count(1)
    positional_map = collections.defaultdict(lambda: '${}'.format(next(positional_generator)))
    formatted_query = query % positional_map

    for arg_item in named_args:
        positional_items = sorted(
            positional_map.items(),
            key=lambda item: int(item[1].replace('$', '')),
        )
        positional_args = [arg_item[named_arg] for named_arg, _ in positional_items]
        arg_items.append(positional_args)

    return formatted_query, arg_items
tebanep commented 3 years ago

Considering Python dictionaries are now ordered and that the format_map method doesn't create a new dictionary on every call, like format does, one could use a Serial dict to solve this problem:


# Serial Dict
#############

class Serial(dict):
    def __getitem__(self, key):
        return f"${list(self.keys()).index(key) + 1}"

# Tests
#######

parameters = Serial(foo='FOO', bar='BAR', baz='BAZ')
query = "select {foo} from {bar} where baz={baz}".format_map(parameters)
assert query == "select $1 from $2 where baz=$3"  # Ordered Parameters

parameters = Serial(baz='BAZ', foo='FOO', bar='BAR')
query = "select {foo} from {bar} where baz={baz}".format_map(parameters)
assert query == "select $2 from $3 where baz=$1"  # Unordered Parameters

parameters = Serial({'foo': 'FOO', 'baz': 'BAZ'})
query = "select {foo} from {foo} where baz={baz}".format_map(parameters)
assert query == "select $1 from $1 where baz=$2"  # Repeated Parameters

# Mocking Fetch
###############

def mock_fetch(query, *args):
    assert query == "select $1 from $1 where baz=$2"
    assert args == ('FOO', 'BAZ')

query = "select {foo} from {foo} where baz={baz}"
parameters = Serial({'foo': 'FOO', 'baz': 'BAZ'})

mock_fetch(query.format_map(parameters), *parameters.values())

NOTE: The examples here are not valid PostgreSQL statements. These were used just to demonstrate the usage of format_map to create serial numeric placeholders based on a dictionary.

SQL function arguments can only be used as data values, not as identifiers.

https://www.postgresql.org/docs/current/xfunc-sql.html

Hope this might be helpful for some of you! :D

gnat commented 3 years ago

Wanted to handle auto-removal of arguments not present in the query, instead of throwing an error. Based on @tebanep's snippet.

Enjoy:

# Serial Dictionary.
class Serial(dict):
    def __getitem__(self, key):
        return f"${list(self.keys()).index(key) + 1}"

# Pyformat to psql format.
def pyformat2psql(query: str, args_dict: Dict[str, Any]) -> Tuple[str, List[Any]]:
    # Remove args not present in query.
    args_list = list(args_dict.keys())
    for value in args_list:
        if f"{{{value}}}" not in query:
            args_dict.pop(value, None)
    # Generate query with serial positions.
    args = Serial(args_dict)
    query_formatted = query.format_map(args)
    args_formatted = list(args.values())
    return query_formatted, args_formatted

Usage

# Will just ignore 'foo' instead of throwing error.
query, params = pyformat2psql("SELECT * FROM users where name={baz}", {'foo': 'bob2', 'baz': 'bob'})
result = await conn.fetch(query, *params)
singlecheeze commented 2 years ago

Considering Python dictionaries are now ordered and that the format_map method doesn't create a new dictionary on every call, like format does, one could use a Serial dict to solve this problem:

# Serial Dict
#############

class Serial(dict):
    def __getitem__(self, key):
        return f"${list(self.keys()).index(key) + 1}"

# Tests
#######

parameters = Serial(foo='FOO', bar='BAR', baz='BAZ')
query = "select {foo} from {bar} where baz={baz}".format_map(parameters)
assert query == "select $1 from $2 where baz=$3"  # Ordered Parameters

parameters = Serial(baz='BAZ', foo='FOO', bar='BAR')
query = "select {foo} from {bar} where baz={baz}".format_map(parameters)
assert query == "select $2 from $3 where baz=$1"  # Unordered Parameters

parameters = Serial({'foo': 'FOO', 'baz': 'BAZ'})
query = "select {foo} from {foo} where baz={baz}".format_map(parameters)
assert query == "select $1 from $1 where baz=$2"  # Repeated Parameters

# Mocking Fetch
###############

def mock_fetch(query, *args):
    assert query == "select $1 from $1 where baz=$2"
    assert args == ('FOO', 'BAZ')

query = "select {foo} from {foo} where baz={baz}"
parameters = Serial({'foo': 'FOO', 'baz': 'BAZ'})

mock_fetch(query.format_map(parameters), *parameters.values())

NOTE: The examples here are not valid PostgreSQL statements. These were used just to demonstrate the usage of format_map to create serial numeric placeholders based on a dictionary.

SQL function arguments can only be used as data values, not as identifiers.

https://www.postgresql.org/docs/current/xfunc-sql.html

Hope this might be helpful for some of you! :D

This is almost perfect, but what about an .executemany() variant? :smile: