Closed nhumrich closed 4 years ago
+1 for pyformat: we use massively this format for our queries, it's the most explicit, at least to me.
I'll let this issue stay open for a while to see if a lot of people is interested in this feature. Adding named parameters will require us to tokenize SQL, which is something I don't want to do unless it's absolutely necessary.
Will tokenizing hurt performance? I, too, prefer the dictionary vs ordered list approach.
I can say that if this feature were added I would use it over ordered list.
@1st1 Interested here as well. It makes everything so much easier
@1st1 I'd like to see this implemented as well. It would make maintaining queries easier. With just ordered parameters you have to be very careful when updating queries, and it's easy to get it wrong, especially if you have lots of parameters.
named keywords have proven more reliable for me as well. On brand with the python-zen 'explicit > implicit' credo as well.
Why don't you just use little wrappers like this?
def insert(conn, table, fields=None, **kwargs):
fields = fields or {}
fields.update(kwargs)
items = list(fields.items())
keys, values = [x[0] for x in items], [x[1] for x in items]
placeholders = ['${}'.format(i) for i, _ in enumerate(values, 1)]
query = 'INSERT INTO {} ({}) VALUES ({}) RETURNING id'.format(
table, ', '.join(keys), ', '.join(placeholders))
return conn.fetchval(query, *values)
id = await insert(conn, 'users', {'email': 'foo@gmail.com', 'name': 'foo'})
id = await insert(conn, 'users', email='bar@gmail.com', name='bar')
I'd very much like to see named parameters as well - ordered ones have proven error prone to the point where we've made it a code style across the company to use only named parameters unless there's some reason it's impossible.
If tokenizing I'd suggest re-using any available code in libpq/psycopg2
If tokenizing I'd suggest re-using any available code in libpq/psycopg2
psycopg2 is LGPL, we can't reuse any code from there.
meant as a library :)
is there any progress on this?
Both #196 and #152 seem to be cold. For me this is by far the biggest missing feature from asyncpg.
Or would it actually be simpler to leave this out of asyncpg and implement it in a wrapper?
There is a number of wrappers for asyncpg and SQLAlchemy: asyncpgsa, metapensiero.sqlalchemy.asyncpg, GINO, etc. All of those use SQLAlchemy to generate queries with the correct paramstyle.
@1st1 hi there,
This is an awesome library, but this is one of the features which is sorely missing. We've been thinking about making a wrapper which transforms the sql statement based on the kwargs positions, and then calling the vanilla execution, but I feel like we're missing something (the solution seems too simple to have been overlooked). What do you think?
Hi @juandiegopalomino I've built buildpg for this purpose.
It's well tested and I'm using it in production now, however I haven't yet got round to writing to docs, however, the unit tests should demonstrate what it can do.
Basic usage is
import asyncio
from buildpg import Var, asyncpg, render, select_fields
print(render('SELECT :x', x=4))
#> ('SELECT $1', [4])
async def main():
conn = await asyncpg.connect_b('postgresql://postgres:waffle@localhost/test')
# Execute a statement to create a new table.
await conn.fetch_b('SELECT * FROM foobar WHERE a=:a AND b=:b', a='foo', b='bar')
# executes 'SELECT * FROM foobar WHERE a=$1 AND b=$2' with params ['foo', 'bar']
await conn.fetchval_b(
'SELECT :fields FROM foobar WHERE :where',
fields=select_fields('first_name', 'last_name'),
where=(Var('foo') > 2) & ((Var('bar') == 4) | (Var('spam') != Var('whatever') ** 2))
)
# executes 'SELECT first_name, last_name FROM foobar WHERE foo > $1 AND (bar = $2 OR spam != whatever ^ $3)'
# with params [2, 4, 2]
await conn.close()
asyncio.get_event_loop().run_until_complete(main())
In summary:
render
method can be used to prepare sql and positional parametersfrom buildpg import asyncpg
gets you a thin wrapper around asyncpg, this provides connect_b
and create_pool_b
(the b suffix denotes the query "builder" variant), these just use a connection class BuildPgConnection
.BuildPgConnection
just adds execute_b
, fetch_b
, fetchval_b
etc. which take keyword arguments instead of positional argumentsIf anyone's keen to use this, let me know and I'll write proper docs.
I used colon variable references eg. :foo
rather than {foo}
) to avoid confusion with sql braces and since that approach seems to be more common, but it's possible to create your own render which uses different regexes.
@samuelcolvin hey, thanks for writing that but for now we're writing a wrapper that handles named & indexed parameters and multiple statements via sqlparse and the default string formatter. Thanks for the input though!
Having server side query formatting is good for performance and security. I wish we could always rely on this in asyncpg.
I'd love this feature as well...
Would it be possible to transform a pyformat query to a native Postgres parameterized query ?
"pyformat query" is a little ambiguous, but I think I understand what you mean.
That's what buildpg does, it takes "sql + keyword arguments" and modifies that into "modified sql + positional arguments", then passes that to the standard asyncpg methods to in turn be passed to postgresql's prepared statements. See this class for details.
I think it's the opinion of @1st1 that this should be done in a separate library, rather than as part of asyncpg, to avoid bloat.
There is a number of wrappers for asyncpg and SQLAlchemy: asyncpgsa, metapensiero.sqlalchemy.asyncpg, GINO, etc. All of those use SQLAlchemy to generate queries with the correct paramstyle.
Similarly: https://github.com/encode/databases
@1st1 - I'd also really like this feature.
@1st1 - same here, would be great to have this feature. Has a decision been made it will not be implemented? From what I can see here it looks like it's been three years with no progress
We have made a decision to not support keyword args directly in asyncpg. The goal of asyncpg is to be a minimal interface around native Postgres features. Query tokenization and rendering fall outside of this scope, and, as several commenters mentioned above, can be implemented in a wrapper.
We are still open to the concept of middleware, as proposed in #152, which would make plugging in support for custom query syntaxes easier to accomplish. If someone is willing to champion that effort, I'd be happy to mentor and review.
Testing it today: https://gitlab.com/osfda/asyncpg_utility
Let me know what you think.
(a standalone utility module, as you had preferred...)
@bitmodeler Good start. I wish to see tests, documentation improvement and less repetitions in code. and absence of sys.exit
, obviously.
Unit test: can add a jacket that furnishes a query of one's choosing, and an expected result on one's choosing.
More Documentation: coming, once I'm pretty sure all the functionality is stable (didn't want to have to be revising it in lockstep, as I thought up better ways to implement it...)
Removed the code redundancies in the raising of errors (did not want to complicate the nesting of the traceback further, but that's alright...)
What would you want in place of sys.exit?? (I deemed it a critical error, so I presumed you would want the the code shut down...)
asyncio.get_event_loop().close() ??
Dependencies: why does the library depend on asyncpg
and asyncio
while it do just string formatting and nothing else?
Unit tests are good even on the beginning. They show how should I use and how I shouldn't use your library, and also show where are possible usage obstacles and how to use them easier. This part doesn't replace documentation, though
Documentation: the current documentation is really not enough even to try your code.
Application exit: throwing an exception is more than enough to say something is going wrong, it's not libraries' work to decide when and how application should be shut down.
asyncpg dependency: it prepares the statement, and passes through values to your fetch routines; rather than two calls for each, it's done with just one (no repetition; right?)
As for insufficient documentation (keeping in mind more to come...): it shows a COMPLETE working example! If a person can''t get it from that, I doubt they will ever...
@bitmodeler,
Libraries' work is just to prepare statement, but I'd prefer to control every step on my own, because layer which prepares the statement and arguments should be separated from each other, just because of security and architecture reasons (e.g. log statements, set connection parameters before an execution, execute many statements on the same conn and more). But this library now works like format
function will print out data to /dev/stdout
.
I see scope of such library is:
1) re-format statement with names to statement with numbers
2) provide a dictionary name to list of indices
and a simple wrapper around (I'd prefer to have an ability of my own implementations for an optimization)
3) no additional functions included (they are could be nice for a demo, but not required or dangerous for a production code)
I see things which are out of scope are done in this library:
1) creating a pool and a wrapper around run_until_complete
which are not required at all even for demo purposes
2) _exit (exit_code)
static method which is dangerous and should be avoided (it's programmer's task to decide when to stop a program. This also doesn't required even for demo purposes (and will break tests)
3) _key_err
static method which can be inlined as it just a wrapper above KeyError(message)
4) raising SyntaxError
which is reserved for syntax errors for python code. You should use ValueError
or make a subclass of it
5) some reasons to replace ReferenceError
with something else
6) calling asyncpg
. A programmer can do it himself.
7) absence of ability to just format values without a asyncpg call.
docs:
1) I see, there's just no syntax highlighting.
1) No comments in an example code highlighting features
1) There should be an example with custom types
1) why did you choose {{name}}
, but not {name}
which is more logical and pyformat
compatible?
BTW: I'm happy that you've written this anyway and want to shape it to a really usable code in production with tests and good documentation.
"creating a pool and a wrapper around run_until_complete which are not required at all even for demo purposes"
Those are just convenience helpers I added; one can opt to disregard them. I use them often (less wordy!) So you'll note it's a general purpose asyncpg "utility" module; not just the "prepared statement" module.
"_exit (exit_code) static method which is dangerous and should be avoided (it's programmer's task to decide when to stop a program. This also doesn't required even for demo purposes (and will break tests)"
Presently it's a pseudo-exit, added to allow a programmer to override it with the handling one chooses. Perhaps I call it "_ab_end" instead?
"_key_err static method which can be inlined as it just a wrapper above KeyError(message)"
A lot of controversy on methods for inlining in python; seeing as how this is not called often, I'll leave it be. Could use the "inline" module, but articles say it messes up cython (which is getting greater adoption now...) Or maybe I go back to introducing the minor repetition?
"raising SyntaxError which is reserved for syntax errors for python code. You should use ValueError or make a subclass of it"
SOLD! Will replace it. Open to a suggested replacement for ReferenceError. [I would rather shy away from making custom exceptions, because that will just be more idiosyncratic stuff a person has to know to use it...]
"why did you choose {{name}}, but not {name} which is more logical and pyformat compatible?"
It's a big django standard for template variables, and was less likely to be accidentally used for other purposes in one's fabrication of SQL queries.
Adding unit testing today; will make a temporary table as part of the unit test...
I feel this conversation has gone off topic.
?
We're providing a solution to the problem that the original topic was all about...
Providing a solution involves a single comment with a link to a library that can be used as a workaround.
Any further discussion about that workaround belongs in my opinion on that libraries issue tracker.
OK; signing off this thread. Anybody needing the named parameter solution can find it at the git link posted above. It's an add-on class that leverages off of the existing asyncpg codebase.
[As I say on the git: I think named parameters are much safer than positional ones, and in my opinion it should be the rule and not the exception...]
PS Thanks for the constructive suggestions, Arseny...
@samuelcolvin thank you for a notification
@bitmodeler Thank you for a discussion
By the way, Arseny: you were right! The query parsing needed to be decoupled from the statement generation, for reasons of thread efficiency (otherwise, the threads using prepared statements might get bottlenecked on one connection; accomplished that by using two classes...)
[Just wanted to credit Arseny for the good advice in this thread, Sam...]
pyformat could be simply converted to psql-format using defaultdict
:
import collections
import itertools
from typing import Any, Dict, Tuple, List
def pyformat2psql(query: str, named_args: Dict[str, Any]) -> Tuple[str, List[Any]]:
positional_generator = itertools.count(1)
positional_map = collections.defaultdict(lambda: '${}'.format(next(positional_generator)))
formatted_query = query % positional_map
positional_items = sorted(
positional_map.items(),
key=lambda item: int(item[1].replace('$', '')),
)
positional_args = [named_args[named_arg] for named_arg, _ in positional_items]
return formatted_query, positional_args
Example:
query = "SELECT * FROM user WHERE name = %(name)s, email = %(email)s"
named_args = {'name': 'John Cena', 'email': 'cena@example.org'}
new_query, positional_args = pyformat2psql(query, named_args)
print(new_query) # prints SELECT * FROM user WHERE name = $1, email = $2
print(positional_args) # prints ['John Cena', 'cena@example.org']
Is there any sql alchemy core implementation for asyncpg? I don't want to use raw queries
statement = (
insert(table_obj).
values(named_args)
)
Working code:
async def insert_named_parameter(self, table, fields=None, **kwargs):
fields = fields or {}
fields.update(kwargs)
items = list(fields.items())
keys, values = [x[0] for x in items], [x[1] for x in items]
placeholders = ['${}'.format(i) for i, _ in enumerate(values, 1)]
query = 'INSERT INTO {} ({}) VALUES ({})'.format(table, ', '.join(keys), ', '.join(placeholders))
return await self.connection.execute(query, *values)
async def insert_keyword(self, keyword_data):
if keyword_data:
keyword_data['update_timestamp'] = datetime.datetime.now()
row = await self.insert_named_parameter('keywords', keyword_data)
return row
else:
return None
But I want sqlalchemy version without raw query and table name.
SQLAlchemy 1.4 supports asyncpg natively: https://docs.sqlalchemy.org/en/14/changelog/migration_14.html#asynchronous-io-support-for-core-and-orm
@sany2k8 there is also asyncpgsa : https://github.com/CanopyTax/asyncpgsa
Thank you @cordalace for the polished solution for developers who want optional named parameters! :+1: Should seriously be considered for inclusion as a core utility function, considering how popular this story is!
Also to be fair many of us do not want to introduce ORM bloat. Named parameters are a highly valuable (for many), yet tiny, feature.
That said, thank you for asyncpg. I'm astounded at the quality of this project! @elprans @1st1
@cordalace built upon your code above for a "many" variant, list of dicts... seems to work, but curious how you would implement!? Thank you for your code too!
def pyformat2psql_many(query: str, named_args: List[Dict[str, Any]]) -> Tuple[str, List[Any]]:
arg_items = []
positional_generator = itertools.count(1)
positional_map = collections.defaultdict(lambda: '${}'.format(next(positional_generator)))
formatted_query = query % positional_map
for arg_item in named_args:
positional_items = sorted(
positional_map.items(),
key=lambda item: int(item[1].replace('$', '')),
)
positional_args = [arg_item[named_arg] for named_arg, _ in positional_items]
arg_items.append(positional_args)
return formatted_query, arg_items
Considering Python dictionaries are now ordered and that the format_map method doesn't create a new dictionary on every call, like format does, one could use a Serial dict to solve this problem:
# Serial Dict
#############
class Serial(dict):
def __getitem__(self, key):
return f"${list(self.keys()).index(key) + 1}"
# Tests
#######
parameters = Serial(foo='FOO', bar='BAR', baz='BAZ')
query = "select {foo} from {bar} where baz={baz}".format_map(parameters)
assert query == "select $1 from $2 where baz=$3" # Ordered Parameters
parameters = Serial(baz='BAZ', foo='FOO', bar='BAR')
query = "select {foo} from {bar} where baz={baz}".format_map(parameters)
assert query == "select $2 from $3 where baz=$1" # Unordered Parameters
parameters = Serial({'foo': 'FOO', 'baz': 'BAZ'})
query = "select {foo} from {foo} where baz={baz}".format_map(parameters)
assert query == "select $1 from $1 where baz=$2" # Repeated Parameters
# Mocking Fetch
###############
def mock_fetch(query, *args):
assert query == "select $1 from $1 where baz=$2"
assert args == ('FOO', 'BAZ')
query = "select {foo} from {foo} where baz={baz}"
parameters = Serial({'foo': 'FOO', 'baz': 'BAZ'})
mock_fetch(query.format_map(parameters), *parameters.values())
NOTE: The examples here are not valid PostgreSQL statements. These were used just to demonstrate the usage of format_map to create serial numeric placeholders based on a dictionary.
SQL function arguments can only be used as data values, not as identifiers.
https://www.postgresql.org/docs/current/xfunc-sql.html
Hope this might be helpful for some of you! :D
Wanted to handle auto-removal of arguments not present in the query, instead of throwing an error. Based on @tebanep's snippet.
Enjoy:
# Serial Dictionary.
class Serial(dict):
def __getitem__(self, key):
return f"${list(self.keys()).index(key) + 1}"
# Pyformat to psql format.
def pyformat2psql(query: str, args_dict: Dict[str, Any]) -> Tuple[str, List[Any]]:
# Remove args not present in query.
args_list = list(args_dict.keys())
for value in args_list:
if f"{{{value}}}" not in query:
args_dict.pop(value, None)
# Generate query with serial positions.
args = Serial(args_dict)
query_formatted = query.format_map(args)
args_formatted = list(args.values())
return query_formatted, args_formatted
Usage
# Will just ignore 'foo' instead of throwing error.
query, params = pyformat2psql("SELECT * FROM users where name={baz}", {'foo': 'bob2', 'baz': 'bob'})
result = await conn.fetch(query, *params)
Considering Python dictionaries are now ordered and that the format_map method doesn't create a new dictionary on every call, like format does, one could use a Serial dict to solve this problem:
# Serial Dict ############# class Serial(dict): def __getitem__(self, key): return f"${list(self.keys()).index(key) + 1}" # Tests ####### parameters = Serial(foo='FOO', bar='BAR', baz='BAZ') query = "select {foo} from {bar} where baz={baz}".format_map(parameters) assert query == "select $1 from $2 where baz=$3" # Ordered Parameters parameters = Serial(baz='BAZ', foo='FOO', bar='BAR') query = "select {foo} from {bar} where baz={baz}".format_map(parameters) assert query == "select $2 from $3 where baz=$1" # Unordered Parameters parameters = Serial({'foo': 'FOO', 'baz': 'BAZ'}) query = "select {foo} from {foo} where baz={baz}".format_map(parameters) assert query == "select $1 from $1 where baz=$2" # Repeated Parameters # Mocking Fetch ############### def mock_fetch(query, *args): assert query == "select $1 from $1 where baz=$2" assert args == ('FOO', 'BAZ') query = "select {foo} from {foo} where baz={baz}" parameters = Serial({'foo': 'FOO', 'baz': 'BAZ'}) mock_fetch(query.format_map(parameters), *parameters.values())
NOTE: The examples here are not valid PostgreSQL statements. These were used just to demonstrate the usage of format_map to create serial numeric placeholders based on a dictionary.
SQL function arguments can only be used as data values, not as identifiers.
https://www.postgresql.org/docs/current/xfunc-sql.html
Hope this might be helpful for some of you! :D
This is almost perfect, but what about an .executemany() variant? :smile:
It looks like for prepared statements, statements have to be in the form
'select $1 from $2 where baz=$3
with the arguments('foo', 'bar', 'baz')
It is common in python database api's to support the pyformat int PEP 249. http://legacy.python.org/dev/peps/pep-0249/#paramstyle
Can this support at least one of the param styles in the pep? Preferably pyformat.
Ideally, the parameters could then be passed in as a dictionary instead of a ordered list.
conn.fetch(myquery, {bar='bar', foo='foo'})