INWTlab / dbrequests

python package built for easy use of raw SQL within python and pandas projects
MIT License
5 stars 2 forks source link

Regular expressions in conflict with .format() parameter resolution #45

Open phainom opened 4 years ago

phainom commented 4 years ago

I have an issue when using regular expressions in SQL, eg via similar to and {}, because the format() resolution thinks this points towards a parameter. There should at least be an option to pass the raw sql in Query. Currently I circumvent the issue with using db.query instead of db.send_query, which then needs separate boilerplate code for passing the right query.

wahani commented 3 years ago

@phainom is it possible to resolve this if you escape { used in the regex? E.g. "{{{a},}}".format(a="a") works for me. See https://docs.python.org/3.3/library/string.html#format-string-syntax Not sure if this resolves the issue. If you provide an example I can see if I find a fix.

phainom commented 3 years ago

Say the query in the .sql file is something like

select 'hello world' as col1 where 'aa' similar to 'a{1,2}'

This will work in raw sql, but throw an error when used with dbrequests, since it will look for a parameter. What do you mean by escaping in that case? Escaping the { will lead to a sql error or a different meaning of the regex.

btw, we should move from .format to f-strings for better readability (https://www.python.org/dev/peps/pep-0498/).

Cattes commented 3 years ago

What you can do to partially fill a string containing curly brackets is to set double brackets.

"select 'hello world' as {col} where 'aa' similar to 'a{{1,2}}'".format(col="col1")
# returns "select 'hello world' as col1 where 'aa' similar to 'a{1,2}'"
wahani commented 3 years ago

If I have:

query.sql

SELECT {val} regexp "a{{1,2}}";

So instead of {1,2} we write {{1,2}}. Then I can execute:

db.send_query(
    dbrequests.Query("query.sql", val="'aa'", sql_dir="../").text
)

grafik

and

grafik

BTW: I just noticed that format is only executed if the query lives in a file, not if I pass a query as string.

phainom commented 3 years ago

Got it, you're right, I can introduce double curly brackets as soon as I call the sql from dbrequests. What I meant is that I cannot call a fully functioning sql via dbrequests, but have to break the sql to make it work (even for a query where nothing is parametrized). This breaks the most basic feature of the package, namely "Easy sending of raw sql and output as pandas DataFrames". I don't know if there is an intuitive solution for this though, we would probably need to check the passed kwargs before reformatting the query.

phainom commented 3 years ago

BTW: I just noticed that format is only executed if the query lives in a file, not if I pass a query as string.

I would like to view that as a feature rather than a bug. :grin:

wahani commented 3 years ago

What I meant is that I cannot call a fully functioning sql via dbrequests, but have to break the sql to make it work (even for a query where nothing is parametrized). This breaks the most basic feature of the package, namely "Easy sending of raw sql and output as pandas DataFrames".

I agree. We just have an implementation detail - the use of format - bubbling up here.

I don't know if there is an intuitive solution for this though, we would probably need to check the passed kwargs before reformatting the query.

I see two options:

phainom commented 3 years ago

like you say, we try to check in advance for keywords. I find this difficult, because format allows also for function calls, e.g. {len(arg)} is a legal expression. So it's not going to be easy unless python offers something to help us I don't see yet.

I would argue that writing a .sql with {len(arg)} instead of calling len when passing arg in send_query is bad practice, and we could think about not allowing this. We would need to call locals(), compare with the output of string.Formatter.parse() and auto-escape brackets which don't match passed arguments. This also has the advantage of being able to throw a more indicative error when passing non-matching keyword arguments. But I also agree that this is probably overly complicated and prone to errors due to edge cases. Switching to a different system will break all code which uses the {parameter} format, if we do this we could just move to a different package.

If we don't like these options, I can continue to circumvent the problem and we resolve this issue with wontfix.