andialbrecht / sqlparse

A non-validating SQL parser module for Python
BSD 3-Clause "New" or "Revised" License
3.76k stars 700 forks source link

recognize `->` as a single operator. #682

Closed mtagle closed 8 months ago

mtagle commented 2 years ago

I'm aware of at least 2 sql dialects that have -> ("arrow") operators. presto/athena use them as part of lambda expressions (https://prestodb.io/docs/current/functions/lambda.html) and postgres uses them to get things out of a json blob: https://www.postgresql.org/docs/9.4/functions-json.html

Right now, however, sqlparse does not seem to recognize -> as an operator so if you format sql with a -> in it with use_space_around_operators=True, you will end up breaking the arrow and making your sql invalid.

ie:

import sqlparse

sql = "select params->'name' from events"
formatted_sql = sqlparse.format(sql, use_space_around_operators=True)
print(formatted_sql)

will print:

select params - > 'name' from events

(note the - >) Obviously a workaround is to not use use_space_around_operators, but it would be nice if sqlparse were aware of this as an operator.

mrmasterplan commented 1 year ago

Opeartors are parsed here currently. https://github.com/andialbrecht/sqlparse/blob/master/sqlparse/keywords.py#L105 can you propose a regex that would parse your operator?

Julien00859 commented 1 year ago

We too at Odoo have the problem, we would like to use sqlparse to pretty-print SQL queries generated by our ORM but at the moment the situation is sub-optimal:

SELECT "ir_module_module"."id",
       "ir_module_module"."name",
       "ir_module_module"."category_id",
       "ir_module_module"."shortdesc"->>'en_US',
                                        "ir_module_module"."summary"->>'en_US',
                                                                       "ir_module_module"."description"->>'en_US',
                                                                                                          "ir_module_module"."author",
                                                                                                          "ir_module_module"."maintainer",
                                                                                                          "ir_module_module"."contributors",
                                                                                                          "ir_module_module"."website"
FROM "ir_module_module"
WHERE ("ir_module_module"."id" IN (...))  -- ids striped for github

We use JSON objects to store text indexed by lang (for i18n purpose), so we have pretty much those arrows in all our queries.

Xavier-Do commented 1 year ago

It looks like the last sqlparse.keywords.SQL_REGEX could be

(r'->>|->|([+/@#%^&|^-]+)', tokens.Operator)

Not sure that it is enough but it solves the issue for our use case. Still need to test it a little.

Here is what we are planning to do in our code base waiting for a fix

from sqlparse import keywords, tokens
import re

fixed_regex = r'->>|->|([+/@#%^&|^-]+)'
if callable(keywords.SQL_REGEX[-1][0]):  # sqlparse  version 0.4.1
    FLAGS = re.IGNORECASE | re.UNICODE
    keywords.SQL_REGEX[-1] = (re.compile(fixed_regex, FLAGS).match, tokens.Operator)
else:  # curent sqlparse master (7fdb2da82d51a9a02baaefb5c7fe5cbbaac4329e)
    keywords.SQL_REGEX[-1] = (fixed_regex, tokens.Operator)