andialbrecht / sqlparse

A non-validating SQL parser module for Python
BSD 3-Clause "New" or "Revised" License
3.73k stars 695 forks source link

Window Functions are Broken row_number() over (PARTITION BY foo ORDER BY bar) #598

Open mkmoisen opened 3 years ago

mkmoisen commented 3 years ago
>>> import sqlparse
>>> sql = '''SELECT a, row_number() OVER (partition by foo ORDER BY bar), b, c, d, e, f FROM c'''
>>> print(sqlparse.format(sql, reindent=True))

This outputs the following broken indentation:

SELECT a,
       row_number() OVER (partition by foo
                          ORDER BY bar), b,
                                         c,
                                         d,
                                         e,
                                         f
FROM c

I think the bug is that two different identifier lists are created, one for [a, row_number()], and one for [b, c, d, e, f]. Instead, there should just be one identifier list containing [a, row_number() OVER (partition by foo ORDER BY bar), b, c, d, e, f].

I think the issue is that it parses row_number() and breaks when it encounters OVER (or maybe ORDER BY ?), so it cuts them up into two lists.

Perhaps this would be solved if you allowed a function to be parsed "name(" as well as "name(...) over ("

surelili commented 3 years ago

这个问题可以修改源码解决哦 运行结果如下: print(sqlparse.format(sql,reindent=True, keyword_case='upper')) image

JoepvandenHoven-Bluemine commented 2 years ago

This seems to boil down to the fact that 'OVER' is recognised as a keyword rather than an operator. I've been able to resolve this issue in my case by using a preprocesssor to change 'OVER' to an operator:

Example code

class FixWindowFunctions:
    def process(self, tlist):
        for ttype, value in tlist:
            if ttype == tokens.Keyword and value == 'OVER':
                yield (tokens.Operator, 'OVER')
            else:
                yield ttype, value

stack = engine.FilterStack()
stack.preprocess.append(FixWindowFunctions())
stack.enable_grouping()

stack.run('SELECT COUNT(*) OVER (ORDER BY SOMETHING) FROM TABLE')