andialbrecht / sqlparse

A non-validating SQL parser module for Python
BSD 3-Clause "New" or "Revised" License
3.75k stars 699 forks source link

Grouping is haphazard and frequently wrong #60

Open andialbrecht opened 12 years ago

andialbrecht commented 12 years ago

_This is a _shadow issue* for Issue 60 on Google Code (from which this project was moved). Added 2012-03-16T16:07:49.000Z by de...@pointerstop.ca. Labels: Type-Defect, Priority-Medium. Please make updates to the bug there.*

Original description

Consider the following (valid, Oracle) sql, from which I'm trying to extract table names.  Why on earth is it "grouping" single literals, or portions of a list of literals separated only by commas and whitespace (and not even including anything before or after "." separators)?  Why group WHEREs, but not a single other SQL clause?

>>> sql="""SELECT year, tunits, fsex, flen,  average_number
... FROM (
... SELECT y.year, y.tunits, l.fsex, l.flen,  ROUND(abun/y.tunits,2) average_number
... FROM (SELECT lf.year, lf.fsex, lf.flen,
...              ROUND(SUM(st.area/((41./1000.0/6080.2)*1.75/1000.0)*lf.avgstdclen/1000000.0)) abun
...         FROM nwags.gsslf_mv lf, nwags.gsstratum st
...        WHERE lf.strat=st.strat
...        AND lf.strat in (SELECT DISTINCT strat FROM groundfish.gsmgt WHERE unit IN ('ALL'))
...          AND lf.spec in (SELECT research FROM groundfish.species_codes WHERE common = :common_name)
...          AND lf.series=:series
...          AND lf.year >= :start_year
...          AND lf.year <= :end_year
...        GROUP BY year, lf.fsex, lf.flen) l,
...      (SELECT i.year, SUM(ROUND(floor(s.area/(1.75*(41/6080.2))+.5))) tunits
...         FROM nwags.gssinf_mv i, nwags.gsstratum s
...        WHERE i.strat=s.strat AND i.strat in (SELECT DISTINCT strat FROM mflib.gsmgt WHERE unit IN ('ALL'))
...          AND i.series = :series
...          AND i.year >= :start_year
...          AND i.year <= :end_year
...        GROUP BY i.year) y
... WHERE y.year=l.year
... UNION
... SELECT 0, 0, 0, flen, 0
... FROM mflib.gs_lengths l, nwags.gsspec_mv r, nwags.gsspec s
... WHERE s.spec=r.spec AND l.class=s.lgrp AND l.flen BETWEEN r.lmin and r.lmax
... AND s.spec in (SELECT research
... FROM groundfish.species_codes WHERE common = :common_name)
... ) WHERE year > 0
... ORDER BY 1, 2, 3, 4
... &quot;&quot;&quot;
>>> res = sqlparse.parse(sql)

and the following results:

>>> for x in  [ unicode(token)+'\n' for token in res[0].tokens if token.is_group()]: print x
... 
tunits, fsex, flen,  average_number

tunits, l

fsex, l

flen,  ROUND(abun/y.tunits,2)

fsex, lf

flen,
             ROUND(SUM(st.area/((41./1000.0/6080.2)*1.75/1000.0)*lf.avgstdclen/1000000.0)) abun

nwags.gsslf_mv lf, nwags.gsstratum st

WHERE lf.strat=st.strat
       AND lf.strat in (SELECT DISTINCT strat FROM groundfish.gsmgt WHERE unit IN ('ALL'))
         AND lf.spec in (SELECT research FROM groundfish.species_codes WHERE common = :common_name)
         AND lf.series=:series
         AND lf.year >= :start_year
         AND lf.year <= :end_year
       GROUP BY year, lf.fsex, lf.flen) l,
     (SELECT i.year, SUM(ROUND(floor(s.area/(1.75*(41/6080.2))+.5))) tunits
        FROM nwags.gssinf_mv i, nwags.gsstratum s
       WHERE i.strat=s.strat AND i.strat in (SELECT DISTINCT strat FROM mflib.gsmgt WHERE unit IN ('ALL'))
         AND i.series = :series
         AND i.year >= :start_year
         AND i.year <= :end_year
       GROUP BY i.year) y
WHERE y.year=l.year

0, 0, 0, flen, 0

mflib.gs_lengths l, nwags.gsspec_mv r, nwags.gsspec s

WHERE s.spec=r.spec AND l.class=s.lgrp AND l.flen BETWEEN r.lmin and r.lmax
AND s.spec in (SELECT research
FROM groundfish.species_codes WHERE common = :common_name)
) WHERE year > 0

1, 2, 3, 4
petergruenbeck commented 6 years ago

Here is a simple SQL where I believe its grouping the tablename incorrectly with the columns and calling it a function.

sql = "Insert into foo (a,b,c) values (1, 'b', zz)"

the "foo (a,b,c) ends up being grouped and the token ttype for 'foo' is None rather than sqlparser.sql.Identifier

A variation on the above includes an alias for the table per the Oracle syntax charts: sql = "Insert into foo fooalias (a,b,c) values (1, 'b', zz)"

I this case the token object for foo is correctly an sqlparser.sql.Identifier type, but the ttype is still set to None so the Match function does not work as expected.