can not parse add multi columns

I encountered a similar issue when parsing a CREATE TABLE statement.

import sqlparse

create_sql = '''CREATE TABLE countries (
    id INTEGER PRIMARY KEY,
    code VARCHAR(3) UNIQUE NOT NULL,
    name VARCHAR(126) UNIQUE NOT NULL
    );'''

# Token 6 of the parsed statement is the Parenthesis holding the column definitions
parenthesis = sqlparse.parse(create_sql)[0].tokens[6]

# Token types that are skipped when printing
WS_TOKENS = (sqlparse.tokens.Token.Text.Whitespace,
             sqlparse.tokens.Token.Text.Whitespace.Newline,
             sqlparse.tokens.Token.Comment)

for n, t in enumerate(parenthesis):
    if t.ttype not in WS_TOKENS:
        print('{0:>2}: is_group {1},\tclass {2:>14},\tttype {3:>12}\t, parent {4}[{5}]\t, "{6}"'.format(
            n,                                   # token number in the Parenthesis
            t.is_group,                          # is a token group (e.g. IdentifierList)
            str(type(t)).split('.')[-1][:-2],    # class name (last part)
            str(t.ttype)[6:],                    # token ttype (if not None)
            str(type(t.parent)).split('.')[-1],  # parent: the Parenthesis or (if flattened) an IdentifierList
            id(t.parent),                        # parent's id
            str(t)))                             # SQL string

This prints:

 0: is_group False, class          Token,   ttype  Punctuation  , parent Parenthesis'>[2257639606576]   , "("
 3: is_group True,  class     Identifier,   ttype               , parent Parenthesis'>[2257639606576]   , "id"
 5: is_group False, class          Token,   ttype Name.Builtin  , parent Parenthesis'>[2257639606576]   , "INTEGER"
 7: is_group False, class          Token,   ttype      Keyword  , parent Parenthesis'>[2257639606576]   , "PRIMARY"
9: is_group True,   class IdentifierList,   ttype               , parent Parenthesis'>[2257639606576]   , "KEY,
    code"
11: is_group True,  class       Function,   ttype               , parent Parenthesis'>[2257639606576]   , "VARCHAR(3)"
13: is_group False, class          Token,   ttype      Keyword  , parent Parenthesis'>[2257639606576]   , "UNIQUE"
15: is_group True,  class IdentifierList,   ttype               , parent Parenthesis'>[2257639606576]   , "NOT NULL,
    name"
17: is_group True,  class       Function,   ttype               , parent Parenthesis'>[2257639606576]   , "VARCHAR(126)"
19: is_group False, class          Token,   ttype      Keyword  , parent Parenthesis'>[2257639606576]   , "UNIQUE"
21: is_group False, class          Token,   ttype      Keyword  , parent Parenthesis'>[2257639606576]   , "NOT NULL"
24: is_group False, class          Token,   ttype  Punctuation  , parent Parenthesis'>[2257639606576]   , ")"

See tokens 9 and 15.

A Keyword, (Whitespace), a Comma, (Whitespace), and the Name of the next identifier are grouped into one IdentifierList.

Printing the SQL first, with print('{0:>2}: "{6:<14}" ,is_group {1},\tclass {2:>14},\tttype {3:>12}\t, parent {4}[{5}]\t'.format(... (token #: SQL, is_group, ...), shows that the IdentifierList spans the line break.

 7: "PRIMARY       " ,is_group False,   class          Token,   ttype      Keyword  , parent Parenthesis'>[2257639606576]   
 9: "KEY,
    code " ,is_group True,  class IdentifierList,   ttype               , parent Parenthesis'>[2257639606576]   
11: "VARCHAR(3)    " ,is_group True,    class       Function,   ttype               , parent Parenthesis'>[2257639606576]

And

13: "UNIQUE        " ,is_group False,   class          Token,   ttype      Keyword  , parent Parenthesis'>[2257639606576]   
15: "NOT NULL,
    name" ,is_group True,   class IdentifierList,   ttype               , parent Parenthesis'>[2257639606576]   
17: "VARCHAR(126)  " ,is_group True,    class       Function,   ttype               , parent Parenthesis'>[2257639606576]

A flattened view (SQL and parent first) shows this more explicitly:

for n, t in enumerate(parenthesis.flatten()):
    if t.ttype not in WS_TOKENS:
        print('{0:>2}: "{6:<10}", parent {4}[{5}],\tclass {2:>5},\tttype {3:>12}\t\t"'.format(
            n,                                   # token number in the Parenthesis
            ....
            str(t)))                             # SQL string

This shows that a Keyword followed by a Comma and a Name is grouped into an IdentifierList:

 0: "(         ", parent Parenthesis'>[2257639606576],  class Token,    ttype  Punctuation      
...     
 7: "PRIMARY   ", parent Parenthesis'>[2257639606576],  class Token,    ttype      Keyword      
 9: "KEY       ", parent IdentifierList'>[2257640149664],   class Token,    ttype      Keyword      
10: ",         ", parent IdentifierList'>[2257640149664],   class Token,    ttype  Punctuation      
13: "code      ", parent Identifier'>[2257640149304],   class Token,    ttype         Name      
15: "VARCHAR   ", parent Identifier'>[2257640150264],   class Token,    ttype         Name      
16: "(         ", parent Parenthesis'>[2257639606696],  class Token,    ttype  Punctuation      
17: "3         ", parent Parenthesis'>[2257639606696],  class Token,    ttype Literal.Number.Integer        
18: ")         ", parent Parenthesis'>[2257639606696],  class Token,    ttype  Punctuation      
20: "UNIQUE    ", parent Parenthesis'>[2257639606576],  class Token,    ttype      Keyword      
22: "NOT NULL  ", parent IdentifierList'>[2257640149184],   class Token,    ttype      Keyword      
23: ",         ", parent IdentifierList'>[2257640149184],   class Token,    ttype  Punctuation      
26: "name      ", parent Identifier'>[2257640149784],   class Token,    ttype         Name      
28: "VARCHAR   ", parent Identifier'>[2257640151104],   class Token,    ttype         Name      
29: "(         ", parent Parenthesis'>[2257639606096],  class Token,    ttype  Punctuation      
30: "126       ", parent Parenthesis'>[2257639606096],  class Token,    ttype Literal.Number.Integer        
31: ")         ", parent Parenthesis'>[2257639606096],  class Token,    ttype  Punctuation      
33: "UNIQUE    ", parent Parenthesis'>[2257639606576],  class Token,    ttype      Keyword      
... 
38: ")         ", parent Parenthesis'>[2257639606576],  class Token,    ttype  Punctuation

I believe the grouping around commas is doing something strange, and that this is the same problem as Bug #444. I assume it happens in engine/grouping.py, when group_identifier_list (~line 330) is called as the last grouping function. I don't understand the grouping yet, so I hope this at least serves as a comprehensible description and a bump. [edit: typo in example]

This behaviour comes from engine/grouping.py (currently line 224):

',' is matched as an identifier separator.

'KEY' (this example) or 'DEFAULT' (first example) is matched through m_role as a valid previous token before ',' in an IdentifierList.

def group_identifier_list(tlist):
    m_role = T.Keyword, ('null', 'role')
    ...
    def match(token):
        return token.match(T.Punctuation, ',')

    def valid(token):
        return imt(token, i=sqlcls, m=m_role, t=ttypes)
    ...
    valid_prev = valid_next = valid
    _group(tlist, sql.IdentifierList, match,
           valid_prev, valid_next, post, extend=True)
...

Including the Keyword before the Comma seems to be the cause of the misgrouping in both cases. [edit: source, typo]
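To make the effect easier to see, here is a toy model of grouping around commas. It is a simplified sketch, not sqlparse's actual `_group` implementation and not a proposed patch: tokens are plain `(kind, value)` tuples, whitespace is left out, and every name in it (`group_around_commas`, `VALID_WITH_KEYWORD`, and so on) is invented for this example.

```python
# Toy model: which token kinds may sit directly before or after a comma
VALID_WITH_KEYWORD = {"Name", "Function", "Keyword"}
VALID_WITHOUT_KEYWORD = {"Name", "Function"}


def is_valid(item, valid_kinds):
    # An already-built group (a list) can always be extended
    return isinstance(item, list) or item[0] in valid_kinds


def group_around_commas(tokens, valid_kinds):
    """Merge 'prev , next' into one group when both neighbours are valid."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        has_neighbours = bool(out) and i + 1 < len(tokens)
        if (tok == ("Punctuation", ",") and has_neighbours
                and is_valid(out[-1], valid_kinds)
                and is_valid(tokens[i + 1], valid_kinds)):
            prev = out.pop()
            group = prev if isinstance(prev, list) else [prev]
            group += [tok, tokens[i + 1]]
            out.append(group)
            i += 2
        else:
            out.append(tok)
            i += 1
    return out


def groups_as_text(items):
    # Render only the merged groups, for easy comparison
    return [" ".join(value for _, value in g) for g in items if isinstance(g, list)]


tokens = [
    ("Name", "id"), ("Builtin", "INTEGER"), ("Keyword", "PRIMARY"),
    ("Keyword", "KEY"), ("Punctuation", ","),
    ("Name", "code"), ("Function", "VARCHAR(3)"), ("Keyword", "UNIQUE"),
    ("Keyword", "NOT NULL"), ("Punctuation", ","),
    ("Name", "name"), ("Function", "VARCHAR(126)"), ("Keyword", "UNIQUE"),
    ("Keyword", "NOT NULL"),
]

groups_with = groups_as_text(group_around_commas(tokens, VALID_WITH_KEYWORD))
groups_without = groups_as_text(group_around_commas(tokens, VALID_WITHOUT_KEYWORD))
print(groups_with)     # groups formed when a Keyword counts as a valid neighbour
print(groups_without)  # groups formed when it does not
```

In this model, accepting a Keyword as a valid token before the comma reproduces the "KEY, code" and "NOT NULL, name" groups from the output above, while rejecting it leaves the commas ungrouped. The real grouping code has more cases, so changing it there may well have side effects this toy does not show.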

andialbrecht / sqlparse

can not parse add multi columns #467

This is my test SQL

and a print of all tokens

The ADD keyword cannot be parsed