erikrose / parsimonious

The fastest pure-Python PEG parser I can muster
MIT License

Is it possible to define the context of the token? #170

Closed kranonetka closed 3 years ago

kranonetka commented 3 years ago

For example, I have this grammar (in the file test.grammar):

query               = statement ('; ' statement)*
statement           = create_user_stmt / drop_database_stmt
create_user_stmt    = 'CREATE USER ' user_name
drop_database_stmt  = 'DROP DATABASE ' db_name
user_name           = identifier
db_name             = identifier
identifier          = unquoted_identifier / quoted_identifier
unquoted_identifier = letter (letter / digit)*
quoted_identifier   = '"' letter+ '"'
letter              = ascii_letter / '_'
digit               = ~'[0-9]'
ascii_letter        = ~'[a-z]'i

I want to parse queries with this grammar:

from parsimonious.grammar import Grammar

with open('test.grammar', 'r') as fp:
    grammar = Grammar(fp.read())

if __name__ == '__main__':
    query = 'CREATE USER "john_doe"; DROP DATABASE superdb'

    tree = grammar.parse(query)
    print(tree)

If you look at the tree nodes, there are no user_name or db_name nodes, only identifier. Is this behavior expected? Is there a way to tell a user_name apart from a db_name when parsing with this grammar?

pzhlkj6612 commented 3 years ago

Hi, please see the quick fix in https://github.com/erikrose/parsimonious/issues/131#issuecomment-353739447.

kranonetka commented 3 years ago

@pzhlkj6612 Yep, it works, thanks! It would be great if this fix were built into the module, so that users did not have to apply it themselves.