dabeaz / ply

Python Lex-Yacc
http://www.dabeaz.com/ply/index.html

Grammar Rule association #215

Closed NofelYaseen closed 5 years ago

NofelYaseen commented 5 years ago

Below is a simplified version of what I am trying to do; the original is too large to post, but the problem is equivalent. If you run it, the output is:

expressions a
expressions b
expressions c
expressions bc
expressions abc

The problem is that the parser combines "bc" first and then "abc". Is it possible to make it combine "ab" first and then "abc"?
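The two orders come from how the parser settles a shift/reduce conflict: with two `expressions` already on the stack and another NAME as lookahead, it can either reduce them now (grouping "ab" first) or shift the NAME (grouping "bc" first). The following is a simplified simulation of that choice, not PLY itself; `reduction_trace` and its `prefer_reduce` flag are hypothetical names. `prefer_reduce=False` models yacc's default of shifting on an unresolved conflict.

```python
def reduction_trace(chars, prefer_reduce):
    """Simplified model of  expressions : character | expressions expressions.

    Returns the values in the order the parser would print them.
    prefer_reduce=False mimics yacc's default (shift on a shift/reduce conflict);
    prefer_reduce=True mimics a rule whose precedence beats the lookahead token.
    """
    stack, trace = [], []

    def reduce_pair():
        # expressions : expressions expressions  -> concatenate the top two values
        right = stack.pop()
        left = stack.pop()
        stack.append(left + right)
        trace.append(left + right)

    for ch in chars:
        # Conflict point: two expressions on the stack, another NAME is the lookahead.
        while prefer_reduce and len(stack) >= 2:
            reduce_pair()
        # Shift NAME, then reduce NAME -> character -> expressions.
        stack.append(ch)
        trace.append(ch)

    # At end of input there is nothing left to shift, so remaining pairs are reduced.
    while len(stack) >= 2:
        reduce_pair()
    return trace


print(reduction_trace("abc", prefer_reduce=False))  # ['a', 'b', 'c', 'bc', 'abc']
print(reduction_trace("abc", prefer_reduce=True))   # ['a', 'b', 'ab', 'c', 'abc']
```

With `prefer_reduce=False` the trace has the same order as the output above.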

I expected there would be a simple way to get the other order, but I couldn't find one. I am looking for precedence and associativity of grammar rules; the only thing I found in the documentation was precedence and associativity of tokens.
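As far as I know, yacc-style generators, PLY included, have no separate precedence table for rules: a rule takes the precedence of its rightmost terminal, unless `%prec TOKEN` overrides it. `expressions : expressions expressions` contains no terminal, so it gets no precedence and the conflict falls back to the default, which is to shift. A minimal sketch of that lookup, assuming PLY's convention of listing levels lowest first and treating unlisted symbols as level 0; `precedence_levels`, `rule_precedence` and the token `SOMETOKEN` are hypothetical, not PLY internals:

```python
# Precedence table in PLY's layout: lowest precedence first, one (assoc, *tokens) tuple per level.
PRECEDENCE = (
    ('left', 'NAME'),
    ('right', 'SOMETOKEN'),   # hypothetical token, only ever referenced via %prec
)


def precedence_levels(table):
    """Map each token to (assoc, level); level 1 is the lowest listed precedence."""
    levels = {}
    for level, (assoc, *names) in enumerate(table, start=1):
        for name in names:
            levels[name] = (assoc, level)
    return levels


def rule_precedence(rhs, terminals, levels, prec=None):
    """Precedence of a rule: the %prec token if given, else its rightmost terminal.

    Symbols with no entry get ('right', 0), i.e. "no precedence".
    """
    if prec is not None:
        return levels[prec]
    for sym in reversed(rhs):
        if sym in terminals:
            return levels.get(sym, ('right', 0))
    return ('right', 0)  # no terminals at all, e.g. expressions : expressions expressions


levels = precedence_levels(PRECEDENCE)
print(rule_precedence(['expressions', 'expressions'], {'NAME'}, levels))               # ('right', 0)
print(rule_precedence(['expressions', 'expressions'], {'NAME'}, levels, 'SOMETOKEN'))  # ('right', 2)
print(rule_precedence(['NAME'], {'NAME'}, levels))                                     # ('left', 1)
```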

import ply.lex as lex
import ply.yacc as yacc

tokens = ['NAME']

t_NAME = r'[a-zA-Z]'

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input("abc")

while True:
    tok = lexer.token()
    if not tok: 
        break
    # print(tok.type, tok.value, tok.lineno, tok.lexpos)

def p_expressions(p):
    '''
    expressions : character    
                | expressions expressions      
    '''
    if len(p) == 2:
        p[0] = p[1]
    else:
        p[0] = p[1] + p[2]

    print('expressions', p[0])

def p_character(p):
    '''
    character : NAME
    '''
    p[0] = p[1]

def p_error(token):
    if token:
        print('Syntax error at token:', token.value, ' token type:', token.type, ' at line no:', token.lineno, ' lex pos:', token.lexpos)
    else:
        print("Syntax error at EOF")

parser = yacc.yacc()
parser.parse("abc")

I am aware that I could change the rule to:

def p_expressions(p):
    '''
    expressions : character
                | expressions character
    '''
    if len(p) == 2:
        p[0] = p[1]
    else:
        p[0] = p[1] + p[2]

    print('expressions', p[0])

But I am hoping there is another way, because the original problem is not that simple: there I need to combine two different expressions together.

NofelYaseen commented 5 years ago

I found a solution, but it seems a bit hacky: it gives an imaginary token a precedence and attaches that precedence to the rule with %prec. It works, at least for me.

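precedence = (
    ('left', 'NAME'),      # lowest precedence: the lookahead token
    ('right', 'IMAGINE'),  # higher precedence; IMAGINE is an imaginary token used only via %prec
)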

def p_expressions(p):
    '''
    expressions : character    
                | expressions expressions %prec IMAGINE
    '''
    if len(p) == 2:
        p[0] = p[1]
    else:
        p[0] = p[1] + p[2]

    print('expressions', p[0])
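The reason this works: when the parser has `expressions expressions` on the stack and sees NAME, it compares the rule's precedence with the token's. Below is a simplified sketch of that comparison, modeled on the rules described in the PLY documentation (higher precedence wins; on a tie, left associativity reduces, right associativity shifts, nonassoc is an error) and assuming a token or rule without a precedence entry counts as level 0 with right associativity. `resolve_shift_reduce` is a hypothetical helper, not a PLY function. The PLY documentation uses the same fictitious-token pattern (`UMINUS` with `%prec`) for unary minus.

```python
def resolve_shift_reduce(rule_prec, token_prec):
    """Decide a shift/reduce conflict from (assoc, level) pairs; level 0 means "no precedence"."""
    rule_assoc, rule_level = rule_prec
    _, token_level = token_prec
    if rule_level > token_level:
        return 'reduce'
    if rule_level == token_level:
        if rule_assoc == 'left':
            return 'reduce'
        if rule_assoc == 'nonassoc':
            return 'error'
    # Token binds tighter, right-associative tie, or neither side has a precedence.
    return 'shift'


NAME = ('left', 1)       # ('left', 'NAME') is the first (lowest) line of the table
IMAGINE = ('right', 2)   # ('right', 'IMAGINE') is the second line
NO_PREC = ('right', 0)   # a rule with no terminals and no %prec

print(resolve_shift_reduce(NO_PREC, NAME))  # shift:  "bc" then "abc"
print(resolve_shift_reduce(IMAGINE, NAME))  # reduce: "ab" then "abc"
print(resolve_shift_reduce(NAME, NAME))     # reduce: tie, rule is left-associative
```

Note that under this model the associativity given to IMAGINE does not matter here, because its level differs from NAME's.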

Feel free to close the issue if there is no other way.

dabeaz commented 5 years ago

I've been a bit sidetracked (sorry for the delayed response). Using %prec is a known technique for resolving ambiguous grammar rules in LALR(1) parser generators such as yacc, bison, and PLY. The alternative would be to rewrite the grammar in some way to avoid the ambiguity.
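The ambiguity is easy to quantify. Under `expressions : character | expressions expressions`, a string of n characters can be split at any point and each side split again, so the number of parse trees follows the Catalan numbers (2 for "abc", 5 for "abcd"). Under the left-recursive `expressions : character | expressions character` the right-hand part is always a single character, so there is exactly one tree and no conflict to resolve. A small counting sketch; the function names are hypothetical:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ambiguous_trees(n):
    """Number of parse trees for n characters under  expressions : character | expressions expressions."""
    if n == 1:
        return 1  # expressions : character
    # expressions : expressions expressions  -> choose where the left part ends (both parts non-empty)
    return sum(ambiguous_trees(k) * ambiguous_trees(n - k) for k in range(1, n))


def left_recursive_trees(n):
    """Number of parse trees under  expressions : character | expressions character."""
    if n == 1:
        return 1
    # The right-hand part is always exactly one character, so there is only one way to split.
    return left_recursive_trees(n - 1)


for n in range(1, 6):
    print(n, ambiguous_trees(n), left_recursive_trees(n))
# 1 1 1
# 2 1 1
# 3 2 1
# 4 5 1
# 5 14 1
```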