PyCQA / baron

IDE allow you to refactor code, Baron allows you to write refactoring code.
http://baron.pycqa.org
GNU Lesser General Public License v3.0
289 stars 50 forks source link

Wrong operator ordering in FST for operators of equal precedence #162

Open EmilyBourne opened 4 years ago

EmilyBourne commented 4 years ago

When a modulo operator is combined with a multiplication baron does not interpret the code in the same way as python.

E.g:

a % 2 * 4

In python this is equivalent to:

(a % 2) * 4

however baron reads it as:

a % (2 * 4)

as shown here:

>from baron import parse

> fst = parse("a % 2 * 4")

> fst[0]

{'first': {'type': 'name', 'value': 'a'},
 'first_formatting': [{'type': 'space', 'value': ' '}],
 'second': {'first': {'section': 'number', 'type': 'int', 'value': '2'},
  'first_formatting': [{'type': 'space', 'value': ' '}],
  'second': {'section': 'number', 'type': 'int', 'value': '4'},
  'second_formatting': [{'type': 'space', 'value': ' '}],
  'type': 'binary_operator',
  'value': '*'},
 'second_formatting': [{'type': 'space', 'value': ' '}],
 'type': 'binary_operator',
 'value': '%'}
EmilyBourne commented 4 years ago

This is also true for division and multiplication:

a / 2 * 4

gives:

{'first': {'type': 'name', 'value': 'a'},
 'first_formatting': [{'type': 'space', 'value': ' '}],
 'second': {'first': {'section': 'number', 'type': 'int', 'value': '2'},
  'first_formatting': [{'type': 'space', 'value': ' '}],
  'second': {'section': 'number', 'type': 'int', 'value': '4'},
  'second_formatting': [{'type': 'space', 'value': ' '}],
  'type': 'binary_operator',
  'value': '*'},
 'second_formatting': [{'type': 'space', 'value': ' '}],
 'type': 'binary_operator',
 'value': '/'}
EmilyBourne commented 4 years ago

Operators of equal precedence should be parsed in reverse order

EmilyBourne commented 4 years ago

This is happening due to the precedence specified for rply in grammator_operators.py

On lines 191-214:

@pg.production("expr : xor_expr VBAR expr")
    @pg.production("xor_expr : and_expr CIRCUMFLEX xor_expr")
    @pg.production("and_expr : shift_expr AMPER and_expr")
    @pg.production("shift_expr : arith_expr RIGHT_SHIFT shift_expr")
    @pg.production("shift_expr : arith_expr LEFT_SHIFT shift_expr")
    @pg.production("arith_expr : term PLUS arith_expr")
    @pg.production("arith_expr : term MINUS arith_expr")
    @pg.production("term : factor STAR term")
    @pg.production("term : factor SLASH term")
    @pg.production("term : factor PERCENT term")
    @pg.production("term : factor DOUBLE_SLASH term")
    @pg.production("term : factor AT term")
    @pg.production("power : atom DOUBLE_STAR factor")
    @pg.production("power : atom DOUBLE_STAR power")
    def binary_operator_node(pack):
        (first, operator, second) = pack
        return {
            "type": "binary_operator",
            "value": operator.value,
            "first": first,
            "second": second,
            "first_formatting": operator.hidden_tokens_before,
            "second_formatting": operator.hidden_tokens_after
        }

This syntax means that equivalent terms are saved on the right hand side. Thus for the following expression (test_combine_div_modulo_mult in test_grammator_operators.py):

"a/b%c*d"

The operator / is considered first. The term "b%c*d" returns a term so it is saved in the right hand side.

Reversing the syntax to:

    @pg.production("expr : expr VBAR xor_expr")
    @pg.production("xor_expr : xor_expr CIRCUMFLEX and_expr")
    @pg.production("and_expr : and_expr AMPER shift_expr")
    @pg.production("shift_expr : shift_expr RIGHT_SHIFT arith_expr")
    @pg.production("shift_expr : shift_expr LEFT_SHIFT arith_expr")
    @pg.production("arith_expr : arith_expr PLUS term")
    @pg.production("arith_expr : arith_expr MINUS term")
    @pg.production("term : term STAR factor")
    @pg.production("term : term SLASH factor")
    @pg.production("term : term PERCENT factor")
    @pg.production("term : term DOUBLE_SLASH factor")
    @pg.production("term : term AT factor")
    @pg.production("power : factor DOUBLE_STAR atom")
    @pg.production("power : power DOUBLE_STAR atom")
    def binary_operator_node(pack):
        (first, operator, second) = pack
        return {
            "type": "binary_operator",
            "value": operator.value,
            "first": first,
            "second": second,
            "first_formatting": operator.hidden_tokens_before,
            "second_formatting": operator.hidden_tokens_after
        }

leads to the tree being constructed correctly. In this case:

a % 2 * 4

gives:

{'first': {'first': {'type': 'name', 'value': 'a'},
  'first_formatting': [{'type': 'space', 'value': ' '}],
  'second': {'section': 'number', 'type': 'int', 'value': '2'},
  'second_formatting': [{'type': 'space', 'value': ' '}],
  'type': 'binary_operator',
  'value': '%'},
 'first_formatting': [{'type': 'space', 'value': ' '}],
 'second': {'section': 'number', 'type': 'int', 'value': '4'},
 'second_formatting': [{'type': 'space', 'value': ' '}],
 'type': 'binary_operator',
 'value': '*'}
EmilyBourne commented 4 years ago

The ordering only needs to be changed for non-commutative operators (i.e. all operators but *). This allows the intuitive ordering to be kept for an example such as:

a * b * c

where first is a and second is a*b

EmilyBourne commented 4 years ago

The ordering only needs to be changed for non-commutative operators (i.e. all operators but *). This allows the intuitive ordering to be kept for an example such as:

a * b * c

where first is a and second is a*b

Unfortunately this is not possible. Objects of equal precedence must be examined in the same direction