erikrose / parsimonious

The fastest pure-Python PEG parser I can muster
MIT License

Inconsistent and undocumented production of Node vs list for children #247

Open alexchandel opened 1 month ago

alexchandel commented 1 month ago

Parsimonious seems to produce either a Node or a list of Nodes for some children of a rule, and the logic that decides between them is not documented anywhere. Consider the grammar fragment:

r"""
definition_list   = ((comment ws)? definition ws)+
"""

First, in visit_definition_list(self, node, visited_children), visited_children will be a list. This is not documented anywhere, but ok.

It is also a list of lists; again, not explicitly documented anywhere whether or why this is the case, but ok.

But the real inconsistency is visited_children[0][0]. If the optional group doesn't match, then this is a <Node matching ""> (like Node(<Quantifier (comment ws)?>, s, 26, 26)). But if the optional group does match, then it's a 1-element list. Why? This is documented nowhere.

And this 1-element list itself contains a list (presumably of the parenthesized group elements) (meaning type(visited_children[0][0][0]) == list). Again, why? This isn't documented.

Even worse, the list elements visited_children[0][0][0][0] and visited_children[0][0][0][1], which correspond to comment and ws, are also lists. One might naively expect them to be Nodes, given that the empty optional match is a Node. Perhaps this is the default behavior for rules, but again, this is not documented anywhere, and the documentation strongly implies that a rule match becomes a Node.
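The nesting described above is easier to see when printed as a structure of type names. The following is a minimal sketch of such a helper. `describe_shape` and `FakeNode` are hypothetical names introduced for illustration; `FakeNode` stands in for a real parsimonious Node so that the snippet runs without the library, and the contents of the innermost lists are placeholders, since only their being lists is known.

```python
def describe_shape(value):
    """Mirror the nesting of `value`, replacing every non-list leaf with its type name."""
    if isinstance(value, list):
        return [describe_shape(item) for item in value]
    return type(value).__name__


class FakeNode:
    """Hypothetical stand-in for a parsimonious Node (illustration only)."""


# visited_children[0][0] when the optional group did not match: a bare Node.
unmatched_optional = FakeNode()

# visited_children[0][0] when the optional group matched: a 1-element list
# holding the group, whose two entries (comment and ws) are themselves lists.
# The FakeNode inside each inner list is a placeholder, not observed data.
matched_optional = [[[FakeNode()], [FakeNode()]]]

print(describe_shape(unmatched_optional))
print(describe_shape(matched_optional))
```

Calling `describe_shape(visited_children)` inside a real visit method should give a comparable picture for an actual parse.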

Please document the precise behavior for how the types of the intermediate children of a node are determined.

alexchandel commented 1 month ago

This unpredictable alternation between lists and Nodes for optional matches makes visiting annoying. If the value were always a list, we could test its length; if it were always a Node, we could test its children. Instead, we must do something less obvious and less efficient.
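For illustration, one possible workaround is to normalize the optional child before using it. This is only a sketch under the behavior observed above, where an unmatched optional arrives as a non-list object and a matched one as a list. `optional_children` and `FakeNode` are hypothetical names, not part of parsimonious.

```python
def optional_children(child):
    """Return `child` if it is a list (the optional matched), else an empty list."""
    return child if isinstance(child, list) else []


class FakeNode:
    """Hypothetical stand-in for a parsimonious Node (illustration only)."""


# Unmatched optional: a bare Node, so the loop body never runs.
for group in optional_children(FakeNode()):
    raise AssertionError("not reached")

# Matched optional: a 1-element list holding the [comment, ws] group.
# The string leaves are placeholders for whatever the sub-visits returned.
matched = [[["comment"], ["ws"]]]
groups = [(comment, ws) for comment, ws in optional_children(matched)]
print(groups)
```

With this, a visit method can iterate over `optional_children(visited_children[i][0])` without branching on the type at every call site, at the cost of one `isinstance` check per optional.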