kfsone opened 3 years ago
Good question :)
Adding a zero-or-more type operator to the parser production rules was actually considered, but it made the syntax-directed translation (SDT) rules more complicated to specify, so for the sake of simplicity it was left out.
Aye, while converting my grammar over from my homebrew parser, I found the absence of square brackets (optional elements) a little more frustrating. It felt like something that ought to be feasible as syntactic sugar, i.e. writing:
Element: Value [ "," ];
could be treated as
Element: Value "," | Value;
This is a great example of where the SDT rules would be awkward. Try to include SDT rules in your examples; maybe I am wrong.
[edit: after reading gocc2.bnf, I'm guessing 'sdt' specifically refers to the <<...>> directives; I'll write a follow-up]
Sure, something like this?
List: Element | List Element;
Element: Value "," | Value;
I'll try to swing back to this and look at the code to see whether the implementation I now have in mind is feasible, but
R: P [ n ];
would effectively be internally mapped to
R: __R0 | __R1;
__R0: P n;
__R1: P;
// so my example
List: Element | List Element;
Element: Value "," | Value;
// becomes
List: Element | List Element;
Element: Value [ "," ];
// produces the same result as
List: Element | List Element;
Element: __Element0 | __Element1;
__Element0: Value ",";
__Element1: Value;
Pardon my oafishness: I'm self-taught, and aside from toy parsers/compilers for small DSLs, I haven't worked on a real parser in anger since I wrote a MUD language and engine, where the compiler produced an abstract grammar that the engine then used to drive a bottom-up parser interpreting player input ('plant the big plant in the little plant pot and pot the little plant with the big potted plant' [spot the catch :)]).
After reading gocc2.bnf, I think you're referring to the problem of capturing an "optional" field in a production:
ClassDef: "class" identifier OptionalParent Body << ast.NewClass($1, $2, $3) >>;
OptionalParent : ":" identifier | empty;
vs
ClassDef: "class" identifier [ Parent ] Body << ast.NewClass($1, $??, $??) >>;
If "[...]" is replaced by a logical substitute, then "[ Parent ]" would remain $2 either way; it would just have a nil value when no parent was provided, so it would still be treated exactly as
ClassDef : "class" identifier __optional__Parent Body << ast.NewClass($1, $2, $3) >>;
__optional__Parent : Parent << $0, nil >> | empty << nil, nil >>;
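To see why the positional attributes stay stable, here is a small Go simulation of the reduce actions for the desugared rules. It assumes, as the SDT examples above do, that $N is the attribute of the N-th symbol counting from 0 and that an action yields an (attribute, error) pair. Attrib, Class, NewClass and the reduce/optionalParent functions are hypothetical stand-ins, not gocc's generated code or a real ast package.

```go
package main

import "fmt"

// Attrib stands in for a symbol's attribute value (hypothetical).
type Attrib interface{}

// Class is a hypothetical AST node built by ClassDef.
type Class struct {
	Name   string
	Parent Attrib // nil when [ Parent ] was absent
	Body   Attrib
}

// NewClass plays the role of the hypothetical ast.NewClass($1, $2, $3).
func NewClass(name, parent, body Attrib) (Attrib, error) {
	n, ok := name.(string)
	if !ok {
		return nil, fmt.Errorf("class name: want string, got %T", name)
	}
	return &Class{Name: n, Parent: parent, Body: body}, nil
}

// __optional__Parent : Parent << $0, nil >>
func optionalParentPresent(X []Attrib) (Attrib, error) { return X[0], nil }

// __optional__Parent : empty << nil, nil >>
func optionalParentAbsent(X []Attrib) (Attrib, error) { return nil, nil }

// ClassDef : "class" identifier __optional__Parent Body << ast.NewClass($1, $2, $3) >>
// $2 is always the optional slot, so Body is always $3.
func reduceClassDef(X []Attrib) (Attrib, error) { return NewClass(X[1], X[2], X[3]) }

func main() {
	// class Foo : Base { ... }  (assume Parent's own action yields "Base")
	parent, _ := optionalParentPresent([]Attrib{"Base"})
	withParent, err := reduceClassDef([]Attrib{"class", "Foo", parent, "body"})
	if err != nil {
		panic(err)
	}

	// class Bar { ... }  (parent absent, but the $2 slot still exists)
	none, _ := optionalParentAbsent(nil)
	noParent, err := reduceClassDef([]Attrib{"class", "Bar", none, "body"})
	if err != nil {
		panic(err)
	}

	fmt.Printf("%+v\n", *withParent.(*Class))
	fmt.Printf("%+v\n", *noParent.(*Class))
}
```

The same reduceClassDef serves both inputs; only the value in slot $2 changes, which is exactly the "nil when none was provided" behaviour described above.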
The precedent for this is "anonymous terminals", where gocc allows
ClassDef: "class" ...
instead of requiring
class_keyword: "class";
ClassDef: class_keyword identifier ...
I can see cases where a naive approach would cause problems:
// looking at you, Guido.
import : "import" [ identifier string_lit | string_lit "as" identifier ];
Obvious but flawed workarounds:
- pad the attrib count to match the worst case and let the user figure out the context themselves: lots of surprises for beginners :(
- require each branch to have the same attrib count: this will feel arbitrary in use and still surprise users with the order of params,
or:
- disallow | in Lexical []s: it's a small but incredibly useful convenience for a lot of super-common cases, and the effect on attribs is relatively predictable for learners.
I'm guessing 'sdt' specifically refers to the <<...>> directives
Yes exactly
I think this might work for the optional case; I'm not sure about all the implications, but at least the SDT rules look nice.
Yes, I think "|" is already only allowed at the top level in the parser part of the BNF, so this shouldn't be a problem.
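The restriction discussed here (no "|" inside "[...]") is cheap to enforce. A minimal Go sketch, assuming a production's right-hand side arrives as whitespace-separated tokens, so quoted terminals containing spaces are out of scope; checkOptionalGroups is a hypothetical helper, not part of gocc.

```go
package main

import (
	"fmt"
	"strings"
)

// checkOptionalGroups reports an error if a bare | appears inside [ ... ]
// in one production's right-hand side (hypothetical helper). Tokens are
// split on whitespace, so a quoted terminal such as "|" is a different
// token from the bare | operator.
func checkOptionalGroups(rhs string) error {
	depth := 0
	for i, tok := range strings.Fields(rhs) {
		switch tok {
		case "[":
			depth++
		case "]":
			if depth == 0 {
				return fmt.Errorf("token %d: unmatched ]", i)
			}
			depth--
		case "|":
			if depth > 0 {
				return fmt.Errorf("token %d: | is not allowed inside [ ]", i)
			}
		}
	}
	if depth != 0 {
		return fmt.Errorf("%d unclosed [", depth)
	}
	return nil
}

func main() {
	for _, rhs := range []string{
		`Value [ "," ]`,
		`Value "," | Value`,
		`"import" [ identifier string_lit | string_lit "as" identifier ]`,
	} {
		fmt.Printf("%-65s -> %v\n", rhs, checkOptionalGroups(rhs))
	}
}
```

With "|" banned inside brackets, a "[...]" group is one fixed sequence of symbols, so its contribution to the $N numbering no longer depends on which branch matched; the Guido-style import rule is rejected and would need its alternatives written out at the top level.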
Tripped myself up on the subtle difference between token and production definitions :)
->
which is the unquoted open brace:
Obviously I need to use the alternative structure here, but I'm curious: wouldn't it make sense to achieve that effect by introducing '{' in productions anyway?
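If '{ ... }' were allowed in productions, the usual EBNF-to-BNF rewrite is a left-recursive helper rule, the same shape as "List: Element | List Element" earlier in the thread. A Go sketch under assumptions: desugarRepeat, the __Block_rep naming, and the reduce functions are hypothetical, the repeated item is limited to a single symbol, and the helper's attribute is modelled as a slice so the whole repetition occupies one $N slot. That is one possible answer to the SDT concern raised at the start, not how gocc does it. The quoted "{" and "}" in the example are ordinary terminals, distinct from the unquoted brace acting as the operator.

```go
package main

import (
	"fmt"
	"strings"
)

// Attrib stands in for a symbol's attribute value (hypothetical).
type Attrib interface{}

// desugarRepeat rewrites `name: before { item } after;` (item limited to a
// single symbol) into plain BNF with a left-recursive helper rule, the same
// shape as `List: Element | List Element;`. Hypothetical helper.
func desugarRepeat(name string, before []string, item string, after []string) []string {
	helper := "__" + name + "_rep"
	rhs := append(append(append([]string{}, before...), helper), after...)
	return []string{
		fmt.Sprintf("%s: %s;", name, strings.Join(rhs, " ")),
		fmt.Sprintf("%s: empty | %s %s;", helper, helper, item),
	}
}

// Hypothetical reduce actions for the helper. The whole repetition is one
// attribute slot holding a slice, so the $N of symbols after it never shift.
func repEmpty(X []Attrib) (Attrib, error) { return []Attrib{}, nil } // __rep: empty

func repMore(X []Attrib) (Attrib, error) { // __rep: __rep item
	list, ok := X[0].([]Attrib)
	if !ok {
		return nil, fmt.Errorf("want []Attrib, got %T", X[0])
	}
	return append(list, X[1]), nil
}

func main() {
	// Quoted "{" and "}" are terminals; an unquoted { } would be the operator.
	for _, line := range desugarRepeat("Block", []string{`"{"`}, "Stmt", []string{`"}"`}) {
		fmt.Println(line)
	}

	// Simulate the reductions for three statements: empty first, then one
	// left-recursive step per Stmt.
	acc, _ := repEmpty(nil)
	for _, s := range []string{"a", "b", "c"} {
		acc, _ = repMore([]Attrib{acc, s})
	}
	fmt.Println(acc)
}
```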