Closed mewmew closed 8 years ago
@goccmack ?
I think it might just be what he thought was a sensible default and didn't really think of a different use case, but I am guessing now.
It has been several years since that design decision, but here is how I would think about it now. Gocc's semantic actions implement a syntax directed translation scheme. In other words, for every parsed syntactic construct you perform a corresponding translation action. Empty is the null production body, therefore there is no action associated with it.
I suppose it is a valid to ask if it makes sense to add a semantic action for empty productions. But one would have to think carefully about the implications. I am too far out of compilers at the moment to have an immediate opinion about it.
Take this grammar for instance:
VarDecl
: Immutable Type ident << astx.NewVarDecl($0, $1, $2) >>
;
Type
: "int" << types.Int, nil >>
;
Immutable
: empty << false, nil >>
| "const" << true, nil >>
;
As long as the grammar contains no LR-1 conflicts, empty
productions, just like any other production, take part in describing deterministic aspects of the language. The absence of an element also gives valuable information, and for this reason, I would suggest allowing semantic actions also for empty productions.
Obviously the grammar above may easily be described as follows to alleviate the need for semantic actions on empty productions; e.g.
VarDecl
: Type ident << astx.NewVarDecl(false, $1, $2) >>
: "const" Type ident << astx.NewVarDecl(true, $1, $2) >>
;
Type
: "int" << types.Int, nil >>
;
Immutable
: empty << false, nil >>
| "const" << true, nil >>
;
However, for more involved grammars, the above approach may not be desirable. An example from a grammar for LLVM IR is presented below.
GlobalVarDecl
// Global variable definition.
: global_ident "=" OptVarLinkage
Immutable Type Value
OptAlignment << irx.NewGlobalVar($1, $2, $3) >>
;
OptVarLinkage
: empty
| VarLinkage
;
VarLinkage
: "appending"
| "available_externally"
| "common"
| "internal"
| "linkonce"
| "linkonce_odr"
| "private"
| "weak"
| "weak_odr"
;
Immutable
: empty << false, nil >>
| "constant" << true, nil >>
;
OptAlignment
: empty
| "," Alignment
;
Alignment
: "align" int_lit
;
A semantic action on empty
productions simplifies the grammar, as the GlobalVarDecl
production rule only requires a single syntax body. Keeping duplicate syntax bodies up to date becomes more and more difficult as the language grammar grows more and more complex, as is the case for many languages that have been accumulating features over the years. Allowing semantic actions to be used for empty
production may simplify the grammar by removing the need to duplicate production bodies.
What would be the main disadvantages of allowing semantic actions on empty
productions?
As long as the grammar is deterministic, I see no risk of reduce-reduce conflicts.
Note, that the above example defined Immutable
been defined as
Immutable
: empty << false, nil >>
| "constant" << true, nil >>
;
Rather than as seen in the true grammar for LLVM IR, simply to lay the foundations for discussions. The above points remain valid for language grammars such as C.
Immutable
: "constant" << true, nil >>
| "global" << false, nil >>
;
Sounds like a compelling argument to change the mechanics. And it's backwards compatible :-) On 16 May 2016 2:02 PM, "Robin Eklind" notifications@github.com wrote:
Take this grammar for instance:
VarDecl : Immutable Type ident << astx.NewVarDecl($0, $1, $2) >> ;
Type : "int" << types.Int, nil >> ;
Immutable : empty << false, nil >> | "const" << true, nil >> ;
As long as the grammar contains no LR-1 conflicts, empty productions, just like any other production, take part in describing deterministic aspects of the language. The absence of an element also gives valuable information, and for this reason, I would suggest allowing semantic actions also for empty productions.
Obviously the grammar above may easily be described as follows to alleviate the need for semantic actions on empty productions; e.g.
VarDecl : Type ident << astx.NewVarDecl(false, $1, $2) >> : "const" Type ident << astx.NewVarDecl(true, $1, $2) >> ;
Type : "int" << types.Int, nil >> ;
Immutable : empty << false, nil >> | "const" << true, nil >> ;
However, for more involved grammars, the above approach may not be desirable. An example from a grammar for LLVM IR is presented below.
GlobalVarDecl // Global variable definition. : global_ident "=" OptVarLinkage Immutable Type Value OptAlignment << irx.NewGlobalVar($1, $2, $3) >> ;
OptVarLinkage : empty | VarLinkage ;
VarLinkage : "appending" | "available_externally" | "common" | "internal" | "linkonce" | "linkonce_odr" | "private" | "weak" | "weak_odr" ;
Immutable : empty << false, nil >> | "constant" << true, nil >> ;
OptAlignment : empty | "," Alignment ;
Alignment : "align" int_lit ;
A semantic action on empty productions simplifies the grammar, as the GlobalVarDecl production rule only requires a single syntax body. Keeping duplicate syntax bodies up to date becomes more and more difficult as the language grammar grows more and more complex, as is the case for many languages that have been accumulating features over the years. Allowing semantic actions to be used for empty production may simplify the grammar by removing the need to duplicate production bodies.
What would be the main disadvantages of allowing semantic actions on empty productions?
As long as the grammar is deterministic, I see no risk of reduce-reduce
conflicts.
Note, that the above example defined Immutable been defined as
Immutable : empty << false, nil >> | "constant" << true, nil >> ;
Rather than as seen in the true grammar for LLVM IR, simply to lay the foundations for discussions. The above points remain valid for language grammars such as C.
Immutable : "constant" << true, nil >> | "global" << false, nil >> ;
— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/goccmack/gocc/issues/24#issuecomment-219409306
I have no objection to giving it a try if @mewmew wants to make the change.
After applying PR #25, the example grammar empty.bnf:
Start
: Foo Bar
;
Foo
: empty << "no foo", nil >>
| "foo" << "foo", nil >>
;
Bar
: empty << "no bar", nil >>
| "bar" << "bar", nil >>
;
produces the following reduce functions of production table entities for empty
an syntax body, https://github.com/mewspring/poc/blob/24da0316b28052c4d153b386adbeaba92065079c/empty/parser/productionstable.go#L48
ProdTabEntry{
String: `Foo : empty << "no foo", nil >>`,
Id: "Foo",
NTType: 2,
Index: 2,
NumSymbols: 0,
ReduceFunc: func(X []Attrib) (Attrib, error) {
return nil, nil
},
},
Example run with the debug_parser
flag enabled.
u@x1 ~/D/g/s/g/m/p/empty> cat b.txt
bar
u@x1 ~/D/g/s/g/m/p/empty> empty b.txt
S0 bar(4,bar) reduce:2(Foo : empty << "no foo", nil >>)
S2 bar(4,bar) shift:5
S5 $(1,) reduce:5(Bar : "bar" << "bar", nil >>)
S4 $(1,) reduce:1(Start : Foo Bar << >>)
S1 $(1,) accept(0)
no foo
Example grammar empty.bnf:
The reduce functions of production table entities for
empty
syntax bodies always returnnil, nil
, no matter if a semantic action has been specified; e.g. https://github.com/mewspring/poc/blob/master/empty/parser/productionstable.go#L48Expected reduce function:
Is there a reason for always returning
nil, nil
from productions withempty
syntax bodies?