goccmack / gocc

Parser / Scanner Generator
Other
622 stars 48 forks source link

Semantic action ignored for `empty` syntax bodies #24

Closed mewmew closed 8 years ago

mewmew commented 8 years ago

Example grammar empty.bnf:

Start
    : Foo Bar
;

Foo
    : empty  << "no foo", nil >>
    | "foo"  << "foo", nil >>
;

Bar
    : empty << "no bar", nil >>
    | "bar" << "bar", nil >>
;

The reduce functions of production table entities for empty syntax bodies always return nil, nil, no matter if a semantic action has been specified; e.g. https://github.com/mewspring/poc/blob/master/empty/parser/productionstable.go#L48

    ProdTabEntry{
        String: `Foo : empty    << "no foo", nil >>`,
        Id:         "Foo",
        NTType:     2,
        Index:      2,
        NumSymbols: 0,
        ReduceFunc: func(X []Attrib) (Attrib, error) {
            return nil, nil
        },
    },

Expected reduce function:

        ReduceFunc: func(X []Attrib) (Attrib, error) {
            return "no foo", nil
        },

Is there a reason for always returning nil, nil from productions with empty syntax bodies?

awalterschulze commented 8 years ago

@goccmack ?

awalterschulze commented 8 years ago

I think it might just be what he thought was a sensible default and didn't really think of a different use case, but I am guessing now.

goccmack commented 8 years ago

It has been several years since that design decision, but here is how I would think about it now. Gocc's semantic actions implement a syntax directed translation scheme. In other words, for every parsed syntactic construct you perform a corresponding translation action. Empty is the null production body, therefore there is no action associated with it.

I suppose it is a valid to ask if it makes sense to add a semantic action for empty productions. But one would have to think carefully about the implications. I am too far out of compilers at the moment to have an immediate opinion about it.

mewmew commented 8 years ago

Take this grammar for instance:

VarDecl
    : Immutable Type ident     << astx.NewVarDecl($0, $1, $2) >>
;

Type
    : "int"                    << types.Int, nil >>
;

Immutable
    : empty                    << false, nil >>
    | "const"                  << true, nil >>
;

As long as the grammar contains no LR-1 conflicts, empty productions, just like any other production, take part in describing deterministic aspects of the language. The absence of an element also gives valuable information, and for this reason, I would suggest allowing semantic actions also for empty productions.

Obviously the grammar above may easily be described as follows to alleviate the need for semantic actions on empty productions; e.g.

VarDecl
    : Type ident               << astx.NewVarDecl(false, $1, $2) >>
    : "const" Type ident       << astx.NewVarDecl(true, $1, $2) >>
;

Type
    : "int"                    << types.Int, nil >>
;

Immutable
    : empty                    << false, nil >>
    | "const"                  << true, nil >>
;

However, for more involved grammars, the above approach may not be desirable. An example from a grammar for LLVM IR is presented below.

GlobalVarDecl
    // Global variable definition.
    : global_ident "=" OptVarLinkage
      Immutable Type Value
      OptAlignment                                << irx.NewGlobalVar($1, $2, $3) >>
;

OptVarLinkage
    : empty
    | VarLinkage
;

VarLinkage
    : "appending"
    | "available_externally"
    | "common"
    | "internal"
    | "linkonce"
    | "linkonce_odr"
    | "private"
    | "weak"
    | "weak_odr"
;

Immutable
    : empty                                       << false, nil >>
    | "constant"                                  << true, nil >>
;

OptAlignment
    : empty
    | "," Alignment
;

Alignment
    : "align" int_lit
;

A semantic action on empty productions simplifies the grammar, as the GlobalVarDecl production rule only requires a single syntax body. Keeping duplicate syntax bodies up to date becomes more and more difficult as the language grammar grows more and more complex, as is the case for many languages that have been accumulating features over the years. Allowing semantic actions to be used for empty production may simplify the grammar by removing the need to duplicate production bodies.

What would be the main disadvantages of allowing semantic actions on empty productions?

As long as the grammar is deterministic, I see no risk of reduce-reduce conflicts.


Note, that the above example defined Immutable been defined as

Immutable
    : empty                                       << false, nil >>
    | "constant"                                  << true, nil >>
;

Rather than as seen in the true grammar for LLVM IR, simply to lay the foundations for discussions. The above points remain valid for language grammars such as C.

Immutable
    : "constant"                                  << true, nil >>
    | "global"                                    << false, nil >>
;
awalterschulze commented 8 years ago

Sounds like a compelling argument to change the mechanics. And it's backwards compatible :-) On 16 May 2016 2:02 PM, "Robin Eklind" notifications@github.com wrote:

Take this grammar for instance:

VarDecl : Immutable Type ident << astx.NewVarDecl($0, $1, $2) >> ;

Type : "int" << types.Int, nil >> ;

Immutable : empty << false, nil >> | "const" << true, nil >> ;

As long as the grammar contains no LR-1 conflicts, empty productions, just like any other production, take part in describing deterministic aspects of the language. The absence of an element also gives valuable information, and for this reason, I would suggest allowing semantic actions also for empty productions.

Obviously the grammar above may easily be described as follows to alleviate the need for semantic actions on empty productions; e.g.

VarDecl : Type ident << astx.NewVarDecl(false, $1, $2) >> : "const" Type ident << astx.NewVarDecl(true, $1, $2) >> ;

Type : "int" << types.Int, nil >> ;

Immutable : empty << false, nil >> | "const" << true, nil >> ;

However, for more involved grammars, the above approach may not be desirable. An example from a grammar for LLVM IR is presented below.

GlobalVarDecl // Global variable definition. : global_ident "=" OptVarLinkage Immutable Type Value OptAlignment << irx.NewGlobalVar($1, $2, $3) >> ;

OptVarLinkage : empty | VarLinkage ;

VarLinkage : "appending" | "available_externally" | "common" | "internal" | "linkonce" | "linkonce_odr" | "private" | "weak" | "weak_odr" ;

Immutable : empty << false, nil >> | "constant" << true, nil >> ;

OptAlignment : empty | "," Alignment ;

Alignment : "align" int_lit ;

A semantic action on empty productions simplifies the grammar, as the GlobalVarDecl production rule only requires a single syntax body. Keeping duplicate syntax bodies up to date becomes more and more difficult as the language grammar grows more and more complex, as is the case for many languages that have been accumulating features over the years. Allowing semantic actions to be used for empty production may simplify the grammar by removing the need to duplicate production bodies.

What would be the main disadvantages of allowing semantic actions on empty productions?

As long as the grammar is deterministic, I see no risk of reduce-reduce

conflicts.

Note, that the above example defined Immutable been defined as

Immutable : empty << false, nil >> | "constant" << true, nil >> ;

Rather than as seen in the true grammar for LLVM IR, simply to lay the foundations for discussions. The above points remain valid for language grammars such as C.

Immutable : "constant" << true, nil >> | "global" << false, nil >> ;

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/goccmack/gocc/issues/24#issuecomment-219409306

goccmack commented 8 years ago

I have no objection to giving it a try if @mewmew wants to make the change.

mewmew commented 8 years ago

After applying PR #25, the example grammar empty.bnf:

Start
    : Foo Bar
;

Foo
    : empty  << "no foo", nil >>
    | "foo"  << "foo", nil >>
;

Bar
    : empty << "no bar", nil >>
    | "bar" << "bar", nil >>
;

produces the following reduce functions of production table entities for empty an syntax body, https://github.com/mewspring/poc/blob/24da0316b28052c4d153b386adbeaba92065079c/empty/parser/productionstable.go#L48

    ProdTabEntry{
        String: `Foo : empty    << "no foo", nil >>`,
        Id:         "Foo",
        NTType:     2,
        Index:      2,
        NumSymbols: 0,
        ReduceFunc: func(X []Attrib) (Attrib, error) {
            return nil, nil
        },
    },

Example run with the debug_parser flag enabled.

u@x1 ~/D/g/s/g/m/p/empty> cat b.txt
bar
u@x1 ~/D/g/s/g/m/p/empty> empty b.txt
S0 bar(4,bar) reduce:2(Foo : empty  << "no foo", nil >>)
S2 bar(4,bar) shift:5
S5 $(1,) reduce:5(Bar : "bar"   << "bar", nil >>)
S4 $(1,) reduce:1(Start : Foo Bar   <<  >>)
S1 $(1,) accept(0)
no foo