This is a proposal to convert the F# grammar rules in the F# spec into a formal format (specifically, into ANTLR4, which is an EBNF format).
The F# spec uses today what is termed in the spec itself (§1.2) "a mixture of informal and semiformal techniques". The grammar rules in the spec are close to an EBNF notation, but informal. Turning them into a formal EBNF format opens the possibility of grammar validation. This is not possible today and therefore all the existing small mistakes and inconsistencies in the spec are not easy to find.
Why the ANTLR4 dialect? Because then we have (next to the value of having a formal grammar) also a tool available that brings validation into reach. (My initial investigation makes me determined to explore this further.)
Note that such a conversion will NOT create a larger distance between the spec and the compiler implementation. The compiler (or rather fsyacc) uses a BNF notation (like the whole yacc family). Therefore, between the spec grammar (in its current form as well as in ANTRL4 notation) and the fsyacc grammar there is always a transformation that implements repetitions recursively. Lets take an example from page 56 of the spec:
exprs:= expr, ..., expr.
In ANLTR4 this would be
exprs: expr (',' expr)*.
In the the compiler (pars.fsy) it is (roughly)
exprs: expr | exprs "," expr.
Note also that C# is using ANTLR format for their spec grammar.
I feel that such a change is doable with reasonable effort, that it has no negative impact (on readability, distance to the fsyacc grammar etc), but that the preciseness and the possibility of validation are very valuable. IMO we should do this before further extending the spec. I may have overlooked things, so I am looking forward to feedback.
This is a proposal to convert the F# grammar rules in the F# spec into a formal format (specifically, into ANTLR4, which is an EBNF format).
The F# spec uses today what is termed in the spec itself (§1.2) "a mixture of informal and semiformal techniques". The grammar rules in the spec are close to an EBNF notation, but informal. Turning them into a formal EBNF format opens the possibility of grammar validation. This is not possible today and therefore all the existing small mistakes and inconsistencies in the spec are not easy to find.
Why the ANTLR4 dialect? Because then we have (next to the value of having a formal grammar) also a tool available that brings validation into reach. (My initial investigation makes me determined to explore this further.)
Note that such a conversion will NOT create a larger distance between the spec and the compiler implementation. The compiler (or rather fsyacc) uses a BNF notation (like the whole yacc family). Therefore, between the spec grammar (in its current form as well as in ANTRL4 notation) and the fsyacc grammar there is always a transformation that implements repetitions recursively. Lets take an example from page 56 of the spec:
exprs:= expr, ..., expr
. In ANLTR4 this would beexprs: expr (',' expr)*
. In the the compiler (pars.fsy) it is (roughly)exprs: expr | exprs "," expr
.Note also that C# is using ANTLR format for their spec grammar.
I feel that such a change is doable with reasonable effort, that it has no negative impact (on readability, distance to the fsyacc grammar etc), but that the preciseness and the possibility of validation are very valuable. IMO we should do this before further extending the spec. I may have overlooked things, so I am looking forward to feedback.