More Grammar Updates [2]

AtomCrafty commented 2 years ago

The more I work with the grammar, the more issues I come across. Here's round two.

4. Rewrite of the switch statement grammar

Switch statements currently require at least one case or default, which is an unnecessary restriction. Also the rule responsible for parsing case and default sections is called LabeledStatement even though it is neither labeled nor a statement. I would like to rename it to SwitchSection, and have two separate subclasses CaseSection and DefaultSection. The items list will still be available on an abstract SwitchSection, but only CaseSection will have a condition field.

5. Split "Jump" statements

The grammar rule JumpStatement is currently responsible for parsing continue, break and return statements. Instead of differentiating them by the type field, each of these should become a separate class and only ReturnStatement should have a value field. An open question would be whether the JumpStatement should remain as an abstract super class or whether we should get rid of it altogether. In my opinion a "return" isn't more of a jump than an if statement would be, so it is a bit of a misnomer.

6. Dissolve `BlockItem`

The BlockItem abstraction only exists to allow both statements and declarations. But I would argue that the distinction between the two isn't useful to us at all. I therefore propose to simply decide that each Declaration is in fact a Statement. The only purpose of the distinction I am aware of is to disallow declarations as a direct descendant of container statements that don't introduce a new scope, for example the if statement:

if(true)
    int x = 5;

This however should be implemented as a semantic rule instead of a syntactic one. "Declaration is not allowed within an if statement" is a much more useful error message than "No viable alternative at input 'int'".

Some people might argue that a declaration isn't a statement, in that case we could introduce a new DeclarationStatement, which acts as a wrapper around a Declaration. I believe this is the way Clang handles it. This wouldn't be such a bad alternative anyway, because it would allow us to exclude the final semicolon from the Declaration and properly use it within the ForLoop rule. Currently the first semicolon of a ForLoop really belongs to the nested Declaration node, which is suboptimal.

As always, each rewrite will bring with it changes to various field names. The current grammar reads like we're trying to save every last byte, so words like "condition" or "statement" are unnecessarily shortened to "cond" and "stmt". This is just unpleasant to read, so I will rename them appropriately. Also once 6. is implemented, all of the items lists should be renamed to statements or something like that.

@eyck since I know that name changes will be a major hassle for you, here is a simple tip to avoid 90% of the manual work: Before running the mwe2 workflow, open the affected class in src-gen/com.monres.coredsl.coreDsl and refactor->rename the affected getter functions. For example when I rename IfStatement.cond to condition, you'd just need to let eclipse rename the getCond function to condition. Make sure to omit the get part of the name, because otherwise eclipse will be confused. I will make sure to include a full list of all name changes with each pull request.

jopperm commented 2 years ago

Sounds uncontroversial to me.
In favour of separate classes. I don't think we need an abstract super class like JumpStatement. ~~In any case, "jump" feels like a misnomer for a construct in a high-level language, and more suited at ASM-level.~~ Turns about, grouping goto, continue, break and return as jump statements goes back to the C spec.
I think I'd prefer a separate Declaration and the DeclarationStatement wrapper. Would that allow Declaration to actually be a superclass for all declarations, e.g. including formal parameters?

AtomCrafty commented 2 years ago

It is possible to force ParameterDeclaration to inherit from Declaration by explicitly specifying the return type of the relevant parser rule like this:

ParameterDeclaration returns Declaration: {ParameterDeclaration} type=TypeSpecifier declarators+=Declarator?;

I'm not sure whether this is actually useful though. It might make it slightly easier to resolve the type specifier starting from a given declarator node. But at the same time it means that ParameterDeclaration now has a list of declarators, even though that list always contains exactly one declarator.

Currently there are three possible locations for a Declarator node:

Declaration → InitDeclarator → Declarator
ParameterDeclaration → Declarator
StructDeclaration → Declarator (StructDeclaration should really be called StructMemberDeclaration)

If we made both ParameterDeclaration and StructDeclaration inherit from Declaration, then the type provider could use the parentOfType helper to get the declaration and easily access the type specifier from there.

Also I just realized another option would be to have Declaration be purely abstract and make the proposed DeclarationStatement inherit from it as well.

And yet another option would be to completely scrap all of the subclasses and only have one single Declaration class that is used in all of these places.

AtomCrafty commented 2 years ago

Looking at this again, do we really need the InitDeclarator to be a separate node? We could instead simply have the initializer field be present on all declarators, but only allow it to be populated within declaration statements. We can have two parser rules that both return the same type, but only one of them contains the initializer syntax.

If we get rid of InitDeclarator, the immediate parent of a declarator would always be its corresponding declaration, which would make the validation even simpler.

jopperm commented 2 years ago

If we get rid of InitDeclarator, the immediate parent of a declarator would always be its corresponding declaration, which would make the validation even simpler.

That SGTM!

Minres / CoreDSL