JacquelineCasey / Nom


Semicolons and blocks #19

Open JacquelineCasey opened 1 year ago

JacquelineCasey commented 1 year ago

Right now, blocks require semicolons in all cases. Update the grammar so that a block can exist without a semicolon. This should generate the same expression statement as the version with a semicolon.

Note - due to the "no ambiguity" rule in the parser, this means disallowing empty expressions (i.e. extra semicolons). However, extra semicolons are weird, so it really doesn't matter.

JacquelineCasey commented 1 year ago

Maybe this should be postponed. This is harder than it looks. I've poked around the Rust documentation and it looks like they admit to using an ambiguous grammar here. Of course, they have some rule that prevents it from being truly ambiguous, but the grammar itself does have ambiguity.

JacquelineCasey commented 1 year ago

See also: https://doc.rust-lang.org/reference/statements.html?highlight=ambigu#expression-statements

JacquelineCasey commented 1 year ago

The problem is this bit of grammar:

BlockExpression
    : _LeftCurlyBrace (Statement)* (Expression)? _RightCurlyBrace
    ;

If we update the Statement rule to include a "Blocky expression without semicolon" option, then we are fine up until the situation where the final expression of the block is a blocky expression. Then we can't tell whether it is a statement or the block's final expression.
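To make the ambiguity concrete, here is a rough sketch in Python (not Nom or Parsley code). It assumes a simplified model where a block body is a list of items, each flagged as blocky or not and as followed by a semicolon or not. The names `Item` and `parses` are made up for illustration. It lists every way the extended grammar could split the body into statements plus an optional final expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One expression inside a block body (hypothetical model, not Nom's AST)."""
    text: str
    blocky: bool      # ends in '}' (if / match / nested block)
    semicolon: bool   # followed by ';' in the source


def is_statement(item):
    # Extended Statement rule: "Expr ;" or a blocky expression with no semicolon.
    return item.semicolon or item.blocky


def parses(items):
    """Return every (statements, final_expression) split the grammar allows."""
    results = []
    # Reading A: every item is a statement and the block has no final expression.
    if all(is_statement(i) for i in items):
        results.append((tuple(i.text for i in items), None))
    # Reading B: the last item (without a semicolon) is the final expression.
    if items and not items[-1].semicolon:
        head = items[:-1]
        if all(is_statement(i) for i in head):
            results.append((tuple(i.text for i in head), items[-1].text))
    return results


body = [
    Item("x = 1", blocky=False, semicolon=True),
    Item("if a { 1 } else { 2 }", blocky=True, semicolon=False),
]
for statements, final in parses(body):
    print(statements, "|", final)
```

With a trailing blocky expression and no semicolon, both readings succeed, which is exactly the case an unambiguous grammar cannot express. Any other ending yields a single reading.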

The first step to fully resolving this is having Parsley handle ambiguous parses. This also requires doing eager merging, which is possible (I originally intended to do it, until I realized that I could get pretty far with an unambiguous grammar).

JacquelineCasey commented 1 year ago

Then, assuming Parsley prefers the final statement parse over the expression parse, we could update AST creation to move the final statement over to the expression slot if it happens to be a BlockyExpression. (If Parsley makes the other choice, we get this for free!)

So really, Parsley can make either choice, we just have to update it in order to actually make said choice.
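A minimal sketch of that fix-up step, in Python rather than the real AST code. The node types (`Expr`, `ExprStatement`, `Block`) and the `normalize` function are hypothetical stand-ins, and it assumes the parser records whether a statement had a semicolon.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Expr:
    text: str
    blocky: bool = False   # if / match / nested block


@dataclass
class ExprStatement:
    expr: Expr
    had_semicolon: bool


@dataclass
class Block:
    statements: List[ExprStatement]
    final_expr: Optional[Expr] = None


def normalize(block):
    """Move a trailing semicolon-less blocky statement into the expression slot."""
    if block.final_expr is None and block.statements:
        last = block.statements[-1]
        if isinstance(last, ExprStatement) and last.expr.blocky and not last.had_semicolon:
            block.statements.pop()
            block.final_expr = last.expr
    return block


block = Block([
    ExprStatement(Expr("x = 1"), had_semicolon=True),
    ExprStatement(Expr("if a { 1 } else { 2 }", blocky=True), had_semicolon=False),
])
normalize(block)
print(len(block.statements), block.final_expr.text)
```

A blocky statement that was written with an explicit semicolon is left alone, so the author can still discard the value on purpose.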

JacquelineCasey commented 1 year ago

Awaiting https://github.com/jackcasey067/Parsley/issues/3.

This is now a low priority issue - we can survive a few extra semicolons for the time being.

JacquelineCasey commented 1 year ago

Being an expressional language is kinda hard I guess...

Note that Zig solves this by requiring labelled breaks if you want the block to "return" a value.

JacquelineCasey commented 1 year ago

Alternate syntaxes could work too.

For instance, perhaps we require blocky expressions to be wrapped in parentheses when they are used for their value. Blocky expressions aren't actually used that way very often; they are usually used as statements. The cost is that ternaries end up being super nested:

(if a { ... } else { ... } + 100)

This might really hurt if you follow my pattern of assigning variables out of if / match blocks.

Alternatively, we could require every block end with a value, likely the unit type. This kinda sucks too, probably a bit more than the other options.

Zig's approach is to label all blocks you return out of. This hurts a little, but at least it's not every block or every ternary (most if else expressions used as ternaries can use bare expressions instead of blocks).

Despite these options (and more exotic ones that I am thinking up), I think it is probably best just to wait to update Parsley, and accept the possible runtime cost of grammar ambiguity.

JacquelineCasey commented 1 year ago

Note - the downside to Parsley accepting ambiguity - we parse whatever is inside the block twice! This only occurs wherever the block is used as a final expression.

Well, no, not really. This already happens! We start parsing one way, assuming we will eventually find a semicolon, and also parse another way, assuming we won't! Parsley is really not all that efficient, is it... Oh well, it seems to work decently so far...

JacquelineCasey commented 1 year ago

Swapped out the algorithm in Parsley. We might be a lot closer to this now.

JacquelineCasey commented 10 months ago

I am thinking a bit more about this. I've recently begun to pick up Go, which does this thing where it puts semicolons at the end of lines wherever a semicolon is possible (this is done before parsing, at the lexical level). I am tempted, though I'll note that I also get annoyed by this because it forces else onto the same line as the closing brace of the preceding if. I might do this, but I would strongly limit its scope: it should only occur after blocky expressions (i.e. closing braces), and it would look at the next token as well. It might also make sense if this only occurred at the end of lines (but maybe not, that may not be very helpful anyhow).
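A rough sketch of what that limited insertion pass might look like, written in Python over a plain list of token strings. Everything here is an assumption for illustration: the function name `insert_semicolons`, the token representation, and especially the `CONTINUATION` set of tokens that suppress insertion, which would need tuning against the real grammar.

```python
# Tokens that, when they follow '}', mean the blocky expression is still being
# used (or is already terminated), so no semicolon is inserted. Hypothetical set.
CONTINUATION = {"else", ";", "}", ")", "]", ",", ".", "+", "-", "*", "/", "==", "!="}


def insert_semicolons(tokens):
    """Insert ';' after a closing brace unless the next token continues the expression."""
    out = []
    for index, token in enumerate(tokens):
        out.append(token)
        if token != "}":
            continue
        following = tokens[index + 1] if index + 1 < len(tokens) else None
        # End of input and continuation tokens leave the brace untouched.
        if following is not None and following not in CONTINUATION:
            out.append(";")
    return out


source = "if a { f ( ) ; } else { g ( ) ; } h ( ) ;".split()
print(" ".join(insert_semicolons(source)))
```

Because `else` is in the lookahead set, an else on the next line would not be cut off from its if, which avoids the Go annoyance mentioned above. Because `}` is in the set too, a blocky expression in final position keeps its role as the block's value.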