BNFC / bnfc

BNF Converter
http://bnfc.digitalgrammars.com/

Optional Semicolons #447

Open ScottFreeCode opened 1 year ago

ScottFreeCode commented 1 year ago

Is it possible to create a syntax in LBNF in which, given that a statement can have the parts <one> and <two>, all of the following are legal?

<one>
<two>;
<one> <two>; <one> <two>;
<one> <two>
<one> <two>
<one>
<two>
<one>
<two>

Unlike layout, I do not have keywords at which open/close brackets can be inserted, and I do not necessarily want tab/alignment to be required for correct interpretation.

I simply want the parser to say, when it encounters a new line, something like:

  1. Is this a complete statement?
  2. And is the following non-empty line a complete statement?
  3. If yes, treat them as two statements.
  4. If no, combine them into one as you normally would if the newline were whitespace, and evaluate from there.

(Similar to, say, JavaScript's "semicolon insertion" rule. Semicolons are generally required only where the beginning of a statement could also be interpreted as the continuation of the previous statement, or to write statements on one line. But Haskell's or Python's whitespace-based layout is not used. Just a rule that a line break can end a complete statement. Maybe similar to shell script?)
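The rule above can be sketched as a pre-pass that runs before a parser with mandatory semicolons. This is only an illustration under stated assumptions, not something BNFC provides: `insert_semicolons`, `toy_complete` and `toy_start` are hypothetical names, tokens are split on whitespace, and step 2 is read as "the next non-empty line could begin a statement" rather than "is a complete statement", so that a statement spread over several lines still works.

```python
from typing import Callable, List, Optional


def insert_semicolons(
    source: str,
    is_complete: Callable[[List[str]], bool],
    can_start: Callable[[str], bool],
) -> str:
    """Return `source` as one line, with ';' added at statement-ending newlines."""
    # Tokenize each line on whitespace, treating ';' as its own token.
    lines = [ln.replace(";", " ; ").split() for ln in source.splitlines()]
    lines = [toks for toks in lines if toks]  # blank lines carry no meaning here

    out: List[str] = []
    pending: List[str] = []  # tokens seen since the last ';'
    for i, toks in enumerate(lines):
        for tok in toks:
            out.append(tok)
            if tok == ";":
                pending = []
            else:
                pending.append(tok)
        nxt: Optional[List[str]] = lines[i + 1] if i + 1 < len(lines) else None
        # At a newline: end the statement only if it is complete and the next
        # non-empty line could begin a new one (or the input ends here).
        if pending and is_complete(pending) and (nxt is None or can_start(nxt[0])):
            out.append(";")
            pending = []
    # Otherwise the newline is treated as ordinary whitespace.
    return " ".join(out)


# Hypothetical predicates for the toy "hello world" statement.
def toy_complete(tokens: List[str]) -> bool:
    return tokens == ["hello", "world"]


def toy_start(token: str) -> bool:
    return token == "hello"
```

With these toy predicates, input with "hello" and "world" on alternating lines comes out with a semicolon after each "hello world", while "hello world hello world" on a single line is left without any semicolon, so a grammar that requires semicolons would reject it as intended.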

I tried this grammar:

entrypoints [Statement];

--Statements . Statements ::= [Statement];

Statement . Statement ::= "hello" "world" OptionalSemicolon;

Semicolon . OptionalSemicolon ::= ";";

Blank . OptionalSemicolon ::= "\n";

terminator Statement "";

Which successfully makes semicolons optional! BUT it doesn't require a newline in the absence of a semicolon. It acts as though I had written "" instead of "\n". So, for example, this parses, although it should not (it should require a semicolon or a newline):

hello world hello world

And this:

hello
world
hello
world

…which should parse and then print back as this:

hello world
hello world

…does parse but instead prints back as this:

hello world hello world

(If I were to, say, add a semicolon at the end of either of these examples, it would still parse and either would print hello world hello world ; – The semicolon seems to work fine when it is present.)

I tried this modification (replacing the grammar line beginning with Blank):

Blank . OptionalSemicolon ::= Newline;

token Newline '\n';

But the effect is that the semicolon is required.
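This outcome would be consistent with a lexer that discards all whitespace, newlines included, before the parser sees any tokens. That is an assumption about the generated lexer, not something confirmed in this thread. The toy lexer below is not BNFC's; `lex` and `keep_newlines` are hypothetical names used only to show why a newline token can never be matched once whitespace is skipped.

```python
import re
from typing import List

# Whitespace runs, semicolons, or words.
TOKEN = re.compile(r"\s+|;|[A-Za-z]+")


def lex(source: str, keep_newlines: bool = False) -> List[str]:
    """Split `source` into tokens, skipping whitespace.

    With keep_newlines=True, a whitespace run that contains a line break
    is reported as a single "\n" token instead of being dropped.
    """
    tokens: List[str] = []
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        text = match.group()
        pos = match.end()
        if text.isspace():
            if keep_newlines and "\n" in text:
                tokens.append("\n")
            continue  # whitespace never reaches the parser
        tokens.append(text)
    return tokens
```

With the default setting, `lex("hello\nworld")` and `lex("hello world")` produce the same token list, so a parser rule that waits for a newline token can never fire and only the semicolon alternative remains.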

ScottFreeCode commented 1 year ago

This is clearly not a dealbreaker, I could go with mandatory semicolons or figure out a way to use layout even if that seems opinionated.

But – the fact that the "\n" seems to be getting translated into the same thing as "" (i.e. some whitespace separation is required, but it can be any space and prints as a single space), rather than either:

…seems like a bug. Having the character I explicitly specified be accepted but treated as other characters is definitely unexpected (or at least unintuitive) behavior.

andreasabel commented 1 year ago

There is work in progress by @beataburreau on a new implementation of BNFC where one has a newline special token.

praduca commented 11 months ago

I think there is a bug somewhere in how semicolons are handled... I'm trying to write a tinybasic grammar, but when I use a semicolon as a separator (as in "PRINT A$;B$"), it parses fine but the pretty-printer puts every part on a different line. Changing the separator to a comma works fine...

andreasabel commented 11 months ago

@praduca: There is some hard-wiring in the render function of the generated printer that is biased towards "braces and semicolon" style languages. If you want some other rendering, you have to patch the generated printer.
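To illustrate the kind of hard-wiring meant here, the following is a toy renderer, not the code BNFC generates. The function `render` and its `break_after` parameter are hypothetical; the sketch only assumes that the real render function starts a new line after every semicolon, which matches the behavior reported above.

```python
from typing import FrozenSet, List


def render(tokens: List[str], break_after: FrozenSet[str] = frozenset({";"})) -> str:
    """Join tokens into text, starting a new line after any token in `break_after`."""
    lines: List[str] = []
    current = ""
    for tok in tokens:
        if tok in (";", ","):
            current += tok  # punctuation attaches to the preceding token
        else:
            current += (" " if current else "") + tok
        if tok in break_after:
            lines.append(current)
            current = ""
    if current:
        lines.append(current)
    return "\n".join(lines)
```

With the default, `render(["PRINT", "A$", ";", "B$"])` places `B$` on its own line, while the comma version stays on one line. Passing `break_after=frozenset()` stands in for patching the printer so that semicolons no longer force a line break.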

praduca commented 11 months ago

Ah, good to know it is something simple. Thanks for commenting so quickly.