mingodad opened this issue 1 year ago
While trying to create an LL(1) parser with https://mingodad.github.io/CocoR-Typescript (and family), I found that this definition is problematic:
sized_type_literal [iuf][1-9][0-9]*
It's ambiguous with this:
identifier [A-Za-z_][A-Za-z0-9_]*
So when we see something like the input below, both rules match, and without context-sensitive parsing/lexing, how can we know whether these are identifiers or sized_type_literals?
i3 || u4 || f5
I would prefer to have the type letter as a suffix, like:
sized_type_literal [1-9][0-9]*[iuf]
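To make the ambiguity concrete, here is a small Python sketch (the helper names matching_rules and lex_one are hypothetical). It assumes a Flex-style scanner: the longest match wins and, on a tie, the rule listed first wins. With the current prefix form, each of i3, u4 and f5 matches both patterns with the same length, so only rule order decides the token kind; with the proposed suffix form, the two patterns can never match the same text, because an identifier cannot start with a digit.

```python
import re

IDENTIFIER = r"[A-Za-z_][A-Za-z0-9_]*"
PREFIX_SIZED = r"[iuf][1-9][0-9]*"  # current rule: type letter first
SUFFIX_SIZED = r"[1-9][0-9]*[iuf]"  # proposed rule: type letter last


def matching_rules(rules, text):
    """Names of all rules whose pattern matches the whole of `text`."""
    return [name for name, pattern in rules if re.fullmatch(pattern, text)]


def lex_one(rules, text, pos=0):
    """Scan one token at `pos`: the longest match wins, and on a tie the
    rule listed first wins (the usual Flex convention)."""
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = re.compile(pattern).match(text, pos)
        if m and m.end() - pos > best_len:  # strict '>' keeps the earlier rule on ties
            best_name, best_len = name, m.end() - pos
    return best_name, text[pos:pos + best_len]


current = [("sized_type_literal", PREFIX_SIZED), ("identifier", IDENTIFIER)]
proposed = [("sized_type_literal", SUFFIX_SIZED), ("identifier", IDENTIFIER)]

for word in ["i3", "u4", "f5"]:
    print(word, matching_rules(current, word))  # both rules match every word

print(lex_one(current, "i3"))                  # ('sized_type_literal', 'i3'): rule order decides
print(lex_one(list(reversed(current)), "i3"))  # ('identifier', 'i3'): swap the order, swap the answer
print(lex_one(current, "i32x"))                # ('identifier', 'i32x'): longest match wins

# With the suffix form the two patterns can never match the same text,
# because an identifier cannot start with a digit.
print(matching_rules(proposed, "3i"), matching_rules(proposed, "i3"))
```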
Also, there isn't any definition for floating-point numbers.
The Bison grammar probably isn't going to be very useful as a source of truth, because some of our lexing and parsing rules are pretty awkward to express in Bison, especially around the partial precedence order and the way we disambiguate *. For the same reason, I'm not sure how useful railroad diagrams will be as documentation (but I don't know much about them).
As for the issue with the grammar not being LL(1) because of sized_type_literal, I think you should open a separate issue for that, because it's an issue with the language itself, unrelated to railroad diagrams.
@geoffromer Is the aim for the grammar to be LL(1)? I can see references to "finite bounded lookahead" and "context free", i.e. wanting some flavour of LL(k) / LR(k) (and associated families) parsing, but no explicit requirement of LL(1). It would be useful to spell that out explicitly if it is a design goal or restriction.
Moving a reply from Discord here for the record, from @chandlerc:
"You can't really parse infix binary operators with an LL grammar, it needs to be LR. We'd like to be LALR(1), and would be very reluctant to give up on LALR(k)."
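One standard reason behind this can be sketched in Python (a toy illustration, not a reconstruction of the full argument above): the natural left-recursive rule expr → expr '+' term | term gives both alternatives the same FIRST set, so a parser with one token of lookahead cannot choose between them. The toy grammars and the helpers first_of_sequence, first_sets and first_first_conflicts are hypothetical. The usual LL workaround is to rewrite the left recursion away, which removes the conflict but produces right-leaning parse trees that no longer directly mirror left associativity.

```python
EPS = "ε"  # marks "can derive the empty string"


def first_of_sequence(symbols, grammar, first):
    """FIRST set of a sequence of symbols, using the current FIRST table."""
    result = set()
    for sym in symbols:
        if sym not in grammar:  # terminal: it is the first token
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:  # this nonterminal cannot vanish
            return result
    result.add(EPS)  # every symbol was nullable (or the sequence is empty)
    return result


def first_sets(grammar):
    """FIRST set of every nonterminal, by iterating until nothing changes.

    `grammar` maps each nonterminal to a list of alternatives; an alternative
    is a list of symbols, and any symbol that is not a key is a terminal.
    """
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_sequence(alt, grammar, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def first_first_conflicts(grammar):
    """Pairs of alternatives of the same nonterminal whose FIRST sets overlap.

    (A full LL(1) check would also compare nullable alternatives against
    FOLLOW sets; this sketch only looks for FIRST/FIRST conflicts.)
    """
    first = first_sets(grammar)
    conflicts = []
    for nt, alternatives in grammar.items():
        sets = [first_of_sequence(alt, grammar, first) for alt in alternatives]
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if sets[i] & sets[j]:
                    conflicts.append((nt, i, j, sets[i] & sets[j]))
    return conflicts


# expr -> expr '+' term | term      (natural, left-associative form)
left_recursive = {
    "expr": [["expr", "+", "term"], ["term"]],
    "term": [["NUM"], ["(", "expr", ")"]],
}

# expr -> term expr_tail ; expr_tail -> '+' term expr_tail | ε
rewritten = {
    "expr": [["term", "expr_tail"]],
    "expr_tail": [["+", "term", "expr_tail"], []],
    "term": [["NUM"], ["(", "expr", ")"]],
}

print(first_first_conflicts(left_recursive))  # one conflict on 'expr': NUM and '(' start both alternatives
print(first_first_conflicts(rewritten))       # []
```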
It was an experiment, because I was thinking that Carbon would try to cut out as much ambiguity as possible and maybe even be parseable by an LL(1) parser, but it seems that's not the case.
Anyway, thank you for all your help and feedback!
Can we still generate railroad (RR) diagrams for the LR(1) grammar? Is this issue still useful?
I think so: they give a clean global view of the grammar without the code.
FWIW, I think they are useful, and since the aim is for the language to have a formal specification, generating the diagrams from it could simply become part of the CI process.
There's also an argument based on inclusiveness, which chimes with Carbon's goals: some people find reasoning about visual representations much easier than reasoning about more abstract textual ones.
And don't forget that the railroad diagrams generated by https://www.bottlecaps.de/rr/ui are navigable and cross-referenced.
I have updated the EBNF (for viewing at https://www.bottlecaps.de/rr/ui) to match the grammar as of 2022-10-09.
I came across this thread while wondering where and how the Carbon grammar is defined. Wouldn't it be a good idea to use the ANTLR lexer/parser generator? As far as I can tell, it is the best one out there.
The Explorer is currently using Flex and Bison (see lexer.lpp and parser.ypp), I think mostly because they're widely available. I don't think anybody's looked much into the costs and benefits of switching to ANTLR.
The production compiler is using a hand-rolled lexer and parser. I believe that's in order to maximize efficiency and diagnostic quality, but I wasn't closely involved in those discussions, so I may not have that right.
@geoffromer Thanks, I was just curious. I assume there were important reasons why classic Flex/Bison was chosen over the much more modern and actively maintained ANTLR, but I don't see in what way Flex/Bison is more available. I agree that at this early stage of the project, changing the lexer/parser may be pointless and relatively expensive. I would just keep ANTLR on the list of alternatives in case Flex/Bison one day turns out to be too limited, and it is still not too late to switch. Here is a good discussion of this subject: Why you should not use (f)lex, yacc and bison
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please comment or remove the inactive label. The long term label can also be added for issues which are expected to take time.
This issue is labeled inactive because the last activity was over 90 days ago.
I'm also working on a way to try LALR(1)/lex grammars online with WebAssembly, based on https://github.com/BenHanson/gram_grep, and I've got the carbon-lang grammar working. View it at https://mingodad.github.io/parsertl-playground/playground/ and select "Carbon parser (need review of *)" from the examples; you can edit the Grammar or the Input source and press Parse to see a parse tree.
I hope it can be a nice tool for experimenting with LALR(1)/lex grammars with instant feedback!
Note: updated with the grammar as of 2022-10-09.
Using this fork of Bison (https://github.com/mingodad/lalr-parser-test) to generate an EBNF understood by https://www.bottlecaps.de/rr/ui, and manually adding the tokens from lexer.lpp, we can get a navigable railroad diagram (https://en.wikipedia.org/wiki/Syntax_diagram) to help develop, debug, and document the language. Copy and paste the EBNF shown below into https://www.bottlecaps.de/rr/ui on the Edit Grammar tab, then click the View Diagram tab to get a navigable railroad diagram.
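The manual step of adding the tokens from lexer.lpp could perhaps be partly automated. Below is a minimal Python sketch under two assumptions: the Flex name definitions are as simple as the two quoted at the top of this issue (only character classes, plain letters/digits/underscores, and the *, + and ? quantifiers), and the W3C-style EBNF read by https://www.bottlecaps.de/rr/ui gives character classes and those quantifiers the same meaning. flex_definition_to_ebnf is a hypothetical helper, not part of the Bison fork; real lexer.lpp rules use more regex features than this handles, so it raises ValueError on anything unsupported.

```python
import re

# One atom of a (very) simple Flex regex: a bracketed character class or a
# single letter/digit/underscore, optionally followed by *, + or ?.
ATOM = re.compile(r"(\[[^\]]+\]|[A-Za-z0-9_])([*+?]?)")


def flex_definition_to_ebnf(line):
    """Turn a Flex name definition such as 'identifier [A-Za-z_][A-Za-z0-9_]*'
    into an EBNF rule such as 'identifier ::= [A-Za-z_] [A-Za-z0-9_]*'."""
    name, regex = line.split(None, 1)
    regex = regex.strip()
    parts = []
    pos = 0
    while pos < len(regex):
        m = ATOM.match(regex, pos)
        if m is None:
            # Groups, alternation, escapes, '.', etc. are out of scope here.
            raise ValueError(f"unsupported regex syntax at {regex[pos:]!r}")
        atom, quantifier = m.groups()
        if atom.startswith("["):
            parts.append(atom + quantifier)  # character classes carry over as-is
        else:
            parts.append(f"'{atom}'{quantifier}")  # single characters become quoted strings
        pos = m.end()
    return f"{name} ::= {' '.join(parts)}"


for definition in [
    "sized_type_literal [iuf][1-9][0-9]*",
    "identifier [A-Za-z_][A-Za-z0-9_]*",
]:
    print(flex_definition_to_ebnf(definition))
```

The generated rules would then be pasted into the Edit Grammar tab alongside the EBNF produced from the Bison grammar.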