Closed Xanewok closed 3 months ago
While implementing #1003 I started thinking whether using v2 types only makes sense for our existing parser generator.
The goal of #1003 and this issue in general was to reduce duplication between the very similar types and get rid of the bridging types whenever possible.
While it was enabled by the mechanical and lexer-oriented design of model::Scanner
, I'm worried that it might not be possible to do the same with ParserDefinitionNode
because of the higher level of abstraction of the v2 types for structs, choices and precedence operators.
The are a couple of points here:
Having the above in mind, while there are some unnecessary translation hacks still that I'd like to smooth out, I would probably reconsider dropping the v1 interface entirely.
cc @AntonyBlakey @OmarTawfik for visibility
@OmarTawfik from what I understand:
If not, then, I'd propose that we treat it as general improvement to the trivia handling rather than being a core component of the DSL v2 (we did already migrate the parser to the DSL v2 and we're happy with the current behavior) and we should probably track it separately if we're not already happy with what we have, as per above.
The listed scope for this issue almost feels like a quarterly KR and I'd love if we could finally close it, once #1018 is merged (and we revert the string rule) :sweat_smile:
we don't need the end of input token (first item)
sounds good.
is somewhat subsumed by us trying to always parse the trivia after the end of the production.
IIRC, the hack we added is to "reuse" leading trivia as EOF trivia, and the proper fix was to construct a specific EOF parser in codegen, with its definition being Choice([all trivia items in the grammar])
and run it after each successful parse. This allows LeadingTrivia
definition to remain true/accurate to DSL v2, and remove the TODO(#638)
comments in the definition.
Is that still the case? If so, it is definitely not blocking, and I would not prioritize it over other important tasks we have, but I think we should still do it. If you have a full plate, please feel free to put it in a separate issue and assign it to me, and we can close this one.
I remember that we did discuss what to do with trailing trivia and that it's not so obvious, as for example
// ...snip
}
/* some multi
line comment */
would mean that this conceptually would belong to the }
token and this prompts the question of how to format it and adds inconsistency as our trailing trivia in general cannot span multiple lines.
The way we have this setup is that the trailing trivia has to be a conceptually a subset of the leading trivia and thus the leading trivia is the trivia parser, which means if we want to accept all of the trivia items at EOF, then the leading trivia definition is correct for that use case and not a stretch, from what I understand (but, to be clear, we do have a "hack" of implicitly parsing the leading trivia rather than it having a dedicated definition in the definition.rs).
If we want to use the DSLv2's definition of (as added by this PR)
ZeroOrMore(Choice([
Trivia(Whitespace),
Trivia(SingleLineComment),
Trivia(MultilineComment)
])),
Choice([Trivia(EndOfLine), EndOfInput])
then for it to be correct, we'd have to change it to something like:
trailing_trivia = Choice([
// A single line comment
Sequence([
Optional(Trivia(Whitespace)),
Optional(Choice([
Trivia(SingleLineComment),
Trivia(MultilineCommentThatDoesNotContainAnEndOfLine)
])),
Trivia(EndOfLine),
])
// De facto end-of-file trivia
// (separated in order to disallow multi-line trivias as trailing in the general case)
Sequence([
ZeroOrMore(Choice([
/* trivias listed in leading_trivia */
])),
Trivia(EndOfFile),
])
]),
At which point we're de facto reintroducing end_of_file_trivia, now that I think about it. Given that it's different enough from the regular trailing trivia, should we just bite the bullet and be more explicit about it to not confuse the user and be clear about expectations?
Introducing a dedicated notion of end_of_file trivia would also mean we don't need to introduce this virtual EndOfFile
trivia/token, which we all agreed at some point to not have magical/synthesized token in the definition that can't be directly parsed.
@AntonyBlakey @OmarTawfik I don't think we did write down our different conclusions and rationale while we were discussing this on multiple occasions. If you're fine with it, I'd propose that we separate an issue/thread dedicated to it as I feel we had already too many items bundled in this issue and the decision is not clear-cut at this moment.
It seems that "Revert https://github.com/NomicFoundation/slang/pull/646" was already done in https://github.com/NomicFoundation/slang/pull/800.
I've opened #1020 to track this.
If you're fine with it, I'd propose that we separate an issue/thread dedicated to it as I feel we had already too many items bundled in this issue and the decision is not clear-cut at this moment.
Sounds good. Let's follow up on #1020. Thanks!
At the moment we use our Rust macros 1.0 (aka DSL v1) to define the language for our parser. We already have:
So to cut down the different definitions, we should migrate to the newest definition (see https://github.com/NomicFoundation/slang/pull/617).
Migration PR:
Impacts the AST:
PrecedenceExpression#rule_name
and adapt the codegen model to fit the v2 definition betterPrecedence
items should also produce their ownNonTerminalKind
kind, wrapping the single child (PrecedenceExpression
orPrimaryExpression
).PrecedenceExpression
should get their ownparse_XXX()
function.Does not impact the AST:
ZeroOrMore(AnyTriviaItem)
at the end when parsing any production withpub fn parse
Scanner::not_followed_by
completely and automatically calculate it instead (context).codegen_grammar
crate. (https://github.com/NomicFoundation/slang/pull/972)