Closed drexlerd closed 9 months ago
It seems reasonable to split the syntax parser in three stages:
Stage 1: Parse basic structure consisting of opening and closing parenthesis, and strings. Stage 2: Parse keyword structure Stage 3: Parse final structure
The benefit of this approach is that while the current single stage syntax parser might reject a input file now, it might still pass some of the stages above where we can collect some structural information that we can use to report better errors.
This is too complicated and semantic actions seem to be the preferred way.
My feeling also. Keeping it simple is often the best choice :)
This got way to complicated for very little gain. I had to define a more general word
that captures name
, variable
, and possible missspellings of names and variables, e.g., starting with numbers that I would then test with a semantic action. I had to be very careful to not consume -
which is used to define types. The resulting grammar rule was complicated, and semantic actions are also supposed to be avoided since they make the grammar additionally more complex. I think trying further refining the grammar to catch such errors is a dead-end and should not be further be looked into.
not completed
The problem is the following. A
ast::Name
is a word over alpha numerical characters starting with a alphabetical character and aast::Variable
isast::Name
with prefix?
. Clearly, both are syntactically different. The syntactic parser will just fail at a position where it expects aast::Name
but aast::Variable
was given. Currently the error might be some thing like:Expected ')'
, which I find is a bit unsatisfying. If we would instead parse a nodeast::NameOrVariable
represented as an alternative and get a successful parse, we could decide to throw a more meaningful error in a later parsing step (syntactic or semantic parsing).I think this should be the responsibility of the syntactic parsing: Ideally, we would like a boost syntax parser with only expectation operations (
>
) because they allow us to report a meaningful errors. When an expectation operation fails, an exception is thrown with a meaningful error message and position in the input stream depending on the parser that failed, e.g., "Expected a variable or a name". This requires splitting the syntax parser into an iterative process. Important question: can we parse only the most basic structure first, e.g., for each opening parenthesis there must follow zero or more characters, must follow closing parenthesis, i.e.,node_def = lit('(') > *text_or_node() > lit(')')
,text_or_node_def = (*char_ | node)
. I think the answer is yes. Furthermore, can we then refine the AST that we get to obtain a more concrete AST with same idea of using expectation operations only? Probably yes, but not so sure how easy that is because when the nodes are more abstract, they may contain many alternatives (|
) and therefore, a resulting blowup in the number of AST nodes and parsers.This will require a lot of rework in the syntactic parser but should be fully separate from the semantic parsing that we currently got since the resulting AST will be the exact same as the AST that we currently have. However, it is important to ensure that this will not cause changes on other sides of the code since this is highly relevant for the goal of the project.