Iterative syntax parsing to produce more meaningful errors?

drexlerd commented 9 months ago

The problem is the following. A ast::Name is a word over alpha numerical characters starting with a alphabetical character and a ast::Variable is ast::Name with prefix ?. Clearly, both are syntactically different. The syntactic parser will just fail at a position where it expects a ast::Name but a ast::Variable was given. Currently the error might be some thing like: Expected ')', which I find is a bit unsatisfying. If we would instead parse a node ast::NameOrVariable represented as an alternative and get a successful parse, we could decide to throw a more meaningful error in a later parsing step (syntactic or semantic parsing).

I think this should be the responsibility of the syntactic parsing: Ideally, we would like a boost syntax parser with only expectation operations (>) because they allow us to report a meaningful errors. When an expectation operation fails, an exception is thrown with a meaningful error message and position in the input stream depending on the parser that failed, e.g., "Expected a variable or a name". This requires splitting the syntax parser into an iterative process. Important question: can we parse only the most basic structure first, e.g., for each opening parenthesis there must follow zero or more characters, must follow closing parenthesis, i.e., node_def = lit('(') > *text_or_node() > lit(')'), text_or_node_def = (*char_ | node). I think the answer is yes. Furthermore, can we then refine the AST that we get to obtain a more concrete AST with same idea of using expectation operations only? Probably yes, but not so sure how easy that is because when the nodes are more abstract, they may contain many alternatives (|) and therefore, a resulting blowup in the number of AST nodes and parsers.

This will require a lot of rework in the syntactic parser but should be fully separate from the semantic parsing that we currently got since the resulting AST will be the exact same as the AST that we currently have. However, it is important to ensure that this will not cause changes on other sides of the code since this is highly relevant for the goal of the project.

drexlerd commented 9 months ago

It seems reasonable to split the syntax parser in three stages:

Stage 1: Parse basic structure consisting of opening and closing parenthesis, and strings. Stage 2: Parse keyword structure Stage 3: Parse final structure

The benefit of this approach is that while the current single stage syntax parser might reject a input file now, it might still pass some of the stages above where we can collect some structural information that we can use to report better errors.

drexlerd commented 9 months ago

This is too complicated and semantic actions seem to be the preferred way.

jendrikseipp commented 9 months ago

My feeling also. Keeping it simple is often the best choice :)

drexlerd commented 9 months ago

This got way to complicated for very little gain. I had to define a more general word that captures name, variable, and possible missspellings of names and variables, e.g., starting with numbers that I would then test with a semantic action. I had to be very careful to not consume - which is used to define types. The resulting grammar rule was complicated, and semantic actions are also supposed to be avoided since they make the grammar additionally more complex. I think trying further refining the grammar to catch such errors is a dead-end and should not be further be looked into.

drexlerd commented 9 months ago

not completed

drexlerd / Loki

Iterative syntax parsing to produce more meaningful errors? #8