EQL: Incorporation of Stage 0 transformation into fluent API implementation

homedirectory commented 6 months ago

Description

Compilation of EQL expressions involves several phases of analysis and transformation. The focus of this issue is on the first 2 phases: tokenisation and Stage 0 transformation.

The result of method calls on EQL's fluent API is represented as a sequence of tokens corresponding to the called methods, which may also be seen as a sequence of transitions between states of an automaton manifested in the form of the fluent API. This phase corresponds to the operation of a lexer -- lexical analysis.
The tokens are then subject to an analysis (namely, Stage 0 transformation) that groups them according to the rules of EQL's grammar, thereby building an AST for the EQL expression. This phase corresponds to syntactic analysis, to some extent, since it makes use of semantic analysis as well.

The following drawbacks have been identified in the course of work with this approach:

The tokenisation phase is not strictly necessary, since method calls on the fluent API (state transitions) perform the role of a lexer implicitly: the called method directly corresponds to the token. In other words, we get tokens for free.
As a consequence of the first issue, existence of a separate phase dedicated to syntactic analysis (Phase 2) leads to duplication of the EQL's grammar rules which are already encoded in the automaton that is the fluent API. This duplication occcurs since Phase 2 needs to know how to build an AST out of incoming tokens.
Separate existence of two phases, coupled with duplication of grammar rules, requires strict synchronisation between them which has its own disadvantages. One example is the current defect involving caseWhen with multiple conditions: it is accepted by the fluent API but rejected during Stage 0 transformation (Phase 2).

It is proposed to incorporate Phase 2 into Phase 1, effectively moving syntactic analysis into the implementation of the fluent API. The result of method calls on the fluent API would then be an AST for the EQL expression. Such an implementation would effectively act as an LR parser (with superpowers, due to the ability of semantic analysis).

Expected outcome

Simplified implementation of Stage 0 transformation due to elimination of grammar rules duplication and the need for synchronisation.
Elimination of potential errors that could arise because of lack of synchronisation between phases.

homedirectory commented 6 months ago

After further discussion and an experimental spike it was concluded that the duplication of grammar rules is inherent to any parsing process. At the moment the only general approach seems to be the usage of a generator for the fluent API and the parser so that grammar rules have to be defined only once.

It was decided that it is better to focus on the construction of a formal grammar for EQL and improvements to the existing parser, until the possibility of fluent API and parser generation is better understood.

01es commented 6 months ago

@homedirectory, can the problems in the existing parser implementation that require improvements be outline?

homedirectory commented 6 months ago

@01es, yes, and they should be captured in a separate issue. Among them:

The defect with caseWhen used in conjunction with a combination of conditions.
Insufficient separation of representations of token categories (the abstract syntax tree) and encountered values (the concrete syntax tree), which complicates informal reasoning.

fieldenms / tg

EQL: Incorporation of Stage 0 transformation into fluent API implementation #2173

Description

Expected outcome