Permanent syntax tree representation in background

A consistent syntax tree representation of all element texts (or at least of the expressions they contain) ought to be provided. This would let us concentrate on a clean and consistent syntax analysis and should also improve performance. Parts of it are ready but need more elaboration, and it must be tested with regard to memory and time complexity. It would avoid the repeated tokenizations and concatenations, and the frantic search for a point where certain matchings and replacements can take place without spoiling what has been transformed before or will have to be transformed thereafter. If this eventually yields a central point in all generators where built-in functions are handled, it will be a big achievement and will allow the requested clear documentation. (At least for a while...) Some of the benefits would be:

Built-in functions and procedures as well as operators can unambiguously be identified, which would improve Executor performance and generator functionality (e.g. addressing demands like in #237).
As the syntax trees would be derived (or updated) only when an element is changed, or on first Analyser inspection, first execution, or first code generation (a lazy initialization approach), syntax analysis time is expected to drop dramatically (similar to the diagram drawing speed already gained by permanently caching the highlighting patterns).
Instead of the current guesswork labyrinth there would be a structurally consistent and transparent operation.
It might be way easier to add new generators.
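
The lazy initialization mentioned in the list above could look roughly like the following Java sketch. It is only an illustration under assumptions: `SyntaxTree`, `CachedElement`, and the injected parser function are hypothetical names and do not correspond to existing Structorizer classes. The tree is derived on first request, reused afterwards, and dropped as soon as the element text changes.

```java
import java.util.function.Function;

/** Hypothetical stand-in for a parsed element text; not a Structorizer class. */
final class SyntaxTree {
    final String source;
    SyntaxTree(String source) { this.source = source; }
}

/** Hypothetical element that derives its syntax tree lazily and caches it. */
final class CachedElement {
    private String text;
    private SyntaxTree cachedTree;   // null means "not derived yet" or "invalidated"
    private int parseCount = 0;      // only here to make the caching effect visible
    private final Function<String, SyntaxTree> parser;

    CachedElement(String text, Function<String, SyntaxTree> parser) {
        this.text = text;
        this.parser = parser;
    }

    /** Changing the text invalidates the cached tree; nothing is parsed yet. */
    void setText(String newText) {
        if (!newText.equals(text)) {
            text = newText;
            cachedTree = null;
        }
    }

    /** Called by Analyser, Executor, or a generator; parses only when needed. */
    SyntaxTree getSyntaxTree() {
        if (cachedTree == null) {
            cachedTree = parser.apply(text);
            parseCount++;
        }
        return cachedTree;
    }

    int getParseCount() { return parseCount; }
}

final class LazyTreeDemo {
    public static void main(String[] args) {
        CachedElement element = new CachedElement("x <- a + b", SyntaxTree::new);
        element.getSyntaxTree();          // first access: parses
        element.getSyntaxTree();          // second access: served from the cache
        element.setText("x <- a * b");    // edit: invalidates the cache
        element.getSyntaxTree();          // parses again
        System.out.println("parse runs: " + element.getParseCount());
    }
}
```

Note that this sketch only invalidates on changes to the element's own text; it does not address changes elsewhere in the diagram.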

There are of course several challenges, too:

Nassi-Shneiderman diagrams are by design meant to be syntax-free, so there will always be texts that cannot (or can only partially) be converted into syntax trees of some kind. (What to do here? Handle them as completely untranslatable text or try a partial analysis? Maintain the old complex and unstructured analysis approaches in parallel for such cases?)
Structorizer in particular supports several syntactical flavours, especially regarding explicit and implicit declarations. Shall we reflect them in the syntax trees or canonicalize them on this occasion? (The latter might cause irritating Analyser reports and Executor or generator error messages.)
How to deal with the influence of context on type inference and so on? (The declaration or initialization of a variable already used in some existing element text might be inserted later, which may e.g. change the meaning of some operator symbol.) How to identify such impacts and their scope?
Is it better to represent an entire text line (including commands, separating keywords, and declaration stuff) or just the expressions strewn into a line?

Originally posted by @codemanyak in https://github.com/fesch/Structorizer.Desktop/issues/462#issuecomment-343501610

This also relates to several internal issues.

Remark: Possibly it was not the best idea to hold parsed lines permanently on the elements, in particular since not all element text can be parsed, and even small modifications in other elements or diagrams can invalidate any syntax tree derived from a former diagram state. A very reasonable compromise seems to be to store the element text as lexically split token lists in which the whitespace is not mixed in among the tokens but managed separately. This makes a lot of to-and-fro conversions superfluous, preserves the original spacing without affecting token indices, and accelerates parsing a lot. On this occasion, the user-configurable "key phrases" (parser preferences), which may consist of several lexical tokens (like jusqu'à), can be represented by fixed internal tokens. This makes refactoring obsolete, since the internal key will always be the same; the user-specified keywords are shown only on display and editing. A parser preference modification will then only require a drawing refresh (as with controller routine aliases). The task of Executor, Analyser, and code generators will be facilitated a lot: they can make use of (ephemeral) syntax trees where convenient and may concentrate on the expressions embedded in the element text lines.
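
To make the token list idea more concrete, here is a minimal Java sketch, assuming a deliberately simplistic lexer (identifiers and numbers, the assignment symbol `<-`, and single characters otherwise). The class name `TokenLine` and its methods are hypothetical and not taken from the Structorizer code base. The point is only that whitespace is kept in a parallel list, so token indices stay free of padding entries while the original spacing can still be restored exactly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical token list that keeps whitespace apart from the tokens. */
final class TokenLine {
    private final List<String> tokens = new ArrayList<>();
    // gaps.get(i) is the whitespace before token i;
    // gaps.get(tokens.size()) is the trailing whitespace of the line.
    private final List<String> gaps = new ArrayList<>();

    TokenLine(String line) {
        int i = 0;
        final int n = line.length();
        while (true) {
            int start = i;
            while (i < n && Character.isWhitespace(line.charAt(i))) i++;
            gaps.add(line.substring(start, i));
            if (i >= n) break;
            start = i;
            if (isWordChar(line.charAt(i))) {
                while (i < n && isWordChar(line.charAt(i))) i++;
            } else if (line.startsWith("<-", i)) {
                i += 2;   // treat the assignment symbol as one token
            } else {
                i++;      // simplification: any other character is its own token
            }
            tokens.add(line.substring(start, i));
        }
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    /** The tokens without any whitespace entries. */
    List<String> tokens() { return Collections.unmodifiableList(tokens); }

    /** Restores the text exactly as it was entered, spacing included. */
    String toText() {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < tokens.size(); k++) {
            sb.append(gaps.get(k)).append(tokens.get(k));
        }
        return sb.append(gaps.get(tokens.size())).toString();
    }
}

final class TokenLineDemo {
    public static void main(String[] args) {
        TokenLine line = new TokenLine("result  <-  a+1 ");
        System.out.println(line.tokens());
        System.out.println("[" + line.toText() + "]");
    }
}
```

With such a structure, a consumer like Executor or a generator could address tokens by index without skipping over whitespace entries first, and the editor could still show the text with the user's own spacing.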

Permanent syntax tree representation in background #800