Global State Support for Semantic Analysis Framework

Background

The current design of the semantic analysis framework does not offer a unified approach to working with the external or global state of the Analyzer. For example, there is no dedicated API to enumerate or request specific documents, or to fetch compiler-wide configuration from inside the computable functions (see prior discussion in #12).

In principle, such state could be treated as an external environment to the computable functions. For example, the mapping between the file names and the Analyzer's documents is inherently an external state. The programmer can define a lazy static variable with the external/global state, place it near the Analyzer's instance, and query this static variable inside attribute computables. Whenever the external configuration changes, the programmer would trigger a corresponding user-defined event to enforce computable function invalidation.

This approach provides a reasonable workaround to this problem, but it has a number of drawbacks:

This approach requires a sealed architecture of the end compiler, which hurts modularity and testability: the programmer will not be able to instantiate more than one instance of the compiler.
Maintaining the system of user-defined events is error-prone.
Sometimes we need analyzer-wide attributes to perform common compiler computations that are not naturally bound to any compilation unit. For example, there could be an attribute that collects project-wide diagnostic messages across all documents. Such attributes cannot be instantiated inside the external state, because in the current design, the attribute object is an integral part of the syntax tree node semantics.
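
To make the workaround and its first drawback concrete, here is a minimal sketch in plain Rust using only the standard library. It does not use the Lady Deirdre API: FILES, lookup_document, register_file, and FILES_CHANGED_EVENT are hypothetical names, and plain u64 values stand in for the document Id and for the user-defined event.

```rust
use std::collections::HashMap;
use std::sync::{OnceLock, RwLock};

/// Hypothetical external state: file name -> document id.
type FileMap = HashMap<String, u64>;

/// Process-wide state placed "near" the analyzer instance.
/// Because it is a static, every analyzer in the process shares it.
static FILES: OnceLock<RwLock<FileMap>> = OnceLock::new();

fn files() -> &'static RwLock<FileMap> {
    FILES.get_or_init(|| RwLock::new(HashMap::new()))
}

/// Stand-in for an attribute's computable function querying the static.
fn lookup_document(file_name: &str) -> Option<u64> {
    files().read().unwrap().get(file_name).copied()
}

/// Hypothetical identifier of the user-defined event.
const FILES_CHANGED_EVENT: u64 = 1;

/// Stand-in for the mutation side: updates the static and returns the
/// event that the caller must remember to trigger by hand.
fn register_file(file_name: &str, doc_id: u64) -> u64 {
    files().write().unwrap().insert(file_name.to_string(), doc_id);
    FILES_CHANGED_EVENT
}
```

Nothing in this sketch ties the update in register_file to the invalidation of the attributes that called lookup_document. That link exists only by convention, which is the error-prone part.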

Proposal

To address the above issues, I propose introducing a new grammar-wide semantic configuration. This configuration will be specified through the #[semantics(MyLangSemantics)] macro attribute on the enum type that specifies the language grammar (similar to how we specify the node classifier: #[classifier(MyLangClassifier)]).

The "MyLangSemantics" object would be a user-defined semantic feature within which the programmer can define any compiler-wide attributes with computable functions and other semantic features. Essentially, within this semantic object, the programmer will organize the nested structure of the compiler-wide computations. The attributes of this "global" semantics will be an integral part of the semantic graph, available for normal querying within any other attribute (both node attributes and other "global" attributes) and outside of it.

Additionally, a new type of Feature will be introduced: the Slot object (name TBD). This object is similar to Attr, except it does not have a built-in computable function, and its value is intended to be part of the external state.

#[derive(Node)]
#[semantics(MyLangSemantics)] // When omitted, the global semantics is `VoidFeature`.
enum MyNode {
    // ...
}

#[derive(Feature)]
#[node(MyNode)]
struct MyLangSemantics {
    global_diagnostics: Attr<GlobalDiagnosticsAttr>,
    config_file: Slot<SomeConfigObject>,
    files_structure: Slot<HashMap<String, Id>>,
}

The value of the Slot can be read both inside and outside of attributes, just like the value of any attribute. When the value is read inside an attribute's computable function, the attribute subscribes to changes in the Slot's value. Additionally, the value can be written within a MutationTask/ExclusiveTask (the same tasks from which you manage the Analyzer's documents).

Whenever a mutation task changes the Slot's value, all directly dependent attributes will be invalidated automatically. Therefore, no additional synchronization steps between the "external" state and the semantic graph are needed.
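
The read/subscribe/invalidate cycle described above can be illustrated with a small self-contained model. This is a sketch under assumptions, not the proposed implementation: the Slot and Attr types below are toy stand-ins written with std only, and invalidation is modeled with a revision counter that the attribute checks on read, which may differ from how the real semantic graph propagates changes.

```rust
use std::cell::{Cell, RefCell};

/// Toy stand-in for the proposed Slot: a value plus a revision counter.
struct Slot<T> {
    value: RefCell<T>,
    revision: Cell<u64>,
}

impl<T> Slot<T> {
    fn new(value: T) -> Self {
        Self { value: RefCell::new(value), revision: Cell::new(0) }
    }

    /// Models a write from a mutation task. Bumping the revision is what
    /// makes every cached result based on the old value stale.
    fn write(&self, value: T) {
        *self.value.borrow_mut() = value;
        self.revision.set(self.revision.get() + 1);
    }
}

/// Toy stand-in for an attribute whose computable function reads one Slot.
struct Attr<V> {
    cache: RefCell<Option<(u64, V)>>, // (slot revision seen, computed value)
    computations: Cell<u32>,          // how many times `compute` actually ran
}

impl<V: Clone> Attr<V> {
    fn new() -> Self {
        Self { cache: RefCell::new(None), computations: Cell::new(0) }
    }

    /// Returns the cached value, recomputing only if the Slot has been
    /// written since the last computation.
    fn read<T>(&self, slot: &Slot<T>, compute: impl Fn(&T) -> V) -> V {
        let current = slot.revision.get();
        if let Some((seen, value)) = self.cache.borrow().as_ref() {
            if *seen == current {
                return value.clone();
            }
        }
        let input = slot.value.borrow();
        let value = compute(&*input);
        self.computations.set(self.computations.get() + 1);
        *self.cache.borrow_mut() = Some((current, value.clone()));
        value
    }
}

/// Usage: the attribute is recomputed only after the Slot is written.
fn example() -> (usize, usize, usize, u32) {
    let files = Slot::new(vec!["a.lang".to_string()]);
    let file_count: Attr<usize> = Attr::new();
    let count = |files: &Vec<String>| files.len();

    let first = file_count.read(&files, count);
    let second = file_count.read(&files, count); // served from the cache
    files.write(vec!["a.lang".to_string(), "b.lang".to_string()]);
    let third = file_count.read(&files, count); // recomputed after the write
    (first, second, third, file_count.computations.get())
}
```

In this model the caller never signals a change explicitly: writing the Slot is enough for the next read of the attribute to recompute.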

The type of the Slot's generic parameter must implement Eq, which the semantic graph validation procedure requires.
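
One plausible use of the Eq bound, shown here purely as an assumption, is to compare the old and new values so that writing an equal value does not count as a change. The toy Slot below is hypothetical and only illustrates that idea.

```rust
/// Toy Slot that reports a change only when the new value differs.
struct Slot<T: Eq> {
    value: T,
    revision: u64,
}

impl<T: Eq> Slot<T> {
    fn new(value: T) -> Self {
        Self { value, revision: 0 }
    }

    /// Returns true if dependents would need to be invalidated.
    /// Assumption: an equal value leaves the revision untouched.
    fn write(&mut self, value: T) -> bool {
        if self.value == value {
            return false;
        }
        self.value = value;
        self.revision += 1;
        true
    }
}
```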

By default, the Slot will be instantiated with an "uninitialized" value. If the compiler's code attempts to read an uninitialized Slot, the reading function returns an UninitSlot error. Therefore, the programmer must initialize Slots manually wherever applicable (preferably right after the Analyzer is instantiated).
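
The uninitialized-by-default behavior can be sketched as follows. The types are toy stand-ins built on Option. The UninitSlot name comes from the proposal, while Slot::uninit, read, write, and the Analyzer struct shown here are hypothetical and not the final API.

```rust
use std::fmt;

/// Error returned when a Slot is read before it has been initialized.
#[derive(Debug, PartialEq, Eq)]
struct UninitSlot;

impl fmt::Display for UninitSlot {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "slot value is not initialized")
    }
}

impl std::error::Error for UninitSlot {}

/// Toy Slot: `None` models the "uninitialized" state.
struct Slot<T> {
    value: Option<T>,
}

impl<T> Slot<T> {
    fn uninit() -> Self {
        Self { value: None }
    }

    fn read(&self) -> Result<&T, UninitSlot> {
        self.value.as_ref().ok_or(UninitSlot)
    }

    fn write(&mut self, value: T) {
        self.value = Some(value);
    }
}

/// Toy analyzer holding one slot of external state.
struct Analyzer {
    config_file: Slot<String>,
}

impl Analyzer {
    /// Slots start uninitialized. The caller is expected to write them
    /// right after construction.
    fn new() -> Self {
        Self { config_file: Slot::uninit() }
    }
}
```

The type checker accepts a read before the first write. The mistake only surfaces at run time as an UninitSlot error, which is the maintenance cost raised in the first open question.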

Open Questions

The proposed design still requires manual initialization of the Slot values, which implies additional maintenance effort that is not enforced by the Rust type checker. In my opinion, this should be manageable, but I would like to hear your opinions as well.

An alternative approach could be moving Slots into a dedicated interface, allowing the programmer to specify initial values of the external state through the Analyzer's constructor. However, this might introduce extra setup complexity.

Another alternative is to require Default on the Slot's generic parameter, so that the Slot value is initialized with the default value. This would eliminate uninitialized Slot errors, but it does not by itself remove the need to set up the Slot values. It also imposes an additional restriction on the Slot type.

Introducing the #[semantics(...)] macro attribute requires a breaking change to the Grammar trait (I would need to introduce a new associated type in the trait). This formally violates the Semver policy if we want this feature in the 2.* version of Lady Deirdre. However, I assume that no one is using this low-level trait manually yet, so I will make an exception. To mitigate this, in the next minor release I will add a comment to the trait's documentation indicating that this trait is not sealed and is not stabilized.
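
For comparison, the two alternatives to manual Slot initialization from the first open question could look roughly like this. Everything here is a hypothetical sketch with toy types: Config, ExternalState, Analyzer::with_state, and Analyzer::with_defaults are illustrative names, and u64 stands in for the document Id.

```rust
use std::collections::HashMap;

/// Hypothetical configuration object stored in a slot.
#[derive(Debug, Default, PartialEq, Eq)]
struct Config {
    strict: bool,
}

/// Toy Slot that always holds a value, so reads cannot fail.
struct Slot<T> {
    value: T,
}

impl<T> Slot<T> {
    fn read(&self) -> &T {
        &self.value
    }
}

impl<T: Default> Default for Slot<T> {
    fn default() -> Self {
        Self { value: T::default() }
    }
}

/// Hypothetical dedicated interface grouping the external state.
struct ExternalState {
    config_file: Slot<Config>,
    files_structure: Slot<HashMap<String, u64>>,
}

struct Analyzer {
    state: ExternalState,
}

impl Analyzer {
    /// Alternative 1: initial values are passed through the constructor,
    /// so the type checker forces the caller to provide them.
    fn with_state(config: Config, files: HashMap<String, u64>) -> Self {
        Self {
            state: ExternalState {
                config_file: Slot { value: config },
                files_structure: Slot { value: files },
            },
        }
    }

    /// Alternative 2: every slot starts from `T::default()`, which
    /// requires `Default` on each slot type.
    fn with_defaults() -> Self {
        Self {
            state: ExternalState {
                config_file: Slot::default(),
                files_structure: Slot::default(),
            },
        }
    }
}
```

With defaults, a read never fails, but a default value may still be semantically wrong until the real configuration is written, so the setup step does not disappear.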

Implementation Steps

[x] In the next minor release, add a note to the Grammar trait's documentation indicating that the trait's API is unstable.
[x] Implement the proposed changes in a dedicated branch available for public review.
[x] (Optional) Update the User Guide.
[x] (Optional) Provide corresponding examples.

Eliah-Lakhin / lady-deirdre