Eliah-Lakhin / lady-deirdre

Compiler front-end foundation technology.
http://lady-deirdre.lakhin.com/

Global State Support for Semantic Analysis Framework #15

Closed: Eliah-Lakhin closed this issue 2 months ago.

Eliah-Lakhin commented 5 months ago

Background

The current design of the semantic analysis framework does not offer a unified approach for working with the external or global state of the Analyzer. For example, there is no dedicated API to enumerate or request specific documents, or to fetch compiler-wide configurations from inside the computable functions (see prior discussion in #12).

In principle, such state could be treated as an environment external to the computable functions. For example, the mapping between file names and the Analyzer's documents is inherently external state. The programmer can define a lazy static variable that holds the external/global state, place it next to the Analyzer's instance, and query this static variable inside the attributes' computable functions. Whenever the external configuration changes, the programmer would trigger a corresponding user-defined event to force invalidation of the affected computable functions.

This approach provides a reasonable workaround for this problem, but it has several drawbacks:

  1. This approach forces a sealed architecture on the end compiler, which hurts modularity and testability: the programmer cannot create more than one independent instance of the compiler.

  2. Maintaining the system of user-defined events is error-prone.

  3. Sometimes we need analyzer-wide attributes to perform common compiler computations that are not naturally bound to any compilation unit. For example, there could be an attribute that collects project-wide diagnostic messages across all documents. Such attributes cannot be instantiated inside the external state, because in the current design an attribute object is always an integral part of a syntax tree node's semantics.
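The lazy-static workaround described above, together with its first drawback, can be illustrated in plain Rust without Lady Deirdre. This is a minimal sketch under stated assumptions: std's `OnceLock` plays the role of the lazy static, `u64` stands in for the Analyzer's document `Id`, and `files_structure`, `resolve_module`, and `Compiler` are hypothetical names, not part of any real API.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical process-wide state: file name -> document id.
// `u64` stands in for the Analyzer's document `Id`.
fn files_structure() -> &'static Mutex<HashMap<String, u64>> {
    static FILES: OnceLock<Mutex<HashMap<String, u64>>> = OnceLock::new();
    FILES.get_or_init(|| Mutex::new(HashMap::new()))
}

// Stand-in for a computable function that queries the global state.
fn resolve_module(name: &str) -> Option<u64> {
    files_structure().lock().unwrap().get(name).copied()
}

// Hypothetical compiler "instance" that registers files in the global map.
struct Compiler;

impl Compiler {
    fn add_file(&self, name: &str, id: u64) {
        let _ = files_structure().lock().unwrap().insert(name.to_string(), id);
        // A real compiler would also have to fire a user-defined event here
        // so that dependent computations get invalidated (drawback 2).
    }
}

fn main() {
    let first = Compiler;
    let second = Compiler;

    first.add_file("module_1", 1);
    assert_eq!(resolve_module("module_1"), Some(1));

    // Drawback 1: the second instance overwrites the first one's mapping,
    // because both share the same process-wide static.
    second.add_file("module_1", 2);
    assert_eq!(resolve_module("module_1"), Some(2));
}
```

Because the state lives in a single static, two "instances" can never hold different file mappings, which is exactly what makes this design hard to test and compose.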

Proposal

To address the above issues, I propose introducing a new grammar-wide semantic configuration. This configuration will be specified through the #[semantics(MyLangSemantics)] macro attribute on the enum type that specifies the language grammar (similar to how we specify the node classifier: #[classifier(MyLangClassifier)]).

The "MyLangSemantics" object would be a user-defined semantic feature within which the programmer can define any compiler-wide attributes with computable functions, as well as other semantic features. Essentially, within this semantic object, the programmer will organize the nested structure of the compiler-wide computations. The attributes of this "global" semantics will be an integral part of the semantic graph and can be queried as usual from within any other attribute (both node attributes and other "global" attributes) as well as from outside of attributes.

Additionally, a new type of Feature will be introduced: the Slot object (name TBD). This object is similar to Attr, except it does not have a built-in computable function, and its value is intended to be part of the external state.

```rust
#[derive(Node)]
#[semantics(MyLangSemantics)] // When omitted, the global semantics is `VoidFeature`.
enum MyNode {
    // ...
}

#[derive(Feature)]
#[node(MyNode)]
struct MyLangSemantics {
    global_diagnostics: Attr<GlobalDiagnosticsAttr>,
    config_file: Slot<SomeConfigObject>,
    files_structure: Slot<HashMap<String, Id>>,
}
```

The Slot's value can be read both inside and outside of attributes, just like an attribute's value. When it is read inside an attribute's computable function, the attribute subscribes to changes in the Slot's value. Additionally, the value can be written within a MutationTask/ExclusiveTask (the same tasks from which you manage the Analyzer's documents).

Whenever a mutation task changes the Slot's value, all directly dependent attributes will be invalidated automatically. Therefore, no additional synchronization steps between the "external" state and the semantic graph are needed.
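The subscribe-on-read and invalidate-on-write behavior described above can be modeled in a few lines of single-threaded Rust. This is a toy sketch, not Lady Deirdre's implementation: `Slot`, `Attr`, `write`, `read`, and `file_count` are hypothetical stand-ins, and the real framework tracks dependencies across the whole semantic graph and across threads.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;
use std::rc::Rc;

/// Toy Slot: an externally written value plus the validity flags of the
/// attributes that have read it (its subscribers).
struct Slot<T> {
    value: RefCell<T>,
    subscribers: RefCell<Vec<Rc<Cell<bool>>>>,
}

impl<T> Slot<T> {
    fn new(value: T) -> Self {
        Self {
            value: RefCell::new(value),
            subscribers: RefCell::new(Vec::new()),
        }
    }

    /// Replaces the value and invalidates every subscribed attribute.
    fn write(&self, value: T) {
        *self.value.borrow_mut() = value;
        for flag in self.subscribers.borrow_mut().drain(..) {
            flag.set(false);
        }
    }
}

/// Toy attribute: caches the result of `compute` until invalidated.
struct Attr<T, V> {
    compute: fn(&T) -> V,
    valid: Rc<Cell<bool>>,
    cache: RefCell<Option<V>>,
    computations: Cell<u32>,
}

impl<T, V: Clone> Attr<T, V> {
    fn new(compute: fn(&T) -> V) -> Self {
        Self {
            compute,
            valid: Rc::new(Cell::new(false)),
            cache: RefCell::new(None),
            computations: Cell::new(0),
        }
    }

    /// Returns the cached value, recomputing (and re-subscribing) if stale.
    fn read(&self, slot: &Slot<T>) -> V {
        if self.valid.get() {
            if let Some(value) = self.cache.borrow().as_ref() {
                return value.clone();
            }
        }
        let value = (self.compute)(&slot.value.borrow());
        slot.subscribers.borrow_mut().push(Rc::clone(&self.valid));
        self.valid.set(true);
        self.computations.set(self.computations.get() + 1);
        *self.cache.borrow_mut() = Some(value.clone());
        value
    }
}

/// Hypothetical computable function that depends on the Slot.
fn file_count(files: &HashMap<String, u64>) -> usize {
    files.len()
}

fn main() {
    let files_structure = Slot::new(HashMap::from([("a.rs".to_string(), 1)]));
    let count: Attr<HashMap<String, u64>, usize> = Attr::new(file_count);

    assert_eq!(count.read(&files_structure), 1);
    assert_eq!(count.read(&files_structure), 1); // served from the cache
    assert_eq!(count.computations.get(), 1);

    // A "mutation task" writes the Slot; the subscribed attribute is invalidated.
    files_structure.write(HashMap::from([
        ("a.rs".to_string(), 1),
        ("b.rs".to_string(), 2),
    ]));
    assert_eq!(count.read(&files_structure), 2);
    assert_eq!(count.computations.get(), 2);
}
```

The write drains the subscriber list, so an attribute re-subscribes only when it actually recomputes; no manual event is needed to keep the cache consistent with the external state.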

The Slot's generic parameter type must implement Eq, so that the semantic graph validation procedure can tell whether a newly written value actually differs from the previous one.

By default, the Slot will be instantiated with an "uninitialized" value. If the compiler's code attempts to read an uninitialized Slot, the reading function returns an UninitSlot error. Therefore, the programmer must make sure to initialize Slots manually whenever applicable (preferably right after the Analyzer is instantiated).
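The originally proposed semantics (an `Eq` bound for change detection and an `UninitSlot` error on reading before initialization) can be sketched as a standalone toy type. Only the `UninitSlot` name comes from the proposal; `Slot`, `uninit`, `read`, and `write` here are hypothetical and do not reflect the shipped API, which the later comment in this thread changed.

```rust
/// Mirrors the `UninitSlot` error named in the proposal.
#[derive(Debug, PartialEq, Eq)]
struct UninitSlot;

/// Toy model of the originally proposed Slot: it starts uninitialized and
/// requires `Eq` so that writing an equal value does not count as a change.
struct Slot<T: Eq> {
    value: Option<T>,
}

impl<T: Eq> Slot<T> {
    fn uninit() -> Self {
        Self { value: None }
    }

    /// Fails with `UninitSlot` until the first write.
    fn read(&self) -> Result<&T, UninitSlot> {
        self.value.as_ref().ok_or(UninitSlot)
    }

    /// Stores `value`; returns `true` only if it differs from the old value,
    /// i.e. only then would dependent attributes need invalidation.
    fn write(&mut self, value: T) -> bool {
        if self.value.as_ref() == Some(&value) {
            return false;
        }
        self.value = Some(value);
        true
    }
}

fn main() {
    let mut config: Slot<String> = Slot::uninit();
    assert_eq!(config.read(), Err(UninitSlot));

    // Initializing right after creating the Analyzer, as the proposal advises.
    assert!(config.write("release".to_string()));
    assert_eq!(config.read().map(String::as_str), Ok("release"));

    // An equal value is detected through `Eq` and is not treated as a change.
    assert!(!config.write("release".to_string()));
}
```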

Open Questions

  1. The proposed design still requires manual initialization of the Slot values, which implies additional maintenance effort that the Rust type checker does not enforce. In my opinion, this should be manageable, but I would like to hear your opinions as well.

    An alternative approach could be moving Slots into a dedicated interface, allowing the programmer to specify initial values of the external state through the Analyzer's constructor. However, this might introduce extra setup complexity.

    Another alternative is to require Default on the Slot's generic parameter, so that each Slot starts with the default value. This would eliminate uninitialized Slot errors, but it would not by itself ensure that the programmer sets up meaningful Slot values. It also imposes an additional restriction on the Slot type.

  2. Introducing the #[semantics(...)] macro attribute requires a breaking change in the Grammar trait (I would need to add a new associated type to the trait). This formally violates the SemVer policy if we want this feature in the 2.* version of Lady Deirdre. However, I assume that no one is using this low-level trait manually yet, so I will make an exception. To address this issue, in the next minor release I will add a note to the trait's documentation stating that this trait is not sealed and is not stabilized.
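Why a new associated type breaks existing implementations can be seen in a reduced model. The trait below is a hypothetical, heavily simplified stand-in for the Grammar trait (the real trait has other members and is normally implemented by the derive macro); the associated type name `Semantics`, the function `new_global_semantics`, and the `OtherNode` grammar are illustrative assumptions. The relevant language fact is that stable Rust does not support default values for associated types, so every existing `impl` must be updated.

```rust
/// Placeholder for the global semantics used when `#[semantics(...)]` is
/// omitted (the proposal calls it `VoidFeature`).
#[derive(Default)]
struct VoidFeature;

/// Hypothetical, heavily reduced stand-in for the `Grammar` trait.
trait Grammar {
    /// The newly added associated type. Stable Rust has no default values
    /// for associated types, so every existing `impl Grammar` must now name
    /// it, which is what makes the change breaking.
    type Semantics: Default;
}

/// Illustrative grammar-wide semantics with one piece of global state.
#[derive(Default)]
struct MyLangSemantics {
    diagnostics: Vec<String>,
}

// A grammar declared with `#[semantics(MyLangSemantics)]`.
struct MyNode;

impl Grammar for MyNode {
    type Semantics = MyLangSemantics;
}

// A grammar without the attribute falls back to the void semantics.
struct OtherNode;

impl Grammar for OtherNode {
    type Semantics = VoidFeature;
}

/// Generic code can now create one global semantics object per grammar.
fn new_global_semantics<G: Grammar>() -> G::Semantics {
    Default::default()
}

fn main() {
    let semantics = new_global_semantics::<MyNode>();
    assert!(semantics.diagnostics.is_empty());

    let _void: VoidFeature = new_global_semantics::<OtherNode>();
}
```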

Implementation Steps

  1. [x] In the next minor release, add a note to the Grammar trait's documentation indicating that the trait's API is unstable.
  2. [x] Implement the proposed changes in a dedicated branch available for public review.
  3. [x] (Optional) Update the User Guide.
  4. [x] (Optional) Provide corresponding examples.

Eliah-Lakhin commented 2 months ago

The final decision is that the Slot's value type must implement Default, and that Eq is not required.

Slot values are initialized with default values from the beginning. If the underlying type has no sensible default and requires a custom constructor, it can be wrapped in an Option. I believe this approach will be more ergonomic in practice, as the intended use case for slots is to store common configurations, such as the mapping between file names and their document IDs (a HashMap). In most cases, they will be initialized with reasonable defaults when the Analyzer is created.
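The Default-based design and the Option-wrapping pattern described above can be sketched with a toy type. This is a minimal model under stated assumptions: `Slot`, `new`, `read`, and the `BuildConfig` type are hypothetical, `u64` stands in for the document `Id`, and the direct field assignment replaces what would happen inside a mutation task in the real framework.

```rust
use std::collections::HashMap;

/// Toy model of the final Slot design: the value type only needs `Default`,
/// so a Slot is readable from the moment it is created.
struct Slot<T: Default> {
    value: T,
}

impl<T: Default> Slot<T> {
    fn new() -> Self {
        Self { value: T::default() }
    }

    fn read(&self) -> &T {
        &self.value
    }
}

/// Hypothetical configuration type without a meaningful `Default`.
struct BuildConfig {
    target: String,
}

impl BuildConfig {
    fn new(target: &str) -> Self {
        Self { target: target.to_string() }
    }
}

fn main() {
    // A file-name -> document id map (u64 stands in for `Id`) starts empty.
    let modules: Slot<HashMap<String, u64>> = Slot::new();
    assert!(modules.read().is_empty());

    // A type that needs a custom constructor is wrapped in `Option`;
    // `None` plays the role of "not configured yet".
    let mut config: Slot<Option<BuildConfig>> = Slot::new();
    assert!(config.read().is_none());

    // In the real framework this write would happen inside a mutation task.
    config.value = Some(BuildConfig::new("wasm32"));
    assert_eq!(config.read().as_ref().map(|c| c.target.as_str()), Some("wasm32"));
}
```

Wrapping in `Option` moves the "uninitialized" case into the value type itself, so readers handle `None` explicitly instead of receiving a separate error.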

I also removed the Eq requirement because testing slot values for equality could be inefficient in most situations. Instead, the user's mutation callback returns a boolean flag indicating whether it has actually modified the slot's value.

```rust
#[derive(Feature)]
#[node(SharedSemanticsNode)]
pub struct CommonSemantics {
    pub modules: Slot<SharedSemanticsNode, HashMap<String, Id>>,
}

// Mutating the slot's value:

task.common()
    .modules // This is a Slot, which is part of the common Analyzer's semantics.
    .mutate(&task, |modules| {
        let _ = modules.insert(String::from("module_1"), doc_id);

        true // Indicates that the Slot's value has been mutated.
    })
    .unwrap();

// Reading the slot's value from inside an attribute function:

let modules = context.common().modules.read(context).unwrap_abnormal()?;
```
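The contract of `mutate` (edit the value in place, then report whether anything changed) can be modeled without the framework. This toy sketch is an assumption-laden simplification: the real `mutate` also takes the task and returns a `Result`, both omitted here, and `Slot`, `invalidations`, and `read` are hypothetical names; `u64` stands in for the document `Id`.

```rust
use std::collections::HashMap;

/// Toy model of the final `mutate` contract: the closure edits the value in
/// place and reports whether it changed anything. No `Eq` bound is needed.
struct Slot<T: Default> {
    value: T,
    // Counts the invalidations that would propagate to dependent attributes.
    invalidations: u32,
}

impl<T: Default> Slot<T> {
    fn new() -> Self {
        Self { value: T::default(), invalidations: 0 }
    }

    fn mutate(&mut self, f: impl FnOnce(&mut T) -> bool) {
        if f(&mut self.value) {
            // Only a reported change invalidates dependents.
            self.invalidations += 1;
        }
    }

    fn read(&self) -> &T {
        &self.value
    }
}

fn main() {
    let mut modules: Slot<HashMap<String, u64>> = Slot::new();

    modules.mutate(|map| {
        let _ = map.insert(String::from("module_1"), 1);
        true
    });

    // Re-inserting the same id: the closure can cheaply detect a no-op.
    modules.mutate(|map| map.insert(String::from("module_1"), 1) != Some(1));

    assert_eq!(modules.read().get("module_1"), Some(&1));
    assert_eq!(modules.invalidations, 1);
}
```

The closure decides what counts as a change, which is often cheaper than comparing a whole map for equality after every mutation.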

Additionally, I had to introduce breaking changes to the Feature and AbstractFeature traits by adding new members. These traits are low-level, and from now on they will be considered unstable with respect to adding new members.


The documentation has been updated accordingly: