Updated Subsystem data storage

We had a great design meeting yesterday which resulted in additional changes to the way subsystems hold data. This issue includes those changes and things I felt were implied by them or the needs of validation/completion. This issue supersedes #2458.

Problems to solve

The initial design assumed that subsystems using other subsystems would be somewhat rare and primarily for a small number of data points, like description. There is a link between completions and validations that is inconsistent with that assumption. Most completions correspond to validation, although some validation does not have a corresponding completion. Also, validation, and possibly other subsystems define characteristics of the symbol that some help systems may wish to display.

Work to do

Annotations are the identifier for pieces of data, and that word is used casually here instead of data point.

We had already moved the data storage for subsystems from subsystems to the pipeline to avoid loading subsystems just to retrieve a piece of data. This issue suggests further improvements:

Remove the relationship between annotations and subsystems. All data belongs to the pipeline and not a particular subsystem.
~~Add traits as a set of annotations with an identifier.~~ What is referred to in this issue as a trait is just a general annotation, per the previous bullet ~~- Consider an alternative approach to annotation providers:~~ ~~- Two default kinds: Lazy and lazy dictionary.~~ ~~- Simpler definition.~~ Annotations will remain just values - no recommended change.
Determine desired layout and update existing annotations and extensions.
- We have a bunch of files under Subsystem/Annotations. Do we need them?
Review the big comment in AnnotationStorageExtensions.cs and move to an issue until incorporated in a proposal, making it easier to find.
Consider #2463 on provider storage location
- Also, consider whether we can block direct access to the strongly typed weak table, using an access method that forces respecting providers.

Remove relationship between annotations and subsystems

For example, the annotation id for Description changes from Help.Description to just Description.

If two subsystems use the same annotation id, the first subsystem on which a value is set will determine the type of the annotation. If a value is later set that is not implicitly (Q: Should this be or explicitly) convertible to that type, an error will occur at runtime.

This does mean that subsystem extensions will collide. This is now by design. The user sees a single thing and all subsystems should use it in the same way. While it is super easy to imagine collisions, we are choosing the path that people will generally work to think like users, and generally that will result in the same name and type.

We think the ecosystem will work this out. We may offer more data annotations than we use to lay the groundwork for common naming.

Traits

Sometimes the user will want to identify an aspect of a symbol and just have the right thing happen. In the current main version, all options have a FileExists property, for example. This indicates both validation and completion actions, and could also spawn a note in help.

Trait is proposed to solve this in Powderhouse. At core, this is just a set of annotations that are defined as a unit by the CLI author. Generic extension methods (extension properties in the future) allow easy entry by the CLI author:

var opt1 = new CliOption<int>("one");
var opt2 = new CliOption<FileInfo>("two");

var opt1.SetRange(1,4); // .FileExists is an error and not displayed in IntelliSense
var opt2.SetAsFileMustExist(); // .Range is an error and not displayed in IntelliSense

Each of these traits result in the addition of a validation and completion annotation lazily via a provider.

Messages for failure and help are included in the validation annotation.

Alternative approach to annotation providers

This proposal captures our increased understanding of common cases for providers. The underlying model might not change as a result.

Providers must be reentrant, which appears to result in two kinds of providers for currently known scenarios:

Lazy methods to produce single values per symbol:
- Generally defined by CLI author.
- Can only be run once.
Collections to support traits:
- Generally defined by traits.
- Need to not create an order dependency.

Both of these provider types are general enough I think we should provide them.

Lazy providers

The simplest way for the CLI author to create lazy annotations for something like Description is just to write a method that defines them. That implies a delegate and all additional ways that the user might get data can be defined as a delegate:

IEnumerable<(CliSymbol, string)> GetDescriptions()
{ // set some descriptions
}

pipeline.AddProvider([DescriptionAnnotation], GetDescriptions);

We can either change the provider model to be a collection of <IEnumerable<(IEunumerable<Annotation>), Func<(CliSymbol, string)>> and have the infrastructure determine whether something should be called, or there could be an aggregate provider that works with the current interface and keeps a dictionary of IEunumerable<Annotation>), Func<(CliSymbol, string)>' or spread into a dictionary ofAnnotation, Func<(CliSymbol, string)>'.

The infrastructure or aggregate provider must ensure each delegate is called no more than once - for example setting the Func to null after the first run. It should run when the first request to retrieve a value is made to it.

Collection providers

The main use case for collection providers is traits. The validation trait will issue a call something like:

var validationProvider = pipeine.AddProvider(new AggregateCollectionProvider<Validation>(ValidationAnnotation);

public static void SetRange<T>(this CliSymbol symbol, T lowerBound, T upperBound
  where T: IComparable
{
   validationProvider.Add(new RangeValidation(lowerBound, upperBound);
}

Lazy providers do not work because they are run only once. To avoid an order dependency issue when traits are added after a value after a request to retrieve data, reentrancy probably means allowing the provider to execute itself many times, and empty an internal collection after each run.

dotnet / command-line-api