dotnet / command-line-api

Command line parsing, invocation, and rendering of terminal output.
https://github.com/dotnet/command-line-api/wiki
MIT License
3.37k stars 378 forks source link

Clarify what data should be static, on pipeline or on subsystem #2463

Open KathleenDollard opened 1 month ago

KathleenDollard commented 1 month ago

Background

We are leaving the door open for multiple pipelines in simultaneous use (multiple declared and possibly running on different threads). This allows for a different set of subsystems for each scenario/pipeline.

We are using a single data storage location - a static strongly typed weak table - to support multiple threads.

The design demands that any particular CLI be created on a single thread and not modified once it is in use by multiple threads.

It does not currently block the same symbol from being used in multiple pipelines/threads.

Problem

Our discussion yesterday was that providers should be pipeline specific. This is problematic for two reasons:

Possible solutions

While we probably could work through these scenarios, there are at least two ways to avoid the problem:

Note on a later optimization for multiple pipelines:

If this scenario materializes, not supplying an optimization would result in all the descriptions for one pipeline being loaded because another pipeline needed its descriptions. This could be problematic.

A simple optimization would be to create a provider that knew the root of its associated CLI tree. This assumes that each pipeline is specific to a CLI tree, but I am OK with putting restrictions on such a niche scenario. Since each symbol can find its root during parsing but not necessarily before, this would also require handling of an unattached symbol, probably by just not running the provider.

My current opinion is that providers should be managed the same way we manage the underlying data, which is globally. If anything else at all needs to be moved globally, we should take a close look at design. But the relationship between providers and data storage is so intimate, I think we should have them similarly placed and static.

Which leads to the weird idea that providers themselves should be in the strongly typed weak table. That might require a common class for symbols and providers, which seems to be too weird an idea unless managing multiple strongly typed weak tables are difficult and it's not OK to just store providers, which are rare, in normal storage.

Looking forward to comments.

KathleenDollard commented 1 month ago

Storing providers with the annotations would probably allow us to block access to annotations, except through a pathway that always respected providers. I think this would be highly desirable.

mhutch commented 1 month ago

I don't hate the idea of providers being global but I do have a few thoughts/concerns.

It would certainly simplify the API so that subsystems don't have to use a different getter than authors. However, I think there may be value in having a distinction between "get the directly assigned annotation" and "get the resolved (direct or provider) annotation.

I don't think reuse of symbols between pipelines is a big concern... Firstly, having multiple pipelines and using providers are both somewhat advanced scenarios, and secondly, I don't think it's actually an issue if the same symbol is used in multiple pipelines and resolves different values for some annotation, as long as providers are clearly associated withpipelines. Devs can use the same provider in multiple pipelines if they want anyways.

Indeed, we may even want to consider altering the provider model to allow contextual resolution as a mechanism for customizing annotations (e g. Description) when a symbol appears multiple times in the same grammar - resolve the value of X annotation for Y symbol in the context of Z parent symbol and W pipeline.

My biggest concern, however, is that making providers global makes it difficult/impossible to scope them for GC/disposal.

KathleenDollard commented 1 month ago

I think that the ability to GC what providers create is essential, as that is how we expect people to handle large strings. I can see some providers that would have a small footprint, but some may not. And just in principle, we should strive to allow CLI authors to clean up prior to running high performance apps.

So, I agree, let's keep providers scoped to the pipeline. I remain somewhat concerned that if people get the value of a provider created pseudo property on a symbol from the symbol they will sometimes get a different value than they would get if they asked the pipeline. But I think your intuition that people will understand this if we make the API clear is correct. For example, if direct access on the symbol sometimes returns the value without the provider, it should never return the provider value. Of course, we can also add a pipeline feature that returns the value as used, particularly for debugging.