dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
18.96k stars 4.03k forks source link

API to determine if a syntax tree contains *conditional* directives. #65792

Closed CyrusNajmabadi closed 1 year ago

CyrusNajmabadi commented 1 year ago

PR demonstrating idea: https://github.com/dotnet/roslyn/pull/65794

Background and Motivation

Roslyn IDE is introducing a change whereby we share trees when applicable for a particular file linked into multiple projects (which also occurs for projects that have multiple TFMs). This change has substantial savings in practice. For example, in roslyn itself, 25k of our 29k documents in memory are linked into multiple projects and thus could potentially be eligible for tree sharing.

In that 25k potentially sharable files, 11k are actually distinct real files on disk, and 14k are those files linked into other projects (the majority is just linked into one additional project, but we have a few that are linked into multiple other projects).

Now, not all trees can be shared. A good example of that are trees that contain different #if directives. These trees might actually parse differently in different projects, and thus you could get the wrong tree in their respective projects. For example, if you had code that did #if NETCORE6 but you were currently in teh NETSTANDARD2 flavor of that project.

In the IDE we can trivially see if a tree contains directives or not using the SyntaxNode.ContainsDirective property. If the file doesn't have directives, then we can definitely share them. With this flag in roslyn, we can see that of those 14k files that we could potentially share, 8.5k of them dont' have directives at all and thus can be shared. But that still leaves 5.5k files that we don't share because they have some directives in them.

However, examining these 5.5k files, we see that 3.7k of those have directives but do not have #if directives. In other words, they contain things like #region or #nullable directives. i.e. directive that don't change parsing. This is a full 67% of teh files with directives, and 40% of the total potential sharable files.

So, if we could get an easy api to determine if a tree contains conditional directives, we could cheaply determine if we could reuse those 3.7 thousand trees or not.

In practice, without doing any sharing, roslyn takes around 2GB of ram to parse all its files. With sharing what we can (without this api) we get down to around 1.2GB or ram used). With this flag, we lower it an additional 200MB to around 1GB. So this flag helps us get to 50% memory usage easily.

Proposed API

namespace Microsoft.CodeAnalysis
{
     public class SyntaxNode
     {
           /// Returns `true` if this node contains a conditional directive (`#if` in C# or `#If` in VisualBasic) within it.
+        public bool ContainsConditionalDirective();
     }

Alternative Designs

Technically this API is not necessary as it can be computed by callers. For example, the IDE code can walk down the tree, using the existing .ContainsDiagnostic bit, and then find the tokens of interest and check them for the specific types of trivia on them. However, when trying this out, this was not cheap as it forced realization of red nodes to do this work. This caused lots of allocations in all the files despite not needing/wanting them. This ended up undoing a ton of the benefits of the sharing of those files.

The above API directly on the compiler nodes allows for the impl to be entirely green-node based and blazingly fast, without allocs.

--

Also, i explored having this be a bit on the node directly (like ContainsDirectives). However, our NODEFLAGS enum is totally full now. So if i added this we'd either go to 65bits from 64bit (so larger than a long), or i'd have to mix this into some other part of the node (like using a bit on the length). That all seemed scary and risky, so i made thsi a computed method.

Risks

None.

333fred commented 1 year ago

API Review

tmat commented 1 year ago

@CyrusNajmabadi Would checking for specific conditional symbols help? For example, if the directives only check for DEBUG we can also share the trees as all projects have the same configuration. Though, maybe that adds complexity that's not worth the extra savings.

tmat commented 1 year ago

Perhaps have an API that returns all symbols used in conditional directives would help?

CyrusNajmabadi commented 1 year ago

From testing the amount of trees this narrows things down to means we can just walk those ourselves on the red side and have it be nbd