dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
18.85k stars 4.01k forks source link

Add CSharpSyntaxVisitor<TArgument, TResult> and CSharpSyntaxRewriter<TArgument> #62647

Open lameox opened 2 years ago

lameox commented 2 years ago

Background and Motivation

Currently we support a SymbolVisitor<TArgument, TResult> (introduced in #56530) as well as a OperationVisitor<TArgument, TResult>. It would be ideal to introduce a CSharpSyntaxVisitor<TArgument, TResult> for the same reasons outlined in #55183: It would allow us to reuse the same visitor and allow parallel invocations on it.

Additionally there should be merit in exposing the Argument on the SyntaxRewriter as well. This would, at the very least, allow us to rewrite the visitors used in the compiler itself which back the following operations:

By passing the arguments to the replacers around as structs we should be able to make all these APIs completely allocation-free which should be desirable.

Proposed API

namespace Microsoft.CodeAnalysis.CSharp
{
    public abstract partial class CSharpSyntaxNode : SyntaxNode, IFormattable
    {
        //...

+       public abstract TResult Accept<TArgument, TResult>(CSharpSyntaxVisitor<TArgument, TResult> visitor, TArgument argument);

        public abstract TResult? Accept<TResult>(CSharpSyntaxVisitor<TResult> visitor);

        public abstract void Accept(CSharpSyntaxVisitor visitor);

        //...
    }
}
namespace Microsoft.CodeAnalysis.CSharp
{    
+    public abstract partial class CSharpSyntaxVisitor<TArgument, TResult>
+    {
+        public virtual TResult Visit(SyntaxNode? node, TArgument argument)
+        {
+           if (node != null)
+           {
+                return ((CSharpSyntaxNode)node).Accept(this, argument);
+            }
+
+            return default;
+        }
+
+        public virtual TResult DefaultVisit(SyntaxNode node, TArgument argument)
+        {
+            return default;
+        }
+
+        //Additional generated code in Syntax.xml.Main.Generated.cs
+    }
}
namespace Microsoft.CodeAnalysis.CSharp
{    
+    public abstract partial class CSharpSyntaxRewriter<TArgument> : CSharpSyntaxVisitor<TArgument, SyntaxNode?>
+    {
+        //same API as current CSharpSyntaxRewriter, just accounting for the argument
+
+        public CSharpSyntaxRewriter(bool visitIntoStructuredTrivia = false)
+        {
+            _visitIntoStructuredTrivia = visitIntoStructuredTrivia;
+        }
+
+        public virtual bool VisitIntoStructuredTrivia { get; }
+
+        public override SyntaxNode? Visit(SyntaxNode? node, TArgument argument)
+        {
+            //...
+        }
+
+        public virtual SyntaxToken VisitToken(SyntaxToken token, TArgument argument)
+        {
+            //...
+        }
+
+        public virtual SyntaxTrivia VisitTrivia(SyntaxToken token, TArgument argument)
+        {
+            //...
+        }
+
+        public virtual SyntaxList<TNode> VisitList<TNode>(SyntaxList<TNode> list, TArgument argument) where TNode : SyntaxNode
+        {
+            //...
+        }
+
+        public virtual TNode? VisitListElement<TNode>(TNode? node, TArgument argument) where TNode : SyntaxNode
+        {
+            //...
+        }
+
+        public virtual SeparatedSyntaxList<TNode> VisitList<TNode>(SeparatedSyntaxList<TNode> list, TArgument argument) where TNode : SyntaxNode
+        {
+            //...
+        }
+
+        public virtual TNode? SyntaxToken VisitListSeparator(SyntaxToken separator, TArgument argument)
+        {
+            //...
+        }
+
+        public virtual SyntaxTokenList VisitList(SyntaxTokenList list, TArgument argument)
+        {
+            //...
+        }
+
+        public virtual SyntaxTriviaList VisitList(SyntaxTriviaList list, TArgument argument)
+        {
+            //...
+        }
+
+        public virtual SyntaxTrivia VisitListElement(SyntaxTrivia list, TArgument argument)
+        {
+            //...
+        }
+
+        //Additional generated code in Syntax.xml.Main.Generated.cs
+    }
}

Usage Examples

s_visitor.Visit(syntax, new VisitorContext());
var replacerData = new(oldToken, newToken);
return s_replacer.Visit(syntax, replacerData);

Alternative Designs

I dont see a nice alternative design since we already ship the "same" (apart from the argument) API and should try to keep it familiar for users.

Risks

None come to mind.

lameox commented 2 years ago

The conceptually same changes probably make sense for their VB counterparts as well but I am less familiar with the VB side of Roslyn so i might be mistaken.

CyrusNajmabadi commented 2 years ago

By passing the arguments to the replacers around as structs we should be able to make all these APIs completely allocation-free which should be desirable.

Note: i haven't yet seen a case where these allocation costs are a problem. In practice they're normally a single allocation total, which isn't really a problem that needs solving. Generally speaking, i prefer alloc-free apis being possible when we have cases where something is a hotspot, or is just leading to a ton of garbage through normal usage patterns. For these visitors, features generally alloc once and call the visitor, leading to very little garbage overhead.

lameox commented 2 years ago

Yeah i guess that makes sense. The alloc-free api would just be a neat side-effect that would be possible to build on top, not the main motivation. Main motivation would be for third parties to have the option to pass arguments on the stack depending on their use-cases.

CyrusNajmabadi commented 2 years ago

Main motivation would be for third parties to have the option to pass arguments on the stack depending on their use-cases.

FWIW, this feels speculative. To me, API decisions like this should be driven by actual cases where users need this and existing available solutions are not sufficient. That will allow us to do a reasonable evaluation if this extra API complexity is warranted or not. Thanks!

lameox commented 2 years ago

For these visitors, features generally alloc once and call the visitor, leading to very little garbage overhead.

Wouldn't the same hold true for the SymbolVisitor as well or am i misunderstanding something?

To me, API decisions like this should be driven by actual cases where users need this and existing available solutions are not sufficient.

Ok let me try to explain our use case. At my work we are currently investigating using Roslyn and C# to facilitate writing Unit Tests for a PLC. This would entail users writing their tests in C# which we are then looking at and making use of Roslyn transforming/compiling into our own bytecode that will be shipped off to the PLC and run there. In partice this means we need to restrict the user to a subset of C# since a lot of Constructs the language would permit are either not allowed or are not supported yet (i.e. async / await). First talks with potential customers indicates a realistic workload to be in the neighbourhood of ~10000 syntax trees each beeing visited by multiple different SyntaxVisitors/Rewriters (~25) in our toolchain. Since this will be potentially running on quite ressource constrained devices, we would like to avoid GC here since we have a clear avenue of how to do so and can already do it with our symbol based visitors.

I am aware that this is not your typical usecase but i hope it helps illustrate where i'm coming from.

Also I think the risk and additional effort in supporting is basically zero here since even though we add a lot of API surface, the actual implementation is already done in the Visitor/Replacer without arguments since they logically behave exactly the same as the proposed API and most of it is auto-generated anyways meaning we can reuse all testing and autogenning of the current code. Therefore I would think providing potential users with the same possible base classes as for the SymbolVisitor and OperationVisitor would be worth it just for the consistency alone if not for the fringe use cases like we have.

The discussion about moving the current code over could then be had on its own but since it would be a perf gain (even if small) with virtually no drawbacks: why not?

CyrusNajmabadi commented 2 years ago

Since this will be potentially running on quite ressource constrained devices, we would like to avoid GC here since we have a clear avenue of how to do so and can already do it with our symbol based visitors.

I would def like actual measurements here. Roslyn is already Extremely heavy. Our syntax trees and symbolic graphs are by no means lightweight. So i'm curious about the space of allocating these visitors/rewriters being a hotspot. it might be an issue, but we'd want confirmation first.

Note: in our own code, when we've had this concern, it's been fairly trivial. We simply pool the visitors. e.g. we take the visitor from a pool, initialize it with the data it needs, run it, clear it, add back to teh pool.

CyrusNajmabadi commented 2 years ago

The discussion about moving the current code over could then be had on its own but since it would be a perf gain (even if small) with virtually no drawbacks: why not?

There are differences in opinion on the topic of 'drawbacks'. APIs are things we have to maintain, support, include metadata for, etc. It is certainly fine to argue that one believes the benefits outweigh those costs. However, it's something to be discussed :)

--

In a case like this, where the benefit stated is better perf, it would be super helpful to actually have real data and real use cases driving that. First, to motivate hte scenario. Second, and most importantly, so we can validate that we've actually solved the issue. :)

sharwell commented 2 years ago

⚠️ The visitor result should be TResult and not TResult? (to match the 2-argument form of symbol visitor). It's a breaking change to fix the API for the 1-argument form, but that doesn't mean we have to create a second broken form. 😄

jcouv commented 2 years ago

Assigned to @333fred and @CyrusNajmabadi to triage/drive as needed, as area owners for api-syntax.

lameox commented 2 years ago

⚠️ The visitor result should be TResult and not TResult? (to match the 2-argument form of symbol visitor). It's a breaking change to fix the API for the 1-argument form, but that doesn't mean we have to create a second broken form. 😄

Updated it. I guess in this case the Signature of Accept should also change like this?

- public abstract TResult? Accept<TArgument, TResult>(CSharpSyntaxVisitor<TArgument, TResult> visitor, TArgument argument);
+ public abstract TResult Accept<TArgument, TResult>(CSharpSyntaxVisitor<TArgument, TResult> visitor, TArgument argument);
333fred commented 2 years ago

@lameox we're still looking for concrete perf numbers to drive this request. Particularly since some of the APIs referenced are in the compiler itself and haven't shown up on our benchmarks.

lameox commented 2 years ago

Well as @CyrusNajmabadi already pointed out in the compiler itself this will most likely not show up in benchmarks since most features only allocate once. For our own use case we are still too early in development (and will be for a while) to come up with any meaningful measurements that aren't done on mostly contrived data.

For me the main point also never was just performance since that was rather a neat - if small - side effect of the proposed changes. The main goal is to provide a unified concept between the APIs all the visitors since as it stands we can freely call concurrently into our symbol visitors but need to write pooling, rent/free logic etc. for the syntax visitors. During initial evaluation for our project a few team members were kind of surprised by that which ultimately led me to open this issue. I can however understand if making the dev experience unified between all visitor kinds is not enough of a reason to drive this feature since the other benefits on their own most likely are not. Also I guess I could have worded the proposal a bit better since it seems the very small perf benefits got perceived as the main motivation.

333fred commented 2 years ago

I will defer to @CyrusNajmabadi's opinion on whether we should move forward with this then.

CyrusNajmabadi commented 2 years ago

That's a much more compelling reason for me. Being able to flow inputs/outputs through singletons of these in a clean/concurrent fashion, makes a lot of sense.

lameox commented 2 years ago

I would be down to implement this

333fred commented 2 years ago

API Review

lameox commented 1 year ago

AFAICT there currently are no dedicated unit tests for the existing SyntaxWalkers and SyntaxRewriters. However they are implicitly tested since they are used in a few places to validate test results. For example here where a Visitor is used to assert the number of tokens in a tree.

Should I come up with a test plan for existing and new implementations?

333fred commented 1 year ago

Should I come up with a test plan for existing ... implementations?

If you wanted to add tests as part of a PR, they would certainly be appreciated. Testing of these public APIs is a known shortcoming, but they are tested enough implicitly that we're not super worried about addressing it.

Should I come up with a test plan for ... new implementations?

Yes, the new API will not be used implicitly like the existing APIs are, so they need more comprehensive tests.

lameox commented 1 year ago

Ok back from vacation.

Yes, the new API will not be used implicitly like the existing APIs are, so they need more comprehensive tests.

I would propose the following tests for both VB and C# variants of the classes:

For all the tests above also assert that arguments/results are passed along correctly.

Anything else I am missing?

333fred commented 1 year ago

That seems fine @lameox.