Open lameox opened 2 years ago
The conceptually same changes probably make sense for their VB counterparts as well but I am less familiar with the VB side of Roslyn so i might be mistaken.
By passing the arguments to the replacers around as structs we should be able to make all these APIs completely allocation-free which should be desirable.
Note: i haven't yet seen a case where these allocation costs are a problem. In practice they're normally a single allocation total, which isn't really a problem that needs solving. Generally speaking, i prefer alloc-free apis being possible when we have cases where something is a hotspot, or is just leading to a ton of garbage through normal usage patterns. For these visitors, features generally alloc once and call the visitor, leading to very little garbage overhead.
Yeah i guess that makes sense. The alloc-free api would just be a neat side-effect that would be possible to build on top, not the main motivation. Main motivation would be for third parties to have the option to pass arguments on the stack depending on their use-cases.
Main motivation would be for third parties to have the option to pass arguments on the stack depending on their use-cases.
FWIW, this feels speculative. To me, API decisions like this should be driven by actual cases where users need this and existing available solutions are not sufficient. That will allow us to do a reasonable evaluation if this extra API complexity is warranted or not. Thanks!
For these visitors, features generally alloc once and call the visitor, leading to very little garbage overhead.
Wouldn't the same hold true for the SymbolVisitor as well or am i misunderstanding something?
To me, API decisions like this should be driven by actual cases where users need this and existing available solutions are not sufficient.
Ok let me try to explain our use case. At my work we are currently investigating using Roslyn and C# to facilitate writing Unit Tests for a PLC. This would entail users writing their tests in C# which we are then looking at and making use of Roslyn transforming/compiling into our own bytecode that will be shipped off to the PLC and run there.
In partice this means we need to restrict the user to a subset of C# since a lot of Constructs the language would permit are either not allowed or are not supported yet (i.e. async / await
). First talks with potential customers indicates a realistic workload to be in the neighbourhood of ~10000 syntax trees each beeing visited by multiple different SyntaxVisitors/Rewriters (~25) in our toolchain. Since this will be potentially running on quite ressource constrained devices, we would like to avoid GC here since we have a clear avenue of how to do so and can already do it with our symbol based visitors.
I am aware that this is not your typical usecase but i hope it helps illustrate where i'm coming from.
Also I think the risk and additional effort in supporting is basically zero here since even though we add a lot of API surface, the actual implementation is already done in the Visitor/Replacer without arguments since they logically behave exactly the same as the proposed API and most of it is auto-generated anyways meaning we can reuse all testing and autogenning of the current code. Therefore I would think providing potential users with the same possible base classes as for the SymbolVisitor and OperationVisitor would be worth it just for the consistency alone if not for the fringe use cases like we have.
The discussion about moving the current code over could then be had on its own but since it would be a perf gain (even if small) with virtually no drawbacks: why not?
Since this will be potentially running on quite ressource constrained devices, we would like to avoid GC here since we have a clear avenue of how to do so and can already do it with our symbol based visitors.
I would def like actual measurements here. Roslyn is already Extremely heavy. Our syntax trees and symbolic graphs are by no means lightweight. So i'm curious about the space of allocating these visitors/rewriters being a hotspot. it might be an issue, but we'd want confirmation first.
Note: in our own code, when we've had this concern, it's been fairly trivial. We simply pool the visitors. e.g. we take the visitor from a pool, initialize it with the data it needs, run it, clear it, add back to teh pool.
The discussion about moving the current code over could then be had on its own but since it would be a perf gain (even if small) with virtually no drawbacks: why not?
There are differences in opinion on the topic of 'drawbacks'. APIs are things we have to maintain, support, include metadata for, etc. It is certainly fine to argue that one believes the benefits outweigh those costs. However, it's something to be discussed :)
--
In a case like this, where the benefit stated is better perf, it would be super helpful to actually have real data and real use cases driving that. First, to motivate hte scenario. Second, and most importantly, so we can validate that we've actually solved the issue. :)
⚠️ The visitor result should be TResult
and not TResult?
(to match the 2-argument form of symbol visitor). It's a breaking change to fix the API for the 1-argument form, but that doesn't mean we have to create a second broken form. 😄
Assigned to @333fred and @CyrusNajmabadi to triage/drive as needed, as area owners for api-syntax.
⚠️ The visitor result should be
TResult
and notTResult?
(to match the 2-argument form of symbol visitor). It's a breaking change to fix the API for the 1-argument form, but that doesn't mean we have to create a second broken form. 😄
Updated it. I guess in this case the Signature of Accept
should also change like this?
- public abstract TResult? Accept<TArgument, TResult>(CSharpSyntaxVisitor<TArgument, TResult> visitor, TArgument argument);
+ public abstract TResult Accept<TArgument, TResult>(CSharpSyntaxVisitor<TArgument, TResult> visitor, TArgument argument);
@lameox we're still looking for concrete perf numbers to drive this request. Particularly since some of the APIs referenced are in the compiler itself and haven't shown up on our benchmarks.
Well as @CyrusNajmabadi already pointed out in the compiler itself this will most likely not show up in benchmarks since most features only allocate once. For our own use case we are still too early in development (and will be for a while) to come up with any meaningful measurements that aren't done on mostly contrived data.
For me the main point also never was just performance since that was rather a neat - if small - side effect of the proposed changes. The main goal is to provide a unified concept between the APIs all the visitors since as it stands we can freely call concurrently into our symbol visitors but need to write pooling, rent/free logic etc. for the syntax visitors. During initial evaluation for our project a few team members were kind of surprised by that which ultimately led me to open this issue. I can however understand if making the dev experience unified between all visitor kinds is not enough of a reason to drive this feature since the other benefits on their own most likely are not. Also I guess I could have worded the proposal a bit better since it seems the very small perf benefits got perceived as the main motivation.
I will defer to @CyrusNajmabadi's opinion on whether we should move forward with this then.
That's a much more compelling reason for me. Being able to flow inputs/outputs through singletons of these in a clean/concurrent fashion, makes a lot of sense.
I would be down to implement this
<TArgument>
, to match OperationWalker<TArgument>
AFAICT there currently are no dedicated unit tests for the existing SyntaxWalkers and SyntaxRewriters. However they are implicitly tested since they are used in a few places to validate test results. For example here where a Visitor is used to assert the number of tokens in a tree.
Should I come up with a test plan for existing and new implementations?
Should I come up with a test plan for existing ... implementations?
If you wanted to add tests as part of a PR, they would certainly be appreciated. Testing of these public APIs is a known shortcoming, but they are tested enough implicitly that we're not super worried about addressing it.
Should I come up with a test plan for ... new implementations?
Yes, the new API will not be used implicitly like the existing APIs are, so they need more comprehensive tests.
Ok back from vacation.
Yes, the new API will not be used implicitly like the existing APIs are, so they need more comprehensive tests.
I would propose the following tests for both VB and C# variants of the classes:
Visitors
Visit
on null
does not throw.Visit
on a Node calls the appropriate VisitXYZ
function. These would need to be autogenerated.Visit
on a Node calls into the DefaultVisit
function, if not overridden above. These would need to be autogenerated.Walkers
Visit
walks through the nodes in a tree in the correct order.Visit
respects SyntaxWalkerDepth.Node
.Visit
respects SyntaxWalkerDepth.Token
.Visit
respects SyntaxWalkerDepth.Trivia
.Visit
respects SyntaxWalkerDepth.StructuredTrivia
.Visit
respects (SyntaxWalkerDepth) int.MaxValue
or some other arbitrary positive value greater than SyntaxWalkerDepth.StructuredTrivia
. Only the root node is visited.Visit
respects (SyntaxWalkerDepth) int.MinValue
or some other arbitrary negative value. All nodes are visited.For all the tests above also assert that arguments/results are passed along correctly.
Anything else I am missing?
That seems fine @lameox.
Background and Motivation
Currently we support a
SymbolVisitor<TArgument, TResult>
(introduced in #56530) as well as aOperationVisitor<TArgument, TResult>
. It would be ideal to introduce aCSharpSyntaxVisitor<TArgument, TResult>
for the same reasons outlined in #55183: It would allow us to reuse the same visitor and allow parallel invocations on it.Additionally there should be merit in exposing the Argument on the SyntaxRewriter as well. This would, at the very least, allow us to rewrite the visitors used in the compiler itself which back the following operations:
SyntaxRemover
backingSyntaxNodeRemover.RemoveNodes<TRoot>(...)
SyntaxNormalizer
backingSyntaxNormalizer.Normalize<TRoot>(...)
Replacer<TNode>
,NodeListEditor
,TokenListEditor
andTriviaListEditor
backing theSyntaxReplacer.Replace(...)
API which in turn backs theCSharpSyntaxNode.ReplaceCore<TNode>(...)
method and all related APIs.By passing the arguments to the replacers around as structs we should be able to make all these APIs completely allocation-free which should be desirable.
Proposed API
Usage Examples
Alternative Designs
I dont see a nice alternative design since we already ship the "same" (apart from the argument) API and should try to keep it familiar for users.
Risks
None come to mind.