dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
18.88k stars 4.01k forks source link

[Proposal] Provide a way for RegisterSymbolAction to include symbols in member bodies #14061

Open dpoeschl opened 7 years ago

dpoeschl commented 7 years ago

Today, if you RegisterSymbolAction for SymbolKind.Method, you get a subset of method symbols that only includes member-level declarations. Local Functions, lambdas, and anonymous methods are excluded from the results.

There are a couple proposals:

  1. Just update RegisterSymbolAction to return all symbols in the tree that match the requested SymbolKind, even though this would mean the API would be returning new, potentially unexpected symbols.
  2. Add either an overload of RegisterSymbolAction that lets you opt in to including symbols from member bodies or a completely new method to register for all symbol actions (to avoid changing what the API currently returns)

I personally perfer option 1, as does @CyrusNajmabadi link

dpoeschl commented 7 years ago

As part of fixing https://github.com/dotnet/roslyn/issues/8753 with PR https://github.com/dotnet/roslyn/pull/13931, I'm choosing to only return parameters that are on symbols that would be returned by RegisterSymbolAction today for a consistent (although suboptimal) experience. I could have added support for the other types of methods, but I'd rather there be a consistent experience under this new overload.

DavidArno commented 7 years ago

Please do not create an overload that takes a bool parameter. Instead, please create a second method who's name explains the difference, eg RegisterSymbolIncludedNestedAction or somesuch.

CyrusNajmabadi commented 7 years ago

Since returning these new kinds of methods (especially lambdas/anonymous methods) would be a functionally breaking change

I'm not a fan of considering that sort of thing a 'breaking change'. With this sort of appraoch, our APIs can never change the data they return, lest that break a consumer that did not expect it. We've explicitly stated that such a system is a non-goal. For example, someone moving to a later version of Roslyn may get different syntax trees (including brand new nodes) on the same source text.

Similarly, APIs like 'Find-References' may include more results in the future if we discover some sort of reference we missed (or we cascade to more results).

Other ways to think about it: Imagine someone could say "call me back on Named-Type symbols". And in V1 we did that... but we forgot about Enums. Woudl we then say "We're not going to include 'Enums' in the list, and we need a new API 'RegisterSymbolActionIncludingEnums()'"?

Or, imagine we came up with a new top level IMethodSymbol in the future (for example, if we had a special method for "deconstruction" (which was actually one of the plans we were investigating for hte language)). Would we not return that IMethodSymbol for people asking for SymbolKind.Method?

In both cases, i believe we would have these new symbols come through the existing API. And we would put the onus on clients to be resilient to getting different things over time.

As such, i think the right thing to do is to not have an additional overload, or any way to customize behavior here. If the user asks for things like SymbolKind.Parameter or SymbolKind.Method, then we should give them the symbols in the tree that have that type.

CyrusNajmabadi commented 7 years ago

Nice. You can thumbs up your own posts!

dpoeschl commented 7 years ago

I agree with you @CyrusNajmabadi, we just have to get @mavasani & company on board with such a change. This proposal was a compromise based on his concern about the API change, but I agree with your argument and think we should discuss this further.

dpoeschl commented 7 years ago

@CyrusNajmabadi @DavidArno I've updated the title and the original post to better reflect the state of the proposal.

mavasani commented 7 years ago

This needs a design discussion at the compiler/analyzer API design meeting and @JohnHamby, @mattwar @srivatsn and @gafter should be the minimum set of people to sign off.

As far as I recall, symbol actions were initially intended to be designed for top level declarations, as you can reach the nested/contained symbols from them (hence the missing support for locals/parameters till date). This design also allowed us to completely rely on compiler's SymbolDeclaredEvents to trigger analysis - compiler guarantees only one such symbol declared event per top level declaration and this is also guaranteed to be performant.

With the new proposed design of providing callbacks for nested/contained symbols, we would need to figure out a reliable and performant way to identify such symbols and make callbacks. Basically: Will such symbols have compilation events in the compilation event queue?

mavasani commented 7 years ago

I'm not a fan of considering that sort of thing a 'breaking change'.

Except that it is a breaking change. Our analyzer API semantics mention:

A symbol action is invoked once per complete semantic processing of a declaration of a type or type member, provided that symbol has a kind matching one of the kinds supplied when the action was registered.

Any method symbol analyzer that was written with the assumption that the callback happens only for a top level method will be affected.

CyrusNajmabadi commented 7 years ago

Yup. I def think we should meet up about this. Note that we should likely do this earlier rather than later so that we don't continually change what this API reports.

CyrusNajmabadi commented 7 years ago

Any method symbol analyzer that was written with the assumption that the callback happens only for a top level method will be affected.

I feel the tension in making changes here. But i'm not that sympathetic to it as we've always had the position that people need to lenient to our APIs returning more data (or changing data) over time. As roslyn adds more types of concepts, we'll need to fit them into existing APIs, and that means code needs to be resilient to seeing thing it didn't see before.

I think the docs here are the problem, and not the proposed solution.

only for a top level method will be affected.

This is inherently a fluid definition. What happens if 'locals' become allowed at the type-member level? What if we move to having 'parameters' be members of types? i.e. "class Person(string name, int age) { }"? Now, parameters could be type-members, and would be returned under that strict interpretation.

That's why we need code to be resilient. They may see all sorts of symbols and types they never expected before, and those entities may show up in locations that were never possible before. Over-constraining in our docs is very limiting, and not in line with the decisions we've made elsewhere.

That's just my 2c though. We should meet to discuss it and decide what to do (ideally ASAP). If we really feel like exposing this data is unacceptable through the current API, then we at least need a system whereby people can get to it through some supported and well tested means. I'm not a fan of "NewRegisterSymbolActionThatGetsTheseOtherSymbols", but i'll accept it a solutoin if that's where we land.

srivatsn commented 7 years ago

Let's discuss this at the next design meeting.

mavasani commented 7 years ago

We discussed this at the design meeting today and following was the high level conclusion:

  1. By default, symbol callbacks will only be made for top level declarations (i.e. lambdas, lambda parameters, locals, etc. defined inside method bodies will not receive callbacks). This will ensure data backcompat for existing analyzers.
  2. We will add a new API to AnalysisContext, say ConfigureSymbolAnalysisInCodeBlocks(bool) (exact name yet to be finalized). This API would be similar to the existing ConfigureGeneratedCodeAnalysis API to configured generated code analysis . It will turn on/off symbol callbacks for all registered symbol kinds declared in method bodies, i.e. lambdas, locals, local functions, etc. We may want to consider an enum parameter instead of bool if we wish to make the configuration more granular/extensible for future needs. @JohnHamby will make a final call on this API.
msJohnHamby commented 7 years ago

Lambdas don't have symbols (at least at the language level). What sort of symbol would be provided to a symbol action?

Symbol actions for local functions should run in the same circumstances in which symbol actions run for local variables.

CyrusNajmabadi commented 7 years ago

Lambdas don't have symbols

Lambdas do have symbols. Calling GetSymbolInfo on them gives back an IMethodSymbol with MethodKind=AnonymousFunction

msJohnHamby commented 7 years ago

A lambda does not have a language-level symbol. Do we generally invoke symbol actions for compiler-synthesized symbols?

mavasani commented 7 years ago

Do we generally invoke symbol actions for compiler-synthesized symbols?

We don't invoke symbol actions for implicitly declared symbols, but the lambda function does have a user written syntax block - why is it considered synthesized?

msJohnHamby commented 7 years ago

Sorry to be bringing this up now instead of at the meeting that I missed...

... of course the lambda itself is not synthesized. Maybe I'm splitting hairs in claiming that a symbol for a lambda is synthesized vs. accepting that it exists and just doesn't have a name, but it's sort of like assigning a symbol to a block statement. There is--and should be--an IOperation for a lambda, and it seems odd to me that a single construct would have both an operation representation and a symbol representation.

bkoelman commented 7 years ago

In an attempt to optimize performance for an analyzer, I ran into this thread. Perhaps my use case helps in the discussion.

The analyzer in question reports on writes to parameters. The original version registered for ParameterSyntax, looked up its IParameterSymbol and ran data flow analysis using SemanticModel on its method body.

My optimization attempt was to instead register for IMethodSymbol, run the same data flow analysis on its method body and then loop through the parameters.

This way, data flow analysis occurs only once per method instead of for each parameter individually. But this change broke one of my tests: writes to lambda parameters are no longer reported.

To overcome this, I added an OperationWalker that overrides VisitLambdaExpression to perform the same analysis as for method symbols.

This actually made performance worse.

So I was thinking to replace the operation walker with syntax-based registration. Trying to figure out which syntax nodes I would need to register on, got me to https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Syntax/LambdaUtilities.cs. This internal type contains quite some details that I would prefer not to copy/paste.

Having RegisterSymbolAction call into lambda declarations (IMethodSymbol.MethodKind == MethodKind.LambdaMethod) would solve all issues above.

sbomer commented 2 months ago

It appears that this includes as a special case top-level methods (I think they are treated as local functions of the generated Main). This is preventing the ILLink analyzer from reporting certain warnings on top-level methods: https://github.com/dotnet/runtime/issues/101215.