dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
19.06k stars 4.04k forks source link

ITryOperation.Finally.Syntax returns BlockSyntax instead of FinallyClauseSyntax #27208

Closed bkoelman closed 6 years ago

bkoelman commented 6 years ago

Version Used: Microsoft.CodeAnalysis v2.8.0

Steps to Reproduce:

  1. Obtain ITryOperation for the next try statement:
    
    try
    {
    }
    catch (System.Exception ex)
    {
    }
    finally
    {
    }

2. Inspect its `.Finally.Syntax` property
3. This returns a `BlockSyntax`, whose `.Parent` returns `FinallyClauseSyntax`

**Expected Behavior**:
`ITryOperation.Finally.Syntax` points to a `FinallyClauseSyntax`, similar to how `ITryOperation.Syntax` points to a `TryStatementSyntax` instead of its block.

**Actual Behavior**:
`ITryOperation.Finally.Syntax` points to a `BlockSyntax`.
bkoelman commented 6 years ago

/cc @mavasani @AlekseyTs @333fred Is this a bug or by design?

AlekseyTs commented 6 years ago

I think this is intentional. That IBlockOperation corresponds to the BlockSyntax. Documentation says "Finally handler of the try." and it is the handler. IOperation tree is not an annotated syntax tree, it is a different tree and some syntax nodes do not have corresponding IOperation nodes. This is by design.

bkoelman commented 6 years ago

I'm okay with the current .Finally.Syntax, just wanted to check because there have been many bugs in this area. Thanks for investigating.

IOperation tree is not an annotated syntax tree, it is a different tree and some syntax nodes do not have corresponding IOperation nodes. This is by design.

Opinions may have changed over time, but at the introduction of IOperation it was presented as a language-agnostic, richer alternative for syntax. Similar to how symbols provide semantic information for declarations, operations would provide semantic information for executable code blocks. And not "for about 80% of the possible constructs in code blocks."

There has always been discussion (here and here) about how close operations should match with syntax. And lots of bugs were fixed over the past years.

I can understand that, due to time constraints, V1 only supports the most essential constructs, but I always assumed that the gaps would be filled in at a later time. Not ever getting the missing operations feels like a broken experience and a lost investment in using IOperation in many analyzers that I have written. After all, we do not have a walker that traverses operations, and then also some syntax nodes for their lack of operation support.

Tagging @CyrusNajmabadi @heejaechang @mavasani @genlu @jinujoseph @srivatsn for thoughts.

333fred commented 6 years ago

I think the point is that we have IOperation nodes for things that have semantic meaning. We do intend to fill in the gaps of missing nodes for semantics that we haven't covered yet, including the ones in the PR you linked. The current goal is to enable dataflow analysis of code blocks to enable analyzers missed in the FxCop port, such as CA2000 or CA2100. After the control-flow graph and dataflow analysis frameworks have shipped we'll examine what's next for the API, and I certainly intend to push for full support of all missing nodes as part of that.

mavasani commented 6 years ago

Completely agree with @333fred. We do intend to fill the semantic gaps soon. We have banked another big analysis feature on this API (flow analysis) and intend it to be complete and fully supported API. However, we never intended IOperation tree to be 1:1 match with syntax tree, it is a semantic tree and so you will have syntactic constructs which have no matching IOperation node.

bkoelman commented 6 years ago

I like how operations abstract away various syntactic details, such as redundant braces or various notations to initialize an array. And I'm not expecting all syntax nodes to have an operation counterpart. The subtle differences between C# and VB alone would make that unfeasible. But not having it cover all semantic language constructs makes using it a major pain. Allow me to explain with some examples.

Say I want to analyze assignments. So I create an operation walker and handle VariableDeclarator, SimpleAssignment, CompoundAssignment, IncrementOrDecrement and DeconstructionAssignment. All works fine, until I realize it does not hit assignments via a let expression inside a query. IOperation does not support queries. So aside from an operation walker, I need to register callbacks for various syntax nodes for my analysis to be complete. And then the problem becomes relating the results from both sources to each other.

Another example. I want to analyze the target block when a checked/unchecked scope is entered. On the syntax level, there's a statement for that, but it does not appear in the operation tree. Knowing the current checked/unchecked state from an operation exists, but the entering of such a block is semantic information too, right? As a workaround I need to register for IBlockSyntax and inspect the parent syntax to see if we are entering such a block. But it's quite expensive to run that on every block! Alternatively, I can switch to syntax walker and track the effective context myself (while it's already implemented in operations). Having an operation that indicates such a block is entered, along with the calculated effective context in the target block would help. And that same operation can be reused for checked expressions. Which reminds me of the difficulty to determine whether an operation represents an expression or a statement, something else that's painfully missing at the moment.

One more. I want to analyze the names of types, members, parameters, local functions, tuple elements etc. So I register for various symbol kinds and the operations VariableDeclarator, Tuple and AnonymousObjectCreation. This all works, but again not for what happens in queries. So after a lot of trial-and-error, I discover I also need to register syntax analysis for FromClause, JoinClause, JoinIntoClause, QueryContinuation and LetClause. But now I need to handle and special-case redundant braces, casts and all myself.

Looking back in time, let's forget about the sudden disappearance of IOperation.HasErrors, Operation.Syntax sometimes returning null and the lack of exposing IOperation.WasCompilerGenerated for a long time, but suffice it to say its been a bumpy ride. And I'm still hoping things will get better now that IOperation is no longer experimental.

I get that the team is currently focussed on getting CFG ready and am happy to hear additional operations will be addressed after that. I'd like you to also consider my scenarios above. When time permits, I'll be happy to contribute as I did in the past, but that must fit in the teams' design plans obviously.

heejaechang commented 6 years ago

I am +1 on making it easy to mix and match between different analysis hooks (operation, syntax, code blocks)

by the way, @bkoelman you could help us by sending us PR? :)

mavasani commented 6 years ago

Tagging @dotnet/analyzer-ioperation

@bkoelman Thank you for the detailed comment on the missing pieces. Let me try to summarize your requests:

  1. Richer IOperation API for query expression and clauses.
  2. IOperation API for checked/unchecked scope
  3. IOperation API to distinguish whether an operation node is an expression or a statement
  4. HasErrors or IsInvalid property to know about malformed code.

Is there anything I missed out from your list? Otherwise, I will file issues for these or ensure they exist and we will prioritize these requests.

bkoelman commented 6 years ago

@heejaechang I can probably contribute some in a few weeks, though I prefer to wait until CFG has been merged. IsExpression/IsStatement would be a good start for me to pick up.

@mavasani thanks for hearing me! Those 4 are my biggest pain points. But... there is more... :)

Essentials:

Nice-to-haves:

AlekseyTs commented 6 years ago

IReturnOperation would be hit for a yield break statement. While it is straightforward to determine from syntax, a boolean property IsYield would be nice.

IReturnOperation has three distinct OperationKinds: Return, YieldBreak, YieldReturn.

333fred commented 6 years ago

@bkoelman CFG work merged into the dev15.8-preview3 branch yesterday, and should make it into master sometime today. I would also definitely encourage opening issue(s) for anything we're not yet tracking. A lot of our initial design for IOperation and CFG was driven by the idea that we'd come back and add convenience APIs for things in a community-driven manner, as we had a limited time budget for V1. So your feedback is very important for shaping the future direction of the API.

bkoelman commented 6 years ago

@AlekseyTs Thanks, looks like I missed those return kinds. Consider that one resolved.

Something else that I keep running into: compiler-generated operations have caused me nothing but lots of hard-to-trace bugs, because they typically have no location to report on. So VS reports them, and then you need to try to figure out which code fragment in the project is causing it. Or sometimes they report an invalid location. Example: a void lambda that has an implicit return, while a method does not. These come as surprises when just trying to get the basics right. My code is already sprinkled with ifs to skip over them, and yet they keep popping up. Therefore I have come to think implicit operations should never surface by default. Not in analyzer callbacks, not in operation.Children/Parent and not in operation visitors or walkers. Opt-in should be possible similar to context.ConfigureGeneratedCode (see https://github.com/dotnet/roslyn/issues/10214). There is one analyzer where I am interested in implicit conversions, where I then need to opt-in. Forgetting that is quickly discovered when building such an analyzer. It is a breaking change, though.

CyrusNajmabadi commented 6 years ago

Something else that I keep running into: compiler-generated operations have caused me nothing but lots of hard-to-trace bugs

I think, no matter what, once you have an AST you're going to have impedence mismatch issues between the AST and your domain. While it would be attractive to want an ast perfectly suited to any domain, it's ultimately always going to be a problem for some customer.

have caused me nothing but lots of hard-to-trace bugs, because they typically have no location to report on.

Isn't there IsImplicit for this sort of thing?

333fred commented 6 years ago

Additionally, all Operations should have a corresponding syntax, even if they're implicit. Are you saying that you're encountering nodes with no syntax?

bkoelman commented 6 years ago

have no location to report on.

Sorry, I should have clarified that. The compiler-generated operations (.IsImplicit == true) actually do have syntax and a location. But the analyzer I am referring to reports on the return keyword instead of the whole returned expression. Because when a multi-line lambda expression is returned, the squiggle becomes very long and disturbing (and visually hides any nested diagnostics). So the analyzer looks at the syntax type (ReturnStatementSyntax or YieldStatementSyntax) to find the location for the return or yield return keyword span. Being unable to, it results in Location.None. Now I am convinced that the analyzer should not report in that case. But at that time, I thought such a situation could never happen.

@CyrusNajmabadi True that a tree will never fit all needs. But I think it helps to have defaults that do "the expected thing" for unexperienced API consumers that have no knowledge of roslyn internals. Like me, back then, when I was unaware that compiler-generated nodes even existed. Not getting them keeps life simple for the simple things. Advanced scenarios like analyzing allocations requires such deep knowledge of the language that I think it would be okay for additional work to be needed to get that right.

333fred commented 6 years ago

@bkoelman I'm not sure that makes sense in many cases. There are a good number of scenarios where implicit nodes are very, very necessary to understand anything about the semantics. Take implicit this or base calls, for example. It wouldn't make any sense to have an IInvocationOperation with a null InstanceReceiver because the receiver was implicit. I think that consumers of this API may well be surprised when nodes crop up, but those surprises are how the language works and if they want their analyzers to be accurate, they need to handle those scenarios.

There are some cases where there are inconsistencies, like the lambda case you pointed out. But I don't think that's an argument for hiding implicit nodes; rather, I think it's an argument for fixing inconsistencies.

CyrusNajmabadi commented 6 years ago

But I think it helps to have defaults that do "the expected thing" for unexperienced API consumers that have no knowledge of roslyn internals.

We err'ed in a different direction. These are supposed to represent the "operations" that actually become the executable code at the end of the day. So, it's important to represent all these things. Even if they're things that the user doesn't write but which teh compiler inserts.

This was also important because a core customer was one that would be looking for issues regardless if those were backed by actual syntax the user wrote, or if they were things the compiler generated itself.

bkoelman commented 6 years ago

It wouldn't make any sense to have an IInvocationOperation with a null InstanceReceiver because the receiver was implicit.

I agree. Properties of operations should not be null when they refer to an implicit operation. My proposal is to (based on configuration during analyzer init):

One way to overcome the first two without roslyn changes:

public static class OperationExtensions
{
    public static IEnumerable<IOperation> GetExplicitChildren(IEnumerable<IOperation> children)
    {
        foreach (IOperation child in children)
        {
            if (!child.IsImplicit)
            {
                yield return child;
            }
        }
    }
}

public class ExplicitOperationVisitor : OperationVisitor
{
    public override void Visit(IOperation operation)
    {
        if (operation != null && !operation.IsImplicit)
        {
            operation.Accept(this);
        }
    }
}

public abstract class ExplicitOperationVisitor<TArgument, TResult> : OperationVisitor<TArgument, TResult>
{
    public override TResult Visit(IOperation operation, TArgument argument)
    {
        if (operation != null && !operation.IsImplicit)
        {
            return operation.Accept(this, argument);
        }

        return default(TResult);
    }
}

public abstract class ExplicitOperationWalker : OperationWalker
{
    public override void Visit(IOperation operation)
    {
        if (operation != null && !operation.IsImplicit)
        {
            base.Visit(operation);
        }
    }
}

This works, however requires some internal knowledge to come up with and is not configurable.

To compare, I investigated how IMethodSymbol behaves for a compiler-generated ctor:

  1. context.RegisterSymbolAction for SymbolKind.Method does not get called
  2. INamedType.GetMembers() includes the compiler-generated ctor
  3. SymbolVisitor visits the compiler-generated ctor
  4. There is no walker for symbols

For consistency with symbols, at the very least I think there should be a way (if not by default) to prevent analyzer callbacks for implicit operations. This is basically what question 1 in #10214 is about.

bkoelman commented 6 years ago

When an analyzer throws, the exception is not propagated into a unit test.

I found a workaround in StyleCop which I am now using: special-case when AD0001 is returned. So this one is not relevant anymore.

mavasani commented 6 years ago

When an analyzer throws, the exception is not propagated into a unit test.

@bkoelman Can you clarify this? We have a helper method GetAnalyzerDiagnostics that has an optional parameter Action<Exception, DiagnosticAnalyzer, Diagnostic> onAnalyzerException = null, which does propagate the exception info if the caller desires. If you are creating your own CompilationWithAnalyzers instance, you can pass in such a delegate to the constructor of CompilationWithAnalyzerOptions

mavasani commented 6 years ago

It would be nice if IBranchOperation, when it represents a break or continue statement, exposed a reference to the parent loop that it is breaking from or continuing at.

IBranchOperation.Target points the the destination label symbol. ILoopOperation now exposes the break and continue label symbols and ISwitchOperation exposes an exit lable, so you should be able to match the branches with parent loop/switch if desired. I filed https://github.com/dotnet/roslyn/issues/28095 to track this request.

mavasani commented 6 years ago

An API to get operation block(s) from an IMethodSymbol

Filed https://github.com/dotnet/roslyn/issues/28097

mavasani commented 6 years ago

HasErrors or IsInvalid property to know about malformed code.

We have decided against exposing any such information directly on operations nodes and also any extension method for it:

  1. HasErrors or IsInvalid can have vastly different semantics based on client requirements. Is the client looking for syntax errors or semantic errors? Do they want to know of a diagnostic of a specific severity or above or any diagnostic? Do they want to know if the diagnostic is on the span related to this operation only but not on any descendant (for example, if there is a diagnostic on an expression, should this invalid flag propagate to ancestors or not)?
  2. Performance: The approaches we have tried in the past caused performance regressions if we attempt to compute this information using GetDiagnostics API. We can work on optimizing these, if we are clear on the required semantics, but again this is non-trivial work.

Our recommended approach for clients is to instead check for validity of data such as type exposed directly on the operation node and child nodes (for example, is one of the expected non-null child operation null). If their intent is to strictly operate on operations which have no compiler diagnostics, they should call the relevant GetDiagnostics API, keeping in mind the associated perf implications.

mavasani commented 6 years ago

@bkoelman Thanks a lot for your feedback, and your willingness to help us out by contributing in this area. I believe all of your requests or concerns in this issue have now either been addressed or have a separate specific issue tracking them. If you feel something has been missed or disagree with some statements/claims made here, please file a separate issue to discuss and track that work. I am closing this issue as the original issue filed here is by design.

bkoelman commented 6 years ago

We have a helper method GetAnalyzerDiagnostics...

My project has a NuGet reference to Microsoft.CodeAnalysis v2.8.2, which does not contain type Microsoft.CodeAnalysis.DiagnosticExtensions. According to source browser, it lives in a project called TestUtilities. Maybe that does not get distributed via NuGet?

For reference, I am using this code, which I copied years ago from another repo.

mavasani commented 6 years ago

@bkoelman Seems like your code here can be changed to use this overload of WithAnalyzers API. The CompilationWithAnalyzersOptions wrapping the existing AnalyzerOptions takes an optional delegate to callback on analyzer exception.

bkoelman commented 6 years ago

Thanks for pointing this out. It works when I marshal the exception from the callback to the thread that the unittest is running on (which is different from the thread that the analyzer and callback runs on).