dotnet / dnceng

.NET Engineering Services
MIT License
24 stars 19 forks source link

Roslyn analyzer throws error AD0001 NullReferenceException #3305

Open akoeplinger opened 4 months ago

akoeplinger commented 4 months ago

Build

https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=733128

Build leg reported

VMR Vertical Build / Ubuntu2404_DevVersions_x64 / Build

Pull Request

https://github.com/dotnet/sdk/pull/42019

Known issue core information

Fill out the known issue JSON section by following the step by step documentation on how to create a known issue

 {
    "ErrorMessage" : "",
    "BuildRetry": true,
    "ErrorPattern": "error AD0001: Analyzer.*threw an exception of type 'System.NullReferenceException'",
    "ExcludeConsoleLog": false
 }

@dotnet/dnceng

Release Note Category

Additional information about the issue reported

No response

Known issue validation

Build: :mag_right: https://dev.azure.com/dnceng-public/public/_build/results?buildId=733128 Error message validated: [error AD0001: Analyzer.*threw an exception of type 'System.NullReferenceException'] Result validation: :white_check_mark: Known issue matched with the provided build. Validation performed at: 7/8/2024 7:22:28 PM UTC

Report

Build Definition Step Name Console log Pull Request
2579533 dotnet-runtime Build product Log
2576798 dotnet-runtime Build product Log
2573475 dotnet-runtime Build product Log

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
0 0 3
ViktorHofer commented 4 months ago

Hmm... This error showed up in https://github.com/dotnet/sdk/pull/36807 as well:

CSC : error AD0001: Analyzer 'Microsoft.NetCore.CSharp.Analyzers.Runtime.CSharpDetectPreviewFeatureAnalyzer' threw an exception of type 'System.NullReferenceException' with message 'Object reference not set to an instance of an object.'. [/vmr/src/efcore/src/EFCore/EFCore.csproj]

But I don't see it in the above table. Are we missing data?

akoeplinger commented 4 months ago

It's now in the table, probably just took a bit.

ViktorHofer commented 4 months ago

Interesting, I was under the impression that this only affects VMR builds but apparently this is more widespread.

akoeplinger commented 4 months ago

Yeah. I queried Kusto and we hit this 133 times over the last 60 days just in build logs.

ericstj commented 4 months ago

Should we add "BuildRetry": false to the known issue pattern here? I haven't used that feature but it may help.

akoeplinger commented 4 months ago

Yeah. Done.

jaredpar commented 4 months ago

Not sure what is going on but a significant chunk of the hits here seem to be false positives. Just spent 10 minutes digging through builds and can't see the error on 75% of them.

jaredpar commented 4 months ago

Many of the builds are attributed to dotnet-runtime when it's actually dotnet-runtime-perf. Also the results look like this ...

image

If the compiler is indeed throwing there it's very hard to dig through to the failure.

jaredpar commented 4 months ago

Cross posting the analysis from the linked bug on roslyn-analyzers

This is the stack of the NullReferenceException in at least one case:

System.NullReferenceException: Object reference not set to an instance of an object.
at System.Collections.Concurrent.ConcurrentDictionary`2.TryRemoveInternal(TKey key, TValue& value, Boolean matchValue, TValue oldValue)
at Microsoft.CodeQuality.Analyzers.Maintainability.AvoidUnusedPrivateFieldsAnalyzer.<>c__DisplayClass5_0.<Initialize>b__2(OperationAnalysisContext operationContext)
at Microsoft.CodeAnalysis.Diagnostics.AnalyzerExecutor.ExecuteAndCatchIfThrows_NoLock[TArg](DiagnosticAnalyzer analyzer, Action`1 analyze, TArg argument, Nullable`1 info, CancellationToken cancellationToken)

That almost certainly represents this line in the roslyn analyzers code:

IFieldSymbol field = ((IFieldReferenceOperation)operationContext.Operation).Field;
if (field.DeclaredAccessibility == Accessibility.Private)
{
    referencedPrivateFields.TryAdd(field, default);
    // Error is here. 
    maybeUnreferencedPrivateFields.TryRemove(field, out _);
}

Both values here are non-null:

That seems like a runtime bug.

jaredpar commented 4 months ago

@akoeplinger, @ericstj, @ViktorHofer at least the variation I'm seeing above appears to be a runtime bug. Do we want to use this issue to track that or file a new one?

ericstj commented 4 months ago

I think a new one since the pattern observed here (single log statement with NRE) can't necessarily tie it to the single analyzer. Do you have a dump to help triage the runtime issue or is it just based on the callstack?

jaredpar commented 4 months ago

can't necessarily tie it to the single analyzer.

The log statements are always single line (for reasons I don't understand). But if you get the matching binlog you can usually see the full stack trace if you dig down into the messages.

Do you have a dump to help triage the runtime issue or is it just based on the callstack?

This is just based on call stacks. It manifests as an exception in the analyzer and by default compiler catches those and issues a warning.

The compiler can be configured to fail fast when this happens by setting the following msbuild property

<Features>$(Features);debug-analyzers</Features>

The failures are mostly coming on the runtime pipeline builds so you'd need to be setup to catch crash dumps on process FailFast. Doing that and we should get a dump in a few days.

jaredpar commented 4 months ago

Another variation of the NRE looks like this

Exception occurred with following context:
Compilation: Microsoft.CodeAnalysis.Razor.Compiler
IOperation: Invocation
SyntaxTree: /vmr/src/razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Syntax/Generated/Syntax.xml.Syntax.Generated.cs
SyntaxNode: GetAnnotations() [InvocationExpressionSyntax]@[6078..6094) (208,30)-(208,46)
System.NullReferenceException: Object reference not set to an instance of an object.
at Microsoft.NetCore.Analyzers.Runtime.DetectPreviewFeatureAnalyzer.GetOperationSymbol(IOperation operation)
at Microsoft.NetCore.Analyzers.Runtime.DetectPreviewFeatureAnalyzer.OperationUsesPreviewFeatures(OperationAnalysisContext context, ConcurrentDictionary`2 requiresPreviewFeaturesSymbols, INamedTypeSymbol previewFeatureAttributeSymbol, ISymbol& referencedPreviewSymbol)
at Microsoft.NetCore.Analyzers.Runtime.DetectPreviewFeatureAnalyzer.<>c__DisplayClass33_0.<Initialize>b__1(OperationAnalysisContext context)
at Microsoft.CodeAnalysis.Diagnostics.AnalyzerExecutor.ExecuteAndCatchIfThrows_NoLock[TArg](DiagnosticAnalyzer analyzer, Action`1 analyze, TArg argument, Nullable`1 info, CancellationToken cancellationToken)

That is basically down to this block of code. That code on it's own (no inlining) is very hard to see a NRE on. Suspect that there is some amount of inlining going on here.

The invocation being analyzed here is the GetAnnotations() call on this line. That means we should be at this point in the code

        private static ISymbol? GetOperationSymbol(IOperation operation)
            => operation switch
            {
                // EXECUTION SHOULD BE HERE
                IInvocationOperation iOperation => iOperation.TargetMethod,
                IObjectCreationOperation cOperation => cOperation.Constructor,

Basically that should be an InvocationOperation and given that it's sealed and only impl of IInvocationOperation it is likely a candidate for inlining. At the same time the TargetMethod is an auto-implemented property and shouldn't ever null ref itself.

This a very puzzling one to understand. Seems like another case we'd need the debug-analyzers flag to get a dump and track down.

@333fred in case he can see any flaw in my analysis here.

333fred commented 4 months ago

That is basically down to this block of code. That code on it's own (no inlining) is very hard to see a NRE on.

Agreed. Looking at that block, there shouldn't be an opportunity for a null ref there, everything is null-safe. Even inlining in that location shouldn't result in a null-ref; looking through every property called by that method, directly or indirectly, they're auto-props, and the instances they're called on are checked for null beforehand. The only thing in that closure that isn't an extremely simple auto-prop is the call to arrayTypeSymbol.ElementType, but it's also extremely hard to see where that property would ever null-ref; further, it's not a good candidate for inlining, since there are several implementations. Agreed that we need more data to troubleshoot further.

jaredpar commented 4 months ago

Note: the results from the aspnetcore-quarantined-pr pipeline aren't actionable. They don't upload any binlogs when the build fails so we can't see what is happening.

@captainsafia who on aspnetcore owns this pipeline?

captainsafia commented 4 months ago

@captainsafia who on aspnetcore owns this pipeline?

AFAIK, the build isn't configured to produce binlogs at the moment (ref). @wtgodbe can help with making a change here to produce binlogs for further investigation.

jaredpar commented 4 months ago

Started compiling all of the results from looking at the failures into this gist

jaredpar commented 4 months ago

AFAIK, the build isn't configured to produce binlogs at the moment (ref). @wtgodbe can help with making a change here to produce binlogs for further investigation.

Note: the aspnetcore-ci pipeline uploads logs but it overwrites them on retry. That means if we hit any of these failures, then retry the logs of the failure are effectively deleted. That means we can't really get any info from any of the aspnetcore pipelines here.

jaredpar commented 4 months ago

Have a gist where I've summarized the diff errors I'm seeing. Dug into five of them.

There are 2-3 other analyzer that are producing AD0001 diagnostics that I haven't bothered to dig into.

I think the next steps are to get the owners of dotnet-unified-build and sdk-unified-build to enable compiler crash on analyzer exception and collection of dump logs so we can get a better idea what is going on here.

jaredpar commented 4 months ago

Can we please resolve this as a dupe of https://github.com/dotnet/runtime/issues/104123? That is what the runtime team's investigation lead them to. The fix is in PR and will be back ported to 9.0 P7

akoeplinger commented 4 months ago

the issue just got reopened, I also think we should keep this open for a bit so we have Build Analysis tracking

omajid commented 3 months ago

I am also running into this (and some variants) when building the .NET 9 Preview 7 using the VMR on a number of arm64 platforms using 9.0.100-preview.7.24380.2 as the build SDK.

Like I mentioned in https://github.com/dotnet/source-build/issues/4555 I am seeing a number of variants of errors:

jaredpar commented 3 months ago

This as a dupe of dotnet/runtime#104123. I don't have permissions to resolve the issue.

ellahathaway commented 3 months ago

https://github.com/dotnet/source-build/issues/4576 - I suspect that SB just encountered this error in one of our 9.0 builds