dotnet / dnceng

.NET Engineering Services
MIT License
24 stars 18 forks source link

Roslyn analyzer throws error AD0001 NullReferenceException #3305

Open akoeplinger opened 2 months ago

akoeplinger commented 2 months ago

Build

https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=733128

Build leg reported

VMR Vertical Build / Ubuntu2404_DevVersions_x64 / Build

Pull Request

https://github.com/dotnet/sdk/pull/42019

Known issue core information

Fill out the known issue JSON section by following the step by step documentation on how to create a known issue

 {
    "ErrorMessage" : "",
    "BuildRetry": true,
    "ErrorPattern": "error AD0001: Analyzer.*threw an exception of type 'System.NullReferenceException'",
    "ExcludeConsoleLog": false
 }

@dotnet/dnceng

Release Note Category

Additional information about the issue reported

No response

Known issue validation

Build: :mag_right: https://dev.azure.com/dnceng-public/public/_build/results?buildId=733128 Error message validated: [error AD0001: Analyzer.*threw an exception of type 'System.NullReferenceException'] Result validation: :white_check_mark: Known issue matched with the provided build. Validation performed at: 7/8/2024 7:22:28 PM UTC

Report

Build Definition Step Name Console log Pull Request
795167 dotnet/aspnetcore Build Log dotnet/aspnetcore#57611
793457 dotnet/roslyn Build with analyzers Log
2525924 dotnet-dotnet Build Log
2524900 dotnet-dotnet Build Log
788973 dotnet/aspnetcore Run build.sh Log dotnet/aspnetcore#57436
788814 dotnet/aspnetcore Run build.sh Log dotnet/aspnetcore#57531
785516 dotnet/aspnetcore Run build.sh Log dotnet/aspnetcore#57436
2521021 dotnet-aspnetcore Run build.sh Log
783552 dotnet/aspnetcore Run build.sh Log dotnet/aspnetcore#57444
783142 dotnet/aspnetcore Run build.sh Log
783043 dotnet/aspnetcore Build Log dotnet/aspnetcore#57431
782203 dotnet/aspnetcore Build Log
782201 dotnet/aspnetcore Run build.sh Log
782126 dotnet/aspnetcore Run build.sh Log dotnet/aspnetcore#57431
781839 dotnet/aspnetcore Build Log
781470 dotnet/aspnetcore Run build.sh Log dotnet/aspnetcore#57366
781269 dotnet/aspnetcore Build Log
780649 dotnet/aspnetcore Run build.sh Log
779761 dotnet/aspnetcore Build Log dotnet/aspnetcore#57354
779359 dotnet/aspnetcore Build Log
779219 dotnet/aspnetcore Build Log
2517268 dotnet-aspnetcore Run build.sh Log
777995 dotnet/aspnetcore Run build.sh Log dotnet/aspnetcore#57362
777932 dotnet/aspnetcore Build Log dotnet/aspnetcore#57363
777634 dotnet/aspnetcore Build Log dotnet/aspnetcore#57355
777378 dotnet/aspnetcore Run build.sh Log dotnet/aspnetcore#57354
776976 dotnet/aspnetcore Build Log dotnet/aspnetcore#57264
2515906 dotnet-aspnetcore Run build.sh (mariner20CrossArmAlpine) Log
775099 dotnet/runtime Build product Log dotnet/runtime#106375
774802 dotnet/aspnetcore Run build.sh Log dotnet/aspnetcore#57323
2514726 dotnet-roslyn Build with analyzers Log
774444 dotnet/runtime Build product Log dotnet/runtime#105895
774407 dotnet/runtime Build product Log dotnet/runtime#106320
774099 dotnet/sdk Build Log dotnet/sdk#42641
773754 dotnet/runtime Build product Log dotnet/runtime#106238
773506 dotnet/aspnetcore Run build.sh Log dotnet/aspnetcore#57296
773446 dotnet/runtime Build product Log dotnet/runtime#106099
773397 dotnet/runtime Build product Log dotnet/runtime#106295
772736 dotnet/aspnetcore Run build.sh Log dotnet/aspnetcore#57284
772547 dotnet/sdk Build Log dotnet/sdk#42668
772207 dotnet/aspnetcore Build Log dotnet/aspnetcore#57263
772113 dotnet/aspnetcore Run build.sh Log dotnet/aspnetcore#57269
772023 dotnet/runtime Build coreclr/libs components needed by test build Log
771965 dotnet/runtime Build Tests Log dotnet/runtime#106229
771814 dotnet/runtime Build managed test components Log
771774 dotnet/aspnetcore Run build.sh Log
771741 dotnet/aspnetcore Build Log dotnet/aspnetcore#49286
771736 dotnet/aspnetcore Build Log
771634 dotnet/runtime Build Tests Log dotnet/runtime#105771
771612 dotnet/runtime Build product Log dotnet/runtime#106169
2512261 dotnet-dotnet Build Log
770704 dotnet/runtime Build product Log dotnet/runtime#106119
770548 dotnet/runtime Build product Log dotnet/runtime#106174
769899 dotnet/runtime Build product Log dotnet/runtime#106145
2511498 dotnet-aspnetcore Run build.sh (mariner20CrossArmAlpine) Log
769853 dotnet/aspnetcore Build Log
769748 dotnet/aspnetcore Build Log

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
0 1 57
ViktorHofer commented 2 months ago

Hmm... This error showed up in https://github.com/dotnet/sdk/pull/36807 as well:

CSC : error AD0001: Analyzer 'Microsoft.NetCore.CSharp.Analyzers.Runtime.CSharpDetectPreviewFeatureAnalyzer' threw an exception of type 'System.NullReferenceException' with message 'Object reference not set to an instance of an object.'. [/vmr/src/efcore/src/EFCore/EFCore.csproj]

But I don't see it in the above table. Are we missing data?

akoeplinger commented 2 months ago

It's now in the table, probably just took a bit.

ViktorHofer commented 2 months ago

Interesting, I was under the impression that this only affects VMR builds but apparently this is more widespread.

akoeplinger commented 2 months ago

Yeah. I queried Kusto and we hit this 133 times over the last 60 days just in build logs.

ericstj commented 1 month ago

Should we add "BuildRetry": false to the known issue pattern here? I haven't used that feature but it may help.

akoeplinger commented 1 month ago

Yeah. Done.

jaredpar commented 1 month ago

Not sure what is going on but a significant chunk of the hits here seem to be false positives. Just spent 10 minutes digging through builds and can't see the error on 75% of them.

jaredpar commented 1 month ago

Many of the builds are attributed to dotnet-runtime when it's actually dotnet-runtime-perf. Also the results look like this ...

image

If the compiler is indeed throwing there it's very hard to dig through to the failure.

jaredpar commented 1 month ago

Cross posting the analysis from the linked bug on roslyn-analyzers

This is the stack of the NullReferenceException in at least one case:

System.NullReferenceException: Object reference not set to an instance of an object.
at System.Collections.Concurrent.ConcurrentDictionary`2.TryRemoveInternal(TKey key, TValue& value, Boolean matchValue, TValue oldValue)
at Microsoft.CodeQuality.Analyzers.Maintainability.AvoidUnusedPrivateFieldsAnalyzer.<>c__DisplayClass5_0.<Initialize>b__2(OperationAnalysisContext operationContext)
at Microsoft.CodeAnalysis.Diagnostics.AnalyzerExecutor.ExecuteAndCatchIfThrows_NoLock[TArg](DiagnosticAnalyzer analyzer, Action`1 analyze, TArg argument, Nullable`1 info, CancellationToken cancellationToken)

That almost certainly represents this line in the roslyn analyzers code:

IFieldSymbol field = ((IFieldReferenceOperation)operationContext.Operation).Field;
if (field.DeclaredAccessibility == Accessibility.Private)
{
    referencedPrivateFields.TryAdd(field, default);
    // Error is here. 
    maybeUnreferencedPrivateFields.TryRemove(field, out _);
}

Both values here are non-null:

That seems like a runtime bug.

jaredpar commented 1 month ago

@akoeplinger, @ericstj, @ViktorHofer at least the variation I'm seeing above appears to be a runtime bug. Do we want to use this issue to track that or file a new one?

ericstj commented 1 month ago

I think a new one since the pattern observed here (single log statement with NRE) can't necessarily tie it to the single analyzer. Do you have a dump to help triage the runtime issue or is it just based on the callstack?

jaredpar commented 1 month ago

can't necessarily tie it to the single analyzer.

The log statements are always single line (for reasons I don't understand). But if you get the matching binlog you can usually see the full stack trace if you dig down into the messages.

Do you have a dump to help triage the runtime issue or is it just based on the callstack?

This is just based on call stacks. It manifests as an exception in the analyzer and by default compiler catches those and issues a warning.

The compiler can be configured to fail fast when this happens by setting the following msbuild property

<Features>$(Features);debug-analyzers</Features>

The failures are mostly coming on the runtime pipeline builds so you'd need to be setup to catch crash dumps on process FailFast. Doing that and we should get a dump in a few days.

jaredpar commented 1 month ago

Another variation of the NRE looks like this

Exception occurred with following context:
Compilation: Microsoft.CodeAnalysis.Razor.Compiler
IOperation: Invocation
SyntaxTree: /vmr/src/razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/Syntax/Generated/Syntax.xml.Syntax.Generated.cs
SyntaxNode: GetAnnotations() [InvocationExpressionSyntax]@[6078..6094) (208,30)-(208,46)
System.NullReferenceException: Object reference not set to an instance of an object.
at Microsoft.NetCore.Analyzers.Runtime.DetectPreviewFeatureAnalyzer.GetOperationSymbol(IOperation operation)
at Microsoft.NetCore.Analyzers.Runtime.DetectPreviewFeatureAnalyzer.OperationUsesPreviewFeatures(OperationAnalysisContext context, ConcurrentDictionary`2 requiresPreviewFeaturesSymbols, INamedTypeSymbol previewFeatureAttributeSymbol, ISymbol& referencedPreviewSymbol)
at Microsoft.NetCore.Analyzers.Runtime.DetectPreviewFeatureAnalyzer.<>c__DisplayClass33_0.<Initialize>b__1(OperationAnalysisContext context)
at Microsoft.CodeAnalysis.Diagnostics.AnalyzerExecutor.ExecuteAndCatchIfThrows_NoLock[TArg](DiagnosticAnalyzer analyzer, Action`1 analyze, TArg argument, Nullable`1 info, CancellationToken cancellationToken)

That is basically down to this block of code. That code on it's own (no inlining) is very hard to see a NRE on. Suspect that there is some amount of inlining going on here.

The invocation being analyzed here is the GetAnnotations() call on this line. That means we should be at this point in the code

        private static ISymbol? GetOperationSymbol(IOperation operation)
            => operation switch
            {
                // EXECUTION SHOULD BE HERE
                IInvocationOperation iOperation => iOperation.TargetMethod,
                IObjectCreationOperation cOperation => cOperation.Constructor,

Basically that should be an InvocationOperation and given that it's sealed and only impl of IInvocationOperation it is likely a candidate for inlining. At the same time the TargetMethod is an auto-implemented property and shouldn't ever null ref itself.

This a very puzzling one to understand. Seems like another case we'd need the debug-analyzers flag to get a dump and track down.

@333fred in case he can see any flaw in my analysis here.

333fred commented 1 month ago

That is basically down to this block of code. That code on it's own (no inlining) is very hard to see a NRE on.

Agreed. Looking at that block, there shouldn't be an opportunity for a null ref there, everything is null-safe. Even inlining in that location shouldn't result in a null-ref; looking through every property called by that method, directly or indirectly, they're auto-props, and the instances they're called on are checked for null beforehand. The only thing in that closure that isn't an extremely simple auto-prop is the call to arrayTypeSymbol.ElementType, but it's also extremely hard to see where that property would ever null-ref; further, it's not a good candidate for inlining, since there are several implementations. Agreed that we need more data to troubleshoot further.

jaredpar commented 1 month ago

Note: the results from the aspnetcore-quarantined-pr pipeline aren't actionable. They don't upload any binlogs when the build fails so we can't see what is happening.

@captainsafia who on aspnetcore owns this pipeline?

captainsafia commented 1 month ago

@captainsafia who on aspnetcore owns this pipeline?

AFAIK, the build isn't configured to produce binlogs at the moment (ref). @wtgodbe can help with making a change here to produce binlogs for further investigation.

jaredpar commented 1 month ago

Started compiling all of the results from looking at the failures into this gist

jaredpar commented 1 month ago

AFAIK, the build isn't configured to produce binlogs at the moment (ref). @wtgodbe can help with making a change here to produce binlogs for further investigation.

Note: the aspnetcore-ci pipeline uploads logs but it overwrites them on retry. That means if we hit any of these failures, then retry the logs of the failure are effectively deleted. That means we can't really get any info from any of the aspnetcore pipelines here.

jaredpar commented 1 month ago

Have a gist where I've summarized the diff errors I'm seeing. Dug into five of them.

There are 2-3 other analyzer that are producing AD0001 diagnostics that I haven't bothered to dig into.

I think the next steps are to get the owners of dotnet-unified-build and sdk-unified-build to enable compiler crash on analyzer exception and collection of dump logs so we can get a better idea what is going on here.

jaredpar commented 1 month ago

Can we please resolve this as a dupe of https://github.com/dotnet/runtime/issues/104123? That is what the runtime team's investigation lead them to. The fix is in PR and will be back ported to 9.0 P7

akoeplinger commented 1 month ago

the issue just got reopened, I also think we should keep this open for a bit so we have Build Analysis tracking

omajid commented 3 weeks ago

I am also running into this (and some variants) when building the .NET 9 Preview 7 using the VMR on a number of arm64 platforms using 9.0.100-preview.7.24380.2 as the build SDK.

Like I mentioned in https://github.com/dotnet/source-build/issues/4555 I am seeing a number of variants of errors:

jaredpar commented 3 weeks ago

This as a dupe of dotnet/runtime#104123. I don't have permissions to resolve the issue.

ellahathaway commented 2 weeks ago

https://github.com/dotnet/source-build/issues/4576 - I suspect that SB just encountered this error in one of our 9.0 builds