NuGet / Home

Repo for NuGet Client issues
Other
1.49k stars 250 forks source link

`RestoreCollectorLogger` logging into `EvaluationLoggingContext` after evaluation finished #12744

Open KirillOsenkov opened 1 year ago

KirillOsenkov commented 1 year ago

NuGet Product Used

MSBuild.exe

Product Version

msbuild.exe 17.6.3+07e294721 for .NET Framework

Worked before?

not a regression

Impact

Other

Repro Steps & Context

During evaluating MSBuild projects in the IDE we see exceptions under debugger where RestoreCollectionLogger is attempting to log into an EvaluationLoggingContext which is no longer valid.

The exception is:

Microsoft.Build.Framework.InternalErrorException: MSB0001: Internal MSBuild Error: must be valid
   at void Microsoft.Build.Shared.ErrorUtilities.ThrowInternalError(string message, Exception innerException, params object[] args)
   at void Microsoft.Build.BackEnd.Logging.LoggingContext.LogCommentFromText(MessageImportance importance, string message)
   at void Microsoft.Build.BackEnd.SdkResolution.SdkLogger.LogMessage(string message, MessageImportance messageImportance)
   at void Microsoft.Build.NuGetSdkResolver.NuGetSdkLogger.Log(LogLevel level, string data) x 2
   at void NuGet.Commands.RestoreCollectorLogger.Log(IRestoreLogMessage message) x 2
   at void NuGet.Common.LoggerBase.Log(LogLevel level, string data)
   at void NuGet.Common.LoggerBase.LogMinimal(string data)
   at async Task<SortedDictionary<NuGetVersion, PackageInfo>> NuGet.Protocol.HttpFileSystemBasedFindPackageByIdResource.FindPackagesByIdAsync(string id, SourceCacheContext cacheContext, ILogger logger, CancellationToken cancellationToken)
   at async Task<SortedDictionary<NuGetVersion, PackageInfo>> NuGet.Protocol.HttpFileSystemBasedFindPackageByIdResource.EnsurePackagesAsync(string id, SourceCacheContext cacheContext, ILogger logger, CancellationToken cancellationToken)
   at async Task<bool> NuGet.Protocol.HttpFileSystemBasedFindPackageByIdResource.DoesPackageExistAsync(string id, NuGetVersion version, SourceCacheContext cacheContext, ILogger logger, CancellationToken cancellationToken)
   at async Task<LibraryIdentity> NuGet.Commands.SourceRepositoryDependencyProvider.FindLibraryCoreAsync(LibraryRange libraryRange, SourceCacheContext cacheContext, ILogger logger, CancellationToken cancellationToken)
   at async Task<LibraryIdentity> NuGet.Commands.SourceRepositoryDependencyProvider.FindLibraryAsync(LibraryRange libraryRange, NuGetFramework targetFramework, SourceCacheContext cacheContext, ILogger logger, CancellationToken cancellationToken)+(?) => { }
   at async Task<LibraryIdentity> NuGet.Commands.SourceRepositoryDependencyProvider.FindLibraryAsync(LibraryRange libraryRange, NuGetFramework targetFramework, SourceCacheContext cacheContext, ILogger logger, CancellationToken cancellationToken)

This line invalidates the logging context after evaluation finishes for a project: https://github.com/dotnet/msbuild/blob/4598629a2493ed6b366ab00d595ecd53a3164b60/src/Build/BackEnd/Components/Logging/EvaluationLoggingContext.cs#L42

The chain of loggers is as follows.

Evaluation -> ResolveSdk(EvaluationLoggingContext) -> new SdkLogger(EvaluationLoggingContext) -> new SdkResolverContext(SdkLogger) -> NuGetSdkResolver.Resolve() -> new NuGetSdkLogger(SdkResolverContext.SdkLogger)

  1. Evaluation calls into ResolveSdk to resolve an SDK such as Arcade, it passes the EvaluationLoggingContext which will be later invalidated after the project evaluation finishes
  2. ResolveSdk calls new SdkLogger(EvaluationLoggingContext): https://github.com/dotnet/msbuild/blob/4598629a2493ed6b366ab00d595ecd53a3164b60/src/Build/BackEnd/Components/SdkResolution/SdkLogger.cs#L22
  3. new SdkResolverContext is created and passed the SdkLogger
  4. NuGetSdkResolver.Resolve() is called and passed the SdkResolverContext
  5. new NuGetSdkLogger is created and passed the SdkResolverContext.SdkLogger: https://github.com/NuGet/NuGet.Client/blob/b3c4e8094bbb924408180d700ae5b3d37713587c/src/NuGet.Core/Microsoft.Build.NuGetSdkResolver/NuGetSdkLogger.cs#L42 ... (at some point later, it's unclear when or how) ...

RestoreCollectorLogger is passed the innerLogger which is the NuGetSdkLogger, which internally points to the SdkLogger, which in turn uses the EvaluationLoggingContext

I'm guessing after some retries or network latencies the RestoreCommand calls RestoreCollectorLogger.Log, which calls _innerLogger.Log here, which in turn throws: https://github.com/NuGet/NuGet.Client/blob/b3c4e8094bbb924408180d700ae5b3d37713587c/src/NuGet.Core/NuGet.Commands/RestoreCommand/Logging/RestoreCollectorLogger.cs#L149

After debugging for a while, I stopped being able to reproduce this, presumably because of the caches? Afraid to clear the caches now because I have some unrelated valuable context there, but hopefully by reasoning you can figure out under what circumstances the RestoreCollectorLogger can continue logging after the project evaluation that triggered the Sdk resolution completes.

Verbose Logs

No response

KirillOsenkov commented 1 year ago

@jeffkl @dsplaisted @forgind

KirillOsenkov commented 1 year ago

My hunch is that some code path during NuGet SDK resolution "forks" and goes asynchronous, while the rest of MSBuild evaluation resumes, assuming that's done, and eventually evaluation completes and invalidates the logging context, while that code path that throws is still in flight.

KirillOsenkov commented 1 year ago

Potentially related? https://github.com/dotnet/msbuild/issues/8639

KirillOsenkov commented 1 year ago

I dug in some more, and I'm seeing RestoreRunner.RunWithoutCommit calling new RestoreCommand(RestoreRequest), which in turn calls new RestoreCollectorLogger(RestoreRequest.Log). RestoreRequest.Log is the NuGetSdkLogger in question.

KirillOsenkov commented 1 year ago

On the MSBuild side, the stack that starts in evaluation and calls into SdkResolver is:

 Microsoft.Build.NuGetSdkResolverNuGetSdkLogger..ctor Line 42
 Microsoft.Build.NuGetSdkResolverNuGetSdkResolver.NuGetAbstraction.GetSdkResult Line 140
 Microsoft.Build.NuGetSdkResolverNuGetSdkResolver.Resolve Line 95
 Microsoft.BuildSdkResolverService.TryResolveSdkUsingSpecifiedResolvers Line 325
 Microsoft.BuildSdkResolverService.ResolveSdkUsingResolversWithPatternsFirst Line 204
 Microsoft.BuildSdkResolverService.ResolveSdk Line 117
 Microsoft.BuildCachingSdkResolverService.ResolveSdk Line 67
 mscorlibLazy`1.CreateValue
 mscorlibLazy`1.LazyInitValue
 mscorlibLazy`1.get_Value
 Microsoft.BuildCachingSdkResolverService.ResolveSdk Line 71
 Microsoft.BuildEvaluator`4.ExpandAndLoadImportsFromUnescapedImportExpressionConditioned Line 1850

This is what's blocking on the result task:

 mscorlibTask`1.get_Result
 Microsoft.Build.NuGetSdkResolverNuGetSdkResolver.NuGetAbstraction.GetSdkResult
 Microsoft.Build.NuGetSdkResolverNuGetSdkResolver.Resolve
 Microsoft.BuildSdkResolverService.TryResolveSdkUsingSpecifiedResolvers Line 277
 Microsoft.BuildSdkResolverService.ResolveSdkUsingResolversWithPatternsFirst Line 184
 Microsoft.BuildSdkResolverService.ResolveSdk Line 117
 Microsoft.BuildCachingSdkResolverService.ResolveSdk Line 68
jebriede commented 1 year ago

Team triage: @jeffkl could you take a look at this bug and evaluate the priority?

KirillOsenkov commented 1 year ago

@jeffkl the repro is a bit involved, and I’m on vacation this week, but I’d be happy to pair with you on this next week or after. It’s not blocking.