dotnet / msbuild

The Microsoft Build Engine (MSBuild) is the build platform for .NET and Visual Studio.
https://docs.microsoft.com/visualstudio/msbuild/msbuild
MIT License
5.2k stars 1.35k forks source link

MSBuild stops because of logger failure #4322

Open ViktorHofer opened 5 years ago

ViktorHofer commented 5 years ago

Build failed because of a logger failure in MSBuild. Happened in corefx official builds on RedHat6-x64: https://dnceng.visualstudio.com/internal/_build/results?buildId=159160&view=logs&jobId=7cbf968d-26f0-55c8-fa2c-bd900b311c34&taskId=b2aeb635-23e9-595d-82a3-8f619fe45749&lineStart=913&lineEnd=929&colStart=1&colEnd=30

A binary log file is available in the artifacts page but could contain secrets therefore I'm not uploading it here. Pls make sure to save it somewhere before retention policy deletes it.

Time Elapsed 00:03:34.47
MSBUILD : error MSB4017: The build stopped unexpectedly because of an unexpected logger failure.
Microsoft.Build.Exceptions.InternalLoggerException: The build stopped unexpectedly because of an unexpected logger failure. ---> System.NullReferenceException: Object reference not set to an instance of an object.
   at System.Collections.Generic.Dictionary`2.FindEntry(TKey key)
   at System.Collections.Generic.Dictionary`2.TryGetValue(TKey key, TValue& value)
   at Microsoft.Build.Collections.CopyOnWritePropertyDictionary`1.get_Item(String name)
   at Microsoft.Build.Execution.ProjectItemInstance.TaskItem.GetMetadataEscaped(String metadataName)
   at Microsoft.Build.Execution.ProjectItemInstance.TaskItem.GetMetadata(String metadataName)
   at Microsoft.Build.Logging.BuildEventArgsWriter.Write(ITaskItem item)
   at Microsoft.Build.Logging.BuildEventArgsWriter.WriteItems(IEnumerable items)
   at Microsoft.Build.Logging.BinaryLogger.Write(BuildEventArgs e)
   at Microsoft.Build.BackEnd.Logging.EventSourceSink.RaiseAnyEvent(Object sender, BuildEventArgs buildEvent)
   --- End of inner exception stack trace ---
   at Microsoft.Build.Exceptions.InternalLoggerException.Throw(Exception innerException, BuildEventArgs e, String messageResourceName, Boolean initializationException, String[] messageArgs)
   at Microsoft.Build.BackEnd.Logging.EventSourceSink.RaiseAnyEvent(Object sender, BuildEventArgs buildEvent)
   at Microsoft.Build.BackEnd.Logging.EventSourceSink.RaiseAnyEvent(Object sender, BuildEventArgs buildEvent)
Build failed (exit code '1').
danmoseley commented 5 years ago

It should not be possible to make Dictionary ever throw NRE - unless the object is getting corrupted by concurrent access. My guess is that someone is writing to this dictionary at the same time (or had done so earlier) causing corruption.

danmoseley commented 5 years ago

If code inspection doesn't help it might possibly help to make a wrapper for the dictionary that deterministically throws if a write occurs concurrent with some other access. Then try to get a repro.

danmoseley commented 5 years ago

We hit again in our CI. @rainersigwald does this look familiar at all? Any thoughts about how this coudl happen?

Historically we have always said that if a logger fails, the build should fail. But now I am outside the tent I'm inclined to think that we should try to complete regardless, and log it as a warning. That would allow CI to work even if a logger is buggy. Most times the build output will be OK. Thoughts?

rainersigwald commented 5 years ago

No, I don't know what's going on here.

It's not a scoped change, but my first reaction to seeing that callstack is "we should replace CopyOnWritePropertyDictionary with ImmutableDictionary and be done with this". IIRC I thought about that once and ran up against https://github.com/dotnet/corefx/issues/1272, but there are ways around that. Maybe we should pick it up again.

I see where you're coming from on logger failures not failing the build, but I think logging is a critical part of build infrastructure. You're not willing to disable logging to work around this problem, so I think it'd be unacceptable to silently (in the limit, because where would we log the logger failure?) disable some or all logging.

ViktorHofer commented 5 years ago

Happened again: https://dev.azure.com/dnceng/public/_build/results?buildId=174161&view=logs.

If possible I would like to higher the priority on this as it seriously impacts corefx build's healthiness.

danmoseley commented 5 years ago

You're not willing to disable logging to work around this problem, so I think it'd be unacceptable to silently (in the limit, because where would we log the logger failure?) disable some or all logging.

Right, changing the policy in the engine feels wrong. So does permanently hacking around the problem. Also wrong would be continuing the build with a totally silent logger. But in this case I assume the logger state is not all messed up - it's just one message that's going to be incorrect.

Meantime it's important to us to increase our build success rates. If yo'ure not able to investigate in detail right now (which I fully understand!) then how about we put a try/catch temporarily in BinaryLogger.Write or BuildEventArgsWriter.Write. That serves the goal of getting our builds reliable until your team has time to investigate. We can do the PR if you think that's reasonable.

danmoseley commented 5 years ago

One other thought - in return we would be more than happy to try out private builds with extra logging if that would help you root cause it - so long as the build doesn't fail.

rainersigwald commented 5 years ago

Yes, unfortunately we're shorthanded and swamped at the moment, so we probably can't dig in to the level we'd like. I think putting the try/catch in the binary logger makes sense as a tactical step. If you can send the PR that'd be great.

danmoseley commented 5 years ago

@ViktorHofer would you be able to do that? I suggest including Debug.Fail in it, so that it continues to fail when run with debug bits (which real builds won't do)

ViktorHofer commented 5 years ago

Sure

ViktorHofer commented 5 years ago

@rainersigwald please assign the issue to me.

ViktorHofer commented 5 years ago

Happened again:

Microsoft.Build.Exceptions.InternalLoggerException: The build stopped unexpectedly because of an unexpected logger failure. ---> System.NullReferenceException: Object reference not set to an instance of an object.
   at System.Collections.Generic.Dictionary`2.FindEntry(TKey key)
   at System.Collections.Generic.Dictionary`2.TryGetValue(TKey key, TValue& value)
   at Microsoft.Build.Collections.CopyOnWritePropertyDictionary`1.get_Item(String name)
   at Microsoft.Build.Execution.ProjectItemInstance.TaskItem.GetMetadataEscaped(String metadataName)
   at Microsoft.Build.Execution.ProjectItemInstance.TaskItem.GetMetadata(String metadataName)
   at Microsoft.Build.Logging.BuildEventArgsWriter.Write(ITaskItem item)
   at Microsoft.Build.Logging.BuildEventArgsWriter.WriteItems(IEnumerable items)
   at Microsoft.Build.Logging.BinaryLogger.Write(BuildEventArgs e)
   at Microsoft.Build.BackEnd.Logging.EventSourceSink.RaiseAnyEvent(Object sender, BuildEventArgs buildEvent)
   --- End of inner exception stack trace ---
   at Microsoft.Build.Exceptions.InternalLoggerException.Throw(Exception innerException, BuildEventArgs e, String messageResourceName, Boolean initializationException, String[] messageArgs)
   at Microsoft.Build.BackEnd.Logging.EventSourceSink.RaiseAnyEvent(Object sender, BuildEventArgs buildEvent)
   at Microsoft.Build.BackEnd.Logging.EventSourceSink.RaiseAnyEvent(Object sender, BuildEventArgs buildEvent)
Build failed (exit code '1').
rainersigwald commented 5 years ago

I'm chasing the ImmutableDictionary thing, but don't let that stop you @ViktorHofer -- yours is a much safer change I'd like to land first.