dotnet / msbuild

The Microsoft Build Engine (MSBuild) is the build platform for .NET and Visual Studio.
https://docs.microsoft.com/visualstudio/msbuild/msbuild
MIT License
5.17k stars 1.34k forks source link

MsBuild node problem #3671

Open dazinator opened 5 years ago

dazinator commented 5 years ago

Steps to reproduce

Add following nuget feed: https://ci.appveyor.com/nuget/gitversion-8nigugxjftrw

Project file

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFrameworks>net461</TargetFrameworks>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="GitVersionTask" Version="4.0.0-beta0015">
      <!--<PrivateAssets>all</PrivateAssets>
      <IncludeAssets>runtime; build; native; contentfiles; analyzers</IncludeAssets>-->
    </PackageReference>
  </ItemGroup>
</Project>

Building from command line (msbuild, or dotnet cli) both work. Building from VS results in an exception in the output windows:

MSBUILD : error MSB4166: Child node "2" exited prematurely. Shutting down. Diagnostic information may be found in files in "C:\Users\Administrator\AppData\Local\Temp\" and will be named MSBuild_*.failure.txt. This location can be changed by setting the MSBUILDDEBUGPATH environment variable to a different directory. ========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========

The content of the MSBuild_*.failure.txt is:

UNHANDLED EXCEPTIONS FROM PROCESS 7924:

29/08/2018 14:21:13 System.ArgumentNullException: Value cannot be null. Parameter name: value at System.IO.BinaryWriter.Write(String value) at Microsoft.Build.Framework.LazyFormattedBuildEventArgs.WriteToStream(BinaryWriter writer) at Microsoft.Build.Framework.BuildWarningEventArgs.WriteToStream(BinaryWriter writer) at Microsoft.Build.Shared.LogMessagePacketBase.WriteToStream(INodePacketTranslator translator) at Microsoft.Build.Shared.LogMessagePacketBase.Translate(INodePacketTranslator translator) at Microsoft.Build.BackEnd.NodeEndpointOutOfProcBase.RunReadLoop(Stream localReadPipe, Stream localWritePipe, ConcurrentQueue`1 localPacketQueue, AutoResetEvent localPacketAvailable, AutoResetEvent localTerminatePacketPump)

Expected behavior

Was expecting building from VS to work, because both dotnet build and msbuild from command line works. Looking at the failure.txt file it's very hard for me to equate that to any problem with my own project, or the nuget package involved - it seems like an msbuild issue.

Actual behavior

Error shown above.

Environment data

msbuild /version output: 15.7.180.61344

OS info:

If applicable, version of the tool that invokes MSBuild (Visual Studio, dotnet CLI, etc):

Tried with VS 15.8.2, 15.8.1, 15.7.6

rainersigwald commented 5 years ago

I agree that no matter what bad thing is happening in the build, that error shouldn't be presented to the user.

Unfortunately, I can't seem to reproduce this using your instructions. My builds succeed. Has the possibly-bad package aged away or something? Can you still repro using just this project?

dazinator commented 5 years ago

Yeah I can still repro this using the above project file, and latest vs update 15.8.2. Another user reported it here with 15.8.1: https://github.com/GitTools/GitVersion/issues/1458 and also provided a repro via gist.

dazinator commented 5 years ago

Also the error doesnt show up on the errors pane in VS at all, you have to look at "output" pane to see it.

stazz commented 5 years ago

I found this issue when googling error message of CometaSolutions/UtilPack#30 , which seems to be about this same error. I do not know the details of how MSBuild internals work in multi-node building, but looking at disassembled code in C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin\Microsoft.Build.Framework.dll it seems that the only way string given to BinaryWriter can be null without checking for it is originalCultureName field of LazyFormattedBuildEventArgs. That field is assigned in constructor as this.originalCultureName = CultureInfo.CurrentCulture.Name;. Is it possible for the current culture name to be null in some special cases?

EDIT: Alternatively, another way of passing null to BinaryWriter seems to be by iterating arguments, some of which may return null via using Convert.ToString method.

rainersigwald commented 5 years ago

Ok, I think the GitVersion problem only arises if the project is not in a git repo. I usually create a repo to hold my repro projects, so that explains why I wasn't seeing it.

Debugging, @stazz was right; originalCultureName was null. I also don't understand how that could happen. Continuing to debug.

rainersigwald commented 5 years ago

This is a very weird one.

GitVersion 4.0.0-beta0015 redistributes Microsoft.Build.Framework.dll, version 15.1.548 (the VS 2017 RTM version). That's new, and it happens because GitVersion switched from referencing the v4 MSBuild assemblies in the GAC to referencing v15 assemblies via NuGet.

I didn't think that would be a problem, because the worker MSBuild.exe should already have Microsoft.Build.Framework loaded before running tasks, and from the right place/version. And it does. BUT . . .

When the task logs a warning, there are TWO copies of Framework loaded, one in the "NuGet assembly loader domain".

Microsoft.Build.Framework.dll   Microsoft.Build.Framework.dll   C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin\Microsoft.Build.Framework.dll    No  N/A Symbols loaded. C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin\Microsoft.Build.Framework.pdb    7   15.08.166.59604 <Unknown>   02AA0000-02ABC000   [9184] MSBuild.exe  [1] MSBuild.exe 
Microsoft.Build.Framework.dll   Microsoft.Build.Framework.dll   C:\Users\raines\.nuget\packages\gitversiontask\4.0.0-beta0015\build\net461\Microsoft.Build.Framework.dll    Yes N/A Cannot find or open the PDB file.       36  15.01.548.43366 2/14/2017 10:00 PM  072A0000-072B6000   [9184] MSBuild.exe  [2] NuGet assembly loader domain    

It looks like that's caused by UtilPack here: https://github.com/CometaSolutions/UtilPack/blob/e74fce0aec6f4c6978832571de20935c238163c5/Source/UtilPack.NuGet.AssemblyLoading/NuGetAssemblyResolver.cs#L361

I've debugged through a fair amount and I can't see why having this second Framework loaded in an appdomain matters. None of the objects I can see originate from the second assembly.

Deleting Microsoft.Build.Framework.dll from the GitVersion NuGet package causes the build to succeed. And fortunately, it isn't necessary to redistribute it with a task -- if the task is running, MSBuild's assemblies are available for use.

@dazinator I recommend you apply this patch in GitVersion:

diff --git a/src/GitVersionTask/GitVersionTask.csproj b/src/GitVersionTask/GitVersionTask.csproj
index 6da556ca..3500e527 100644
--- a/src/GitVersionTask/GitVersionTask.csproj
+++ b/src/GitVersionTask/GitVersionTask.csproj
@@ -79,8 +79,8 @@
     <PackageReference Include="YamlDotNet" Version="$(PackageVersion_YamlDotNet)">
         <PrivateAssets>All</PrivateAssets>   
     </PackageReference>
-    <PackageReference Include="Microsoft.Build.Utilities.Core" Version="15.1.548" />
-    <PackageReference Include="Microsoft.Build.Framework" Version="15.1.548" />
+    <PackageReference Include="Microsoft.Build.Utilities.Core" Version="15.1.548" ExcludeAssets="runtime" />
+    <PackageReference Include="Microsoft.Build.Framework" Version="15.1.548" ExcludeAssets="runtime" />
     <PackageReference Include="UtilPack.NuGet.MSBuild" Version="$(PackageVersion_UtilPackNuGetMSBuild)" />
   </ItemGroup> 

That tells NuGet that you only need the Microsoft.Build* references at build time, and they should be gotten from elsewhere at runtime. That should cause the output package to not have a copy of MSBuild assemblies, and resolve the symptoms of this problem.

For a long-term fix I don't know what to do. We could default to en-US at serialize time if the culture is null, but that feels like spackling over a problem I don't fully understand.

dazinator commented 5 years ago

@rainersigwald thank you for investigating and for the workaround. I will try the suggested patch and report back.

stazz commented 5 years ago

@rainersigwald Very nice work at catching the real issue! If I remember correctly, the desktop AppDomain will not trigger its AssemblyResolve event if it finds the corresponding assembly within the same directory as the assembly requesting it. If that is true, then the only way to fix is indeed to remove MSBuild assemblies from the directory where GitVersion task DLL files are being loaded. On .NET Core, the AssemblyLoadContext behaves differently, so that would explain why this happens only on desktop.

The reason why none of the MSBuild objects originate from the UtilPack AppDomain is probably because the GitVersion task only uses MSBuild to reference types. The reason why originalCultureName was null is a mystery though. How about setting it to empty string if the CultureInfo.CurrentCulture.Name is null, thus at least avoiding a hard crash like this? Would that cause the same hard-crashing issues on deserializing end, or would things just silently be ignored/default values used?

rainersigwald commented 5 years ago

I haven't actually tried it, but my concern is serializing as empty-string if it's null might result in an exception at deserialize time. That's why I was contemplating falling back to something like en-US which might be wrong but should be valid.

stazz commented 5 years ago

@rainersigwald What is your opinion about which scenario is better for the users of MSBuild - the hard crash when serializing, or (hard?) crash when deserializing? I am all about detecting and reacting to errors as early as possible, but it seems harsh to render building completely inoperable for the end user because of this. If the crashing on deserializing end won't halt the whole build, then I think empty string would be at least one possible option to fix this. The users might still end up getting error messages, but at least their builds would work.

The root cause of why CultureInfo.CurrentCulture.Name is null I guess lies somewhere within commits made to .NET Desktop framework, or MSVS between versions 15.8.1 (if that was the last working version for this issue) and 15.8.2.