dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.23k stars 4.72k forks source link

Unable to build on Ubuntu 18.04 #29645

Closed vcsjones closed 4 years ago

vcsjones commented 5 years ago

I'm struggling to build on Ubuntu 18.04 recently. Things get going for a few minutes and then:

MSBUILD : error MSB1025: An internal failure occurred while running MSBuild.
System.ArgumentNullException: Value cannot be null.
Parameter name: array
   at Microsoft.Build.Execution.BuildManager.EndBuild()
   at Microsoft.Build.CommandLine.MSBuildApp.BuildProject(String projectFile, String[] targets, String toolsVersion, Dictionary`2 globalProperties, Dictionary`2 restoreProperties, ILogger[] loggers, LoggerVerbosity verbosity, DistributedLoggerRecord[] distributedLoggerRecords, Int32 cpuCount, Boolean enableNodeReuse, TextWriter preprocessWriter, Boolean detailedSummary, ISet`1 warningsAsErrors, ISet`1 warningsAsMessages, Boolean enableRestore, ProfilerLogger profilerLogger, Boolean enableProfiler, Boolean interactive, Boolean isolateProjects, Boolean graphBuild, String[] inputResultsCaches, String outputResultsCache)

I'm not sure if this is related to dotnet/runtime#29629 or not, but on the surface it seems like two different issues.

This error always occurs after this in the build log:

Verifying closure of runtime.linux-x64.Microsoft.Private.CoreFx.NETCoreApp runtime assemblies
Verifying no duplicate types in runtime.linux-x64.Microsoft.Private.CoreFx.NETCoreApp runtime assemblies
Microsoft.Private.CoreFx.NETCoreApp -> /code/personal/corefx/artifacts/packages/Debug/specs/runtime.linux-x64.Microsoft.Private.CoreFx.NETCoreApp.nuspec
runtime.native.System.IO.Ports -> /code/personal/corefx/artifacts/packages/Debug/specs/runtime.native.System.IO.Ports.nuspec
runtime.native.System.IO.Ports -> /code/personal/corefx/artifacts/packages/Debug/specs/runtime.linux-x64.runtime.native.System.IO.Ports.nuspec
💥
ericstj commented 5 years ago

Might be an MSBuild regression. /cc @rainersigwald

Looks like MSBuild itself (or a logger) hit an ArgumentNullException then MSBuild blew away the stack when reporting it: https://github.com/microsoft/msbuild/blob/a7ec0717645fbf928048ace681e817c049a03762/src/Build/BackEnd/BuildManager/BuildManager.cs#L415-L416

Does it reproduce every time? Maybe you can capture a binlog or a diag log to get the original exception/callstack if it happens to log it.

wfurt commented 5 years ago

dotnet/runtime#29629 is failing to restore packages and I think it is quite different. How exactly do you run your build @vcsjones? I'm using 18.04 as my Linux build server and I did not see this.

vcsjones commented 5 years ago

@wfurt a simple ./build.sh at the corefx root.

Does it reproduce every time?

Yes. I'll grab a binlog and see if I can find out what's going on.

wfurt commented 5 years ago

yes, binlog would be good. I assume it is not single core machine, right? BTW You may try export UseSharedCompilation=false It is not a fix but it helped me in the past.

vcsjones commented 5 years ago

@wfurt thanks for that. UseSharedCompilation=false did get it to build successfully (and it built much faster, about 10x).

I assume it is not single core machine, right?

Correct, it's a 16C/32T box.

binlog would be good.

I did grab a binlog of a failed build but didn't see any smoking gun of what the failure was. Everything seemed OK and then suddenly "build failed" right at the end. I can upload it somewhere if anyone cares to look at it provided I can get it over safely given that it appears to potentially contain sensitive information.

wfurt commented 5 years ago

I'm glad it worked @vcsjones but the outcome is surprising. Shared compilation should be faster we we keep process running and JITed instead of starting new one over and over. Since your machine is quite powerful it really may be some race condition in msbuild.

There was some variables how to enable internal msbuild logging but I cannot find it.

wtgodbe commented 5 years ago

My hacky workaround for getting a verbose msbuild.log file is manually editing https://github.com/dotnet/corefx/blob/73066ea252de240b0c3bcb79a0d9a1cfbbe56952/eng/common/build.sh#L178 to:

bl="/flp:v=diag"

@wfurt is the UseSharedCompilation workaround documented somewhere/do we have an issue for it?

ericstj commented 5 years ago

I took a closer look at this codepath and I don't think we'll find what we want in the logs. See: https://github.com/microsoft/msbuild/blob/a7ec0717645fbf928048ace681e817c049a03762/src/Build/BackEnd/BuildManager/BuildManager.cs#L2120-L2124. So this seems like MSBuild hit an exception in a logger or in the IPC between nodes. I think the only way we'd get more info here is a debugger or added instrumentation to MSBuild. Simply attaching and breaking on throw of ArgumentNullException should get it.

Also we should fix up that MSBuild code to throw with ExceptionDispatchInfo to preserve the stack. That may be the best next step.

vcsjones commented 5 years ago

So this seems like MSBuild hit an exception in a logger or in the IPC between nodes. I think the only way we'd get more info here is a debugger or added instrumentation to MSBuild.

Since I can only reproduce this on Linux and not Windows this might be a bit challenging for me. :sweat_smile:

I'll see what I can do here.

vcsjones commented 5 years ago

Some additional evidence that it might be a race-condition issue.

  1. I can reproduce this using only ./build.sh --restore. So perhaps UseSharedCompilation was a red herring.

  2. The issue goes away if I do taskset -a 1 ./build.sh --restore which is a CPU mask of 0b00000001.

  3. The issue comes back if I do taskset -a 3 ./build.sh --restore which is a CPU mask of 0b00000011.

It seems that as soon as I get > 1 core involved I run in to build issues.

The good news is that I can reproduce it with build.sh --restore which is a lot less involved.

This does not reproduce for me with any shipped version of MSBuild.

rainersigwald commented 5 years ago

Can you run with these environment variables?

MSBUILDDEBUGCOMM=1
MSBUILDDEBUGPATH={full path to folder}

where {full path to folder} is an existing but empty folder.

That will produce additional logging around node communication. Hopefully that gives us a clue.

I'd also be interested in your results doing something like @wtgodbe suggested, which would point a finger at the binary logger rather than node communication as the problem.

Also, this error is bad and I think we could do better: microsoft/msbuild#4413.

vcsjones commented 5 years ago

@rainersigwald

Can you run with these environment variables?

Sure. Thank you for taking the time to help me troubleshoot.

Using ./build.sh --restore --build with no CPU affinity set, it produced a single log file with 1,844 of lines like this:

(TID 34) 636949991390999511 + 3767.3352ms: Starting to acquire a new or existing node to establish node ID 1671...

Later after the build appeared "done" more log files like this appeared:

OfProc Endpoint Packet Pump (TID 36) 636949996316056892 +  900000.2ms: Connection timed out waiting a host to contact us.  Exiting comm thread.
OutOfProc Endpoint Packet Pump (TID 36) 636949996316057989 +    0.1097ms: Changing link status from Inactive to ConnectionFailed
 (TID 1) 636949996316063056 +    0.5067ms: Changing link status from ConnectionFailed to Inactive

I don't see any lingering processes. Anyway, these logs seem harmless to upload so here is the contents of my log directory. build-logs.zip

I as tried export MSBUILDDISABLENODEREUSE=1 and it did not change the behavior. Same error at the end of the build, same logs.

I'd also be interested in your results doing something like @wtgodbe suggested

I made the changes as-suggested, and the build did not produce any different results or logging as far as I could tell. It's likely I am misunderstanding the intention of the change though.

tmds commented 5 years ago

I'm seeing this issue also.

rainersigwald commented 5 years ago

Interesting. We emit that message just before we spawn a child process to act as a worker node. That explains why your build is slow: it became single-threaded, and is slow even relative to that, since it's also looping trying to create child processes.

However, looking at the code in question:

https://github.com/microsoft/msbuild/blob/c70d6ded175c0a7da632018b856daff9f9cd392e/src/Build/BackEnd/Components/Communications/NodeProviderOutOfProc.cs#L123-L124

Inside GetNode

https://github.com/microsoft/msbuild/blob/c70d6ded175c0a7da632018b856daff9f9cd392e/src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs#L141

it looks like there should be additional logging before I see anything that seems likely to fail:

https://github.com/microsoft/msbuild/blob/c70d6ded175c0a7da632018b856daff9f9cd392e/src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs#L172

or

https://github.com/microsoft/msbuild/blob/c70d6ded175c0a7da632018b856daff9f9cd392e/src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs#L206

My first thought is a hang in GetPossibleRunningNodes, but that's only called when node reuse is enabled, and you said that disabling it didn't result in any differences. Maybe something is throwing immediately in either case, though.

Trying to repro on an 18.04 machine of my own, since I'm running out of debugging ideas that aren't incredibly hard to write down.

tmds commented 5 years ago

@rainersigwald an observation: if I close vscode, I get much more dotnet processes and my corefx build becomes faster.

rainersigwald commented 5 years ago
./build.sh --restore --build

Seems to complete just fine for me on fully updated Ubuntu 18.04 + f6aed7b17. Nothing untoward in the logs

if I close vscode, I get much more dotnet processes and my corefx build becomes faster.

Interesting. I'm sshed into this machine with only a single login, so that might be an interesting difference. Can you share the communcation logs from the scenario where vscode is open?

vcsjones commented 5 years ago

if I close vscode, I get much more dotnet processes and my corefx build becomes faster.

For what it's worth I still have the issue with no VSCode open or anything else going on in the box. If I restart the box then run that command, it reproduces for me.

@tmds Does the error go away for you if you lock to a single core? taskset -a 1 ./build.sh?

rainersigwald commented 5 years ago

I currently think @vcsjones and @tmds are seeing separate problems.

@vcsjones let's try to scope down the repro. Can you copy this into a file named test.proj

<Project>
 <ItemGroup>
  <ProjectReference Include="$(MSBuildThisFileFullPath)" AdditionalProperties="P=a" />
  <ProjectReference Include="$(MSBuildThisFileFullPath)" AdditionalProperties="P=b" />
 </ItemGroup>

 <Target Name="Build">
  <MSBuild Projects="@(ProjectReference)" BuildInParallel="true" Targets="Inner" />
 </Target>

 <Target Name="Inner">
  <Message Importance="High" Text="Inner for $(P)" />
 </Target>
</Project>

and run .dotnet/dotnet build test.proj to build it using the private CLI that corefx downloads and uses? Please run that under the logging environment like you did before and upload those logs (and the console output).

That's a tiny project that should cause one worker node to be spawned and used. If that's broken, it's at least a smaller problem to poke at.

tmds commented 5 years ago

For the simple project, without vscode running I get:

[tmds@rex testproj]$ ~/repos/corefx/.dotnet/dotnet build test.proj
Microsoft (R) Build Engine version 16.1.67-preview+g13843078ee for .NET Core
Copyright (C) Microsoft Corporation. All rights reserved.

  Inner for a
  Inner for b

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:00:00.36

log:

 (TID 10) 636951877759669195 +    0.5351ms: Starting to acquire a new or existing node to establish node ID 2...
 (TID 10) 636951877759805268 +   13.6073ms: Attempting to connect to each existing msbuild.exe process in turn to establish node 2...
 (TID 10) 636951877759826575 +    2.1307ms: Attempting connect to PID 9843 with pipe MSBuild9843 with timeout 0 ms
 (TID 10) 636951877759937305 +    11.073ms: Writing handshake to pipe MSBuild9843
 (TID 10) 636951877759965521 +    2.8216ms: Reading handshake from pipe MSBuild9843
 (TID 10) 636951877759989068 +    2.3547ms: Successfully connected to pipe MSBuild9843...!
 (TID 10) 636951877759989569 +    0.0501ms: Successfully connected to existed node 2 which is PID 9843

With vscode running:

[tmds@rex testproj]$ ~/repos/corefx/.dotnet/dotnet build test.proj
Microsoft (R) Build Engine version 16.1.67-preview+g13843078ee for .NET Core
Copyright (C) Microsoft Corporation. All rights reserved.

  Inner for a
  Inner for b

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:00:00.36
MSBUILD : error MSB1025: An internal failure occurred while running MSBuild.
System.ArgumentNullException: Value cannot be null.
Parameter name: array
   at Microsoft.Build.Execution.BuildManager.EndBuild()
   at Microsoft.Build.CommandLine.MSBuildApp.BuildProject(String projectFile, String[] targets, String toolsVersion, Dictionary`2 globalProperties, Dictionary`2 restoreProperties, ILogger[] loggers, LoggerVerbosity verbosity, DistributedLoggerRecord[] distributedLoggerRecords, Int32 cpuCount, Boolean enableNodeReuse, TextWriter preprocessWriter, Boolean detailedSummary, ISet`1 warningsAsErrors, ISet`1 warningsAsMessages, Boolean enableRestore, ProfilerLogger profilerLogger, Boolean enableProfiler, Boolean interactive, Boolean isolateProjects, Boolean graphBuild, String[] inputResultsCaches, String outputResultsCache)

MSBUILD : error MSB1025: An internal failure occurred while running MSBuild.
System.ArgumentNullException: Value cannot be null.
Parameter name: array
   at Microsoft.Build.CommandLine.MSBuildApp.BuildProject(String projectFile, String[] targets, String toolsVersion, Dictionary`2 globalProperties, Dictionary`2 restoreProperties, ILogger[] loggers, LoggerVerbosity verbosity, DistributedLoggerRecord[] distributedLoggerRecords, Int32 cpuCount, Boolean enableNodeReuse, TextWriter preprocessWriter, Boolean detailedSummary, ISet`1 warningsAsErrors, ISet`1 warningsAsMessages, Boolean enableRestore, ProfilerLogger profilerLogger, Boolean enableProfiler, Boolean interactive, Boolean isolateProjects, Boolean graphBuild, String[] inputResultsCaches, String outputResultsCache)
   at Microsoft.Build.CommandLine.MSBuildApp.Execute(String[] commandLine)

Unhandled Exception: System.ArgumentNullException: Value cannot be null.
Parameter name: array
   at Microsoft.Build.CommandLine.MSBuildApp.BuildProject(String projectFile, String[] targets, String toolsVersion, Dictionary`2 globalProperties, Dictionary`2 restoreProperties, ILogger[] loggers, LoggerVerbosity verbosity, DistributedLoggerRecord[] distributedLoggerRecords, Int32 cpuCount, Boolean enableNodeReuse, TextWriter preprocessWriter, Boolean detailedSummary, ISet`1 warningsAsErrors, ISet`1 warningsAsMessages, Boolean enableRestore, ProfilerLogger profilerLogger, Boolean enableProfiler, Boolean interactive, Boolean isolateProjects, Boolean graphBuild, String[] inputResultsCaches, String outputResultsCache)
   at Microsoft.Build.CommandLine.MSBuildApp.Execute(String[] commandLine)
   at Microsoft.Build.CommandLine.MSBuildApp.Main(String[] args)

log:

 (TID 9) 636951877959420639 +    0.6439ms: Starting to acquire a new or existing node to establish node ID 2...
danmoseley commented 5 years ago

With VSCode running, I'm guessing it makes MSBuild child processes. What copy of microsoft.build.dll do they load - does it have a commit SHA in it I wonder? Also, without VSCode running, what copy of microsoft.build.dll does that load. My guess (but go with what @rainersigwald says) is that the parent and child are using incompatibly different microsoft.build.dll's. The handshake is supposed to prevent that but maybe it's not.

wfurt commented 5 years ago

the question is if the test project is build from VSCode of that is different process group. If it is the second, it should not have functional impact like it does, right?

rainersigwald commented 5 years ago

Ok, maybe it's not a distinct issue!

@danmosemsft I don't think it's that, because we're not even seeing the "attempting to connect to PID xxx" log string.

Can someone seeing this problem run this tiny program that enumerates processes like MSBuild does?

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

namespace ProcessEnumerator
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Getting current process");
            var currentProcess = Process.GetCurrentProcess();

            System.Console.WriteLine("Getting process name");
            var currentHost = currentProcess.MainModule.FileName;

            var expectedProcessName = Path.GetFileNameWithoutExtension(currentHost);

            Console.WriteLine($"Getting list of processes like \"{expectedProcessName}\"");
            List<Process> nodeProcesses = new List<Process>(Process.GetProcessesByName(expectedProcessName));

            foreach (var p in nodeProcesses)
            {
                System.Console.WriteLine($"Process {p.Id}");
            }

            expectedProcessName = "dotnet";

            Console.WriteLine($"Getting list of processes like \"{expectedProcessName}\"");
            nodeProcesses = new List<Process>(Process.GetProcessesByName(expectedProcessName));

            foreach (var p in nodeProcesses)
            {
                System.Console.WriteLine($"Process {p.Id}");
            }

        }
    }
}

MSBuild enumerates processes to attempt to find preexisting idle worker nodes. That may be going wrong here (though how is not clear).

vcsjones commented 5 years ago

OK, I might be seeing what @tmds is. Here's what I did.

  1. I ran the small .proj file. I got the same error from the initial report. I did not have VSCode open.
  2. I restarted my computer. Without opening anything else, I hopped in the Terminal and ran the build. It built without errors!
  3. I opened VSCode.
  4. Ran the exact same command and immediately started getting errors. I closed VSCode, and the error went away again.

I never made the connection to VS Code. I was using it to look at the log files the whole time so that's why I was getting it all the time.

Here are my logs for when it working successfully (VSCode is closed):

 (TID 6) 636951930017107464 +    0.5992ms: Starting to acquire a new or existing node to establish node ID 2...
 (TID 6) 636951930017263153 +   15.5689ms: Attempting to connect to each existing msbuild.exe process in turn to establish node 2...
 (TID 6) 636951930017286471 +    2.3318ms: Attempting connect to PID 4154 with pipe MSBuild4154 with timeout 0 ms
 (TID 6) 636951930017377876 +    9.1405ms: Writing handshake to pipe MSBuild4154
 (TID 6) 636951930017401710 +    2.3834ms: Reading handshake from pipe MSBuild4154
 (TID 6) 636951930017419031 +    1.7321ms: Successfully connected to pipe MSBuild4154...!
 (TID 6) 636951930017419355 +    0.0324ms: Successfully connected to existed node 2 which is PID 4154
vcsjones commented 5 years ago

The interesting thing for me is I was able to reproduce it with VS Code closed - I assume some of the nodes that are being problematic needed to idle out? It's clearly tied to VS Code for me, somehow.

vcsjones commented 5 years ago

@rainersigwald

When I have VS Code closed:

dotnet /code/personal/scratch/bin/Debug/netcoreapp2.2/scratch.dll
Getting current process
Getting process name
Getting list of processes like "dotnet"
Process 4154
Process 6266
Process 6407
Getting list of processes like "dotnet"
Process 4154
Process 6266
Process 6407

and when I have it open (and the small project is failing to build):

dotnet /code/personal/scratch/bin/Debug/netcoreapp2.2/scratch.dll
Getting current process
Getting process name
Getting list of processes like "dotnet"
Process 4154
Process 6266
Process 7323
Getting list of processes like "dotnet"
Process 4154
Process 6266
Process 7323
rainersigwald commented 5 years ago

Thanks, folks! With a private of https://github.com/microsoft/msbuild/pull/4417, I get a better stack:

MSBUILD : error MSB1025: An internal failure occurred while running MSBuild.
System.ArgumentNullException: Value cannot be null.
Parameter name: array
   at System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1.Return(T[] array, Boolean clearArray)
   at System.Diagnostics.Process.GetProcessName(Int32 processId)
   at System.Diagnostics.Process.GetProcessesByName(String processName, String machineName)
   at System.Diagnostics.Process.GetProcessesByName(String processName)
   at Microsoft.Build.BackEnd.NodeProviderOutOfProcBase.GetPossibleRunningNodes(String msbuildLocation) in /home/raines/src/msbuild/src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs:line 263
   at Microsoft.Build.BackEnd.NodeProviderOutOfProcBase.GetNode(String msbuildLocation, String commandLineArgs, Int32 nodeId, INodePacketFactory factory, Int64 hostHandshake, Int64 clientHandshake, NodeContextTerminateDelegate terminateNode) in /home/raines/src/msbuild/src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs:line 168
   at Microsoft.Build.BackEnd.NodeProviderOutOfProc.CreateNode(Int32 nodeId, INodePacketFactory factory, NodeConfiguration configuration) in /home/raines/src/msbuild/src/Build/BackEnd/Components/Communications/NodeProviderOutOfProc.cs:line 124
   at Microsoft.Build.BackEnd.NodeManager.AttemptCreateNode(INodeProvider nodeProvider, NodeConfiguration nodeConfiguration) in /home/raines/src/msbuild/src/Build/BackEnd/Components/Communications/NodeManager.cs:line 357
   at Microsoft.Build.BackEnd.NodeManager.CreateNode(NodeConfiguration configuration, NodeAffinity nodeAffinity) in /home/raines/src/msbuild/src/Build/BackEnd/Components/Communications/NodeManager.cs:line 113
   at Microsoft.Build.Execution.BuildManager.PerformSchedulingActions(IEnumerable`1 responses) in /home/raines/src/msbuild/src/Build/BackEnd/BuildManager/BuildManager.cs:line 1925
   at Microsoft.Build.Execution.BuildManager.ProcessPacket(Int32 node, INodePacket packet) in /home/raines/src/msbuild/src/Build/BackEnd/BuildManager/BuildManager.cs:line 1221
   at Microsoft.Build.Execution.BuildManager.ProcessWorkQueue(Action action) in /home/raines/src/msbuild/src/Build/BackEnd/BuildManager/BuildManager.cs:line 1174
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Build.Execution.BuildManager.EndBuild() in /home/raines/src/msbuild/src/Build/BackEnd/BuildManager/BuildManager.cs:line 778
   at Microsoft.Build.CommandLine.MSBuildApp.BuildProject(String projectFile, String[] targets, String toolsVersion, Dictionary`2 globalProperties, Dictionary`2 restoreProperties, ILogger[] loggers, LoggerVerbosity verbosity, DistributedLoggerRecord[] distributedLoggerRecords, Int32 cpuCount, Boolean enableNodeReuse, TextWriter preprocessWriter, Boolean detailedSummary, ISet`1 warningsAsErrors, ISet`1 warningsAsMessages, Boolean enableRestore, ProfilerLogger profilerLogger, Boolean enableProfiler, Boolean interactive, Boolean isolateProjects, Boolean graphBuild, String[] inputResultsCaches, String outputResultsCache) in /home/raines/src/msbuild/src/MSBuild/XMake.cs:line 1211

That looks like a corefx issue to me at first glance.

rainersigwald commented 5 years ago

With VS Code 1.34.0 open, I get this exception from the program above (no MSBuild):

Getting current process
Getting process name
Getting list of processes like "enumerate"

Unhandled Exception: System.ArgumentNullException: Value cannot be null.
Parameter name: array
   at System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1.Return(T[] array, Boolean clearArray)
   at System.Diagnostics.Process.GetProcessName(Int32 processId)
   at System.Diagnostics.Process.GetProcessesByName(String processName, String machineName)
   at System.Diagnostics.Process.GetProcessesByName(String processName)
   at ProcessEnumerator.Program.Main(String[] args) in /home/raines/src/corefx/enumerate/Program.cs:line 21

These processes are children of code:

$ pstree -apT 32502
code,32502
  ├─code,32504
  │   ├─code,491
  │   └─code,5628
  │       ├─code,5662 --nolazy --inspect=38583 /usr/share/code/resources/app/out/bootstrap-fork --type=extensionHost
  │       │   └─mono,5799 --assembly-loader=strict...
  │       ├─code,5664 /usr/share/code/resources/app/out/bootstrap-fork --type=watcherService
  │       └─code,5712 /usr/share/code/resources/app/out/bootstrap-fork --type=searchService
  └─code,32533

I tried killing 5799 and 5662, and it still crashed. But closing code made it complete.

vcsjones commented 5 years ago

I get this exception from the program above (no MSBuild):

Oh I misunderstood, I did not run that with corefx's own copy of msbuild. Doing that I get the same problem:

Getting current process
Getting process name
Getting list of processes like "scratch"

Unhandled Exception: System.ArgumentNullException: Value cannot be null.
Parameter name: array
   at System.Buffers.TlsOverPerCoreLockedStacksArrayPool`1.Return(T[] array, Boolean clearArray)
   at System.Diagnostics.Process.GetProcessName(Int32 processId)
   at System.Diagnostics.Process.GetProcessesByName(String processName, String machineName)
   at System.Diagnostics.Process.GetProcessesByName(String processName)
   at ProcessEnumerator.Program.Main(String[] args) in /code/personal/scratch/Program.cs:line 21
ericstj commented 5 years ago

This appears to be fixed with https://github.com/dotnet/corefx/commit/1b6f45ca261467baea62ef577f8b8a7c6cf3b96c#diff-eb6c3669a3203e9c82dfb79eb45a93e1R285. /cc @tmds @wtgodbe

If we pick up a new SDK with that fix in the shared runtime we should avoid this issue.

danmoseley commented 5 years ago

Nice job everyone helping root cause this. So the fix is already in Preview 6 then (plus the MSBuild change in next preview).

ericstj commented 5 years ago

yep. I want to use this to track an update to global.json so that we consume that update for our local CLI.

danmoseley commented 5 years ago

Maybe ArrayPool.Return should just silently accept null, this pattern seems fairly common..

ericstj commented 5 years ago

@danmosemsft that makes me nervous. Seems like someone might accidentally leak a rented array, which I believe would have happened here. Maybe take that discussion to a new issue if you'd like folks to follow up.

tmds commented 5 years ago

In this case the null was pointing out a bug: not returning the proper rented array. So I'm also in favor of keeping this as an invalid value.

tmds commented 5 years ago

This appears to be fixed with 1b6f45c#diff-eb6c3669a3203e9c82dfb79eb45a93e1R285. /cc @tmds @wtgodbe

:sweat_smile: regressed here: https://github.com/dotnet/corefx/commit/4214b09864bea0416f17727ffa39a55d950035d1 fixed here: https://github.com/dotnet/corefx/commit/1b6f45ca261467baea62ef577f8b8a7c6cf3b96c

Some more info on the relationship with vscode running: the issue occurs when there is a process with a command line that is long (strlen(arg[0]) + strlen(arg[1]) + 2 > 512).

vcsjones commented 5 years ago

Woohoo! Thanks a ton to everyone helping me figure out why this wasn't working.