dotnet / sdk

Core functionality needed to create .NET Core projects, that is shared between Visual Studio and CLI
https://dot.net/core
MIT License
2.66k stars 1.06k forks source link

cli tools unstable on our VSTS build agent #6770

Closed mmisztal1980 closed 4 years ago

mmisztal1980 commented 8 years ago

Steps to reproduce

We run dotnet build or dotnet publish within our project folder. This has started to occur a short while ago, we are unable to determine the root cause.

Our build agent is a Windows Server 2016 TP5 VM.

project.json:

{
  "dependencies": {
    "Microsoft.AspNetCore.Mvc": "1.0.0",
    "Microsoft.AspNetCore.Server.Kestrel": "1.0.0",
    "Microsoft.AspNetCore.Server.IISIntegration": "1.0.0",
    "Microsoft.EntityFrameworkCore": "1.0.0",
    "Microsoft.EntityFrameworkCore.SqlServer": "1.0.0",
    "Microsoft.Extensions.Configuration.CommandLine": "1.0.0",
    "Microsoft.Extensions.Configuration.EnvironmentVariables": "1.0.0",
    "Microsoft.Extensions.Options.ConfigurationExtensions": "1.0.0",
    "Microsoft.Extensions.Configuration.FileExtensions": "1.0.0",
    "Microsoft.Extensions.Configuration.Json": "1.0.0",
    "Microsoft.Extensions.Logging": "1.0.0",
    "Microsoft.Extensions.Logging.Console": "1.0.0",
    "Microsoft.Extensions.Logging.Debug": "1.0.0",
    "Pyra.ProductLifecycle": "1.0.0-*",
    "Microsoft.EntityFrameworkCore.Tools": "1.0.0-preview2-final"
  },

  "frameworks": {
    "net452": {}
  },

  "buildOptions": {
    "emitEntryPoint": true,
    "preserveCompilationContext": true
  },

  "publishOptions": {
    "include": [
      "web.config",
      "appsettings.test.json"
    ]
  },
  "tools": {
    "Microsoft.EntityFrameworkCore.Tools": "1.0.0-preview2-final"
  },

  "scripts": {
    "postpublish": [ "dotnet publish-iis --publish-folder %publish:OutputPath% --framework %publish:FullTargetFramework%" ]
  }
}

Expected behavior

We expected the project to build /publish normally.

Actual behavior

C:\agents\1\7\s\src\Pyra.RiskProfiler.Service\Pyra.ProductLifecycle.WebApi>dotnet publish
Publishing Pyra.ProductLifecycle.WebApi for .NETFramework,Version=v4.5.2/win7-x64
Project Pyra.ProductLifecycle (.NETFramework,Version=v4.5.2) was previously compiled. Skipping compilation.
Project Pyra.ProductLifecycle.WebApi (.NETFramework,Version=v4.5.2) will be compiled because expected outputs are missing
Compiling Pyra.ProductLifecycle.WebApi for .NETFramework,Version=v4.5.2
Object reference not set to an instance of an object.
C:\Program Files\dotnet\dotnet.exe compile-csc @C:\agents\1\7\s\src\Pyra.RiskProfiler.Service\Pyra.ProductLifecycle.WebApi\obj\Debug\net452\dotnet-compile.rsp returned Exit Code 1

Compilation failed.
    0 Warning(s)
    0 Error(s)

Time elapsed 00:00:00.1628216

Published 0/1 projects successfully

Environment data

dotnet --info output:

C:\agents\1\7\s\src\Pyra.RiskProfiler.Service\Pyra.ProductLifecycle.WebApi>dotnet info

Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.DotNet.Cli.Utils.ProjectToolsCommandResolver.ResolveCommandSpecFromToolLibrary(LibraryRange toolLibraryRange, String commandName, IEnumerable`1 args, ProjectContext projectContext)
   at Microsoft.DotNet.Cli.Utils.ProjectToolsCommandResolver.ResolveCommandSpecFromAllToolLibraries(IEnumerable`1 toolsLibraries, String commandName, IEnumerable`1 args, ProjectContext projectContext)
   at Microsoft.DotNet.Cli.Utils.ProjectToolsCommandResolver.ResolveFromProjectTools(String commandName, IEnumerable`1 args, String projectDirectory)
   at Microsoft.DotNet.Cli.Utils.ProjectToolsCommandResolver.Resolve(CommandResolverArguments commandResolverArguments)
   at Microsoft.DotNet.Cli.Utils.CompositeCommandResolver.Resolve(CommandResolverArguments commandResolverArguments)
   at Microsoft.DotNet.Cli.Utils.CommandResolver.TryResolveCommandSpec(String commandName, IEnumerable`1 args, NuGetFramework framework, String configuration, String outputPath)
   at Microsoft.DotNet.Cli.Utils.Command.Create(String commandName, IEnumerable`1 args, NuGetFramework framework, String configuration, String outputPath)
   at Microsoft.DotNet.Cli.Program.ProcessArgs(String[] args, ITelemetry telemetryClient)
   at Microsoft.DotNet.Cli.Program.Main(String[] args)
dotnet --version
1.0.0-preview2-003121
brthor commented 8 years ago

@mmisztal1980

There was a similar report some time ago, are you running any parallel sessions of dotnet? Is this on a shared machine where parallel dotnet sessions could be running?

Does this repro 100% of the time?

mmisztal1980 commented 8 years ago

We have 2 agent instances installed on that VM, however currently their load is low, so usually a single agent is doing some work while the other is idle. The other is typically used to execute release builds.

Right now we're experiencing a 100% repro rate.

brthor commented 8 years ago

Can you try setting the NUGET_PACKAGES environment variable to point to a directory other than %USERPROFILE%\.nuget

I am curious if something unexpected has happened inside the cache that is causing this, as it looks like the tool resolver that is crashing.

mmisztal1980 commented 8 years ago

It seems that this variable is not present in the first place...

C:\agents\1\7\s\src\Pyra.RiskProfiler.Service\Pyra.ProductLifecycle.WebApi>set
ALLUSERSPROFILE=C:\ProgramData
APPDATA=C:\Users\localadmin\AppData\Roaming
ChocolateyInstall=C:\ProgramData\chocolatey
CommonProgramFiles=C:\Program Files\Common Files
CommonProgramFiles(x86)=C:\Program Files (x86)\Common Files
CommonProgramW6432=C:\Program Files\Common Files
COMPUTERNAME=NLTVS00483
ComSpec=C:\Windows\system32\cmd.exe
DNX_HOME=C:\Users\localadmin\.dnx
HOMEDRIVE=C:
HOMEPATH=\Users\localadmin
LOCALAPPDATA=C:\Users\localadmin\AppData\Local
LOGONSERVER=\\NLTVS00483
NUMBER_OF_PROCESSORS=2
OS=Windows_NT
Path=C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\dotnet\;C:\ProgramData\chocolatey\bin;C:\Users\localadmin\.dnx\bin;C:\Program Files\Microsoft DNX\Dnvm\;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;C:\Program Files (x86)\MSBuild\14.0\Bin;C:\Program Files\nodejs\;C:\Program Files (x86)\Microsoft VS Code\bin;C:\Users\localadmin\AppData\Roaming\npm;C:\Program Files\docker;C:\Program Files\Microsoft\Web Platform Installer\;C:\Program Files\Git\cmd;
PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
PROCESSOR_ARCHITECTURE=AMD64
PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
PROCESSOR_LEVEL=6
PROCESSOR_REVISION=3f02
ProgramData=C:\ProgramData
ProgramFiles=C:\Program Files
ProgramFiles(x86)=C:\Program Files (x86)
ProgramW6432=C:\Program Files
PROMPT=$P$G
PSModulePath=C:\Program Files\WindowsPowerShell\Modules;C:\Windows\system32\WindowsPowerShell\v1.0\Modules;C:\Program Files\WindowsPowerShell\Modules\;C:\Program Files (x86)\Microsoft SDKs\Azure\PowerShell\ResourceManager\AzureResourceManager\;C:\Program Files (x86)\Microsoft SDKs\Azure\PowerShell\ServiceManagement\;C:\Program Files (x86)\Microsoft SDKs\Azure\PowerShell\Storage\
PUBLIC=C:\Users\Public
SystemDrive=C:
SystemRoot=C:\Windows
TEMP=C:\Users\LOCALA~1\AppData\Local\Temp
TMP=C:\Users\LOCALA~1\AppData\Local\Temp
USERDOMAIN=NLTVS00483
USERDOMAIN_ROAMINGPROFILE=NLTVS00483
USERNAME=localadmin
USERPROFILE=C:\Users\localadmin
VS140COMNTOOLS=C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\Tools\
windir=C:\Windows

Will try to set it to something in the mean time

brthor commented 8 years ago

I wouldn't expect this variable to be set by default, it's used to redirect the package cache.

My intention here is to isolate the repro to determine the cause. If in your current process (not machine wide) you set this variable to somewhere unique, and the repro goes away, my thoughts are that something is corrupted in your package cache and we can go from there.

mmisztal1980 commented 8 years ago

Env variable set to NUGET_PACKAGES=C:\Users\localadmin\.nuget.new

Seems that dotnet build tried to repopulate the cache, however it failed shortly after...

C:\agents\1\7\s\src\Pyra.RiskProfiler.Service\Pyra.ProductLifecycle.WebApi>dotnet build

Welcome to .NET Core!
---------------------
Learn more about .NET Core @ https://aka.ms/dotnet-docs. Use dotnet --help to see available commands or go to https://aka.ms/dotnet-cli-docs.
Telemetry
--------------
The .NET Core tools collect usage data in order to improve your experience. The data is anonymous and does not include commandline arguments. The data is collected by Microsoft and shared with the community.
You can opt out of telemetry by setting a DOTNET_CLI_TELEMETRY_OPTOUT environment variable to 1 using your favorite shell.
You can read more about .NET Core tools telemetry @ https://aka.ms/dotnet-cli-telemetry.
Configuring...
-------------------
A command is running to initially populate your local package cache, to improve restore speed and enable offline access. This command will take up to a minute to complete and will only happen once.
Decompressing 100% 4188 ms
Expanding 100% 30288 ms
Project Pyra.ProductLifecycle (.NETFramework,Version=v4.5.2) was previously compiled. Skipping compilation.
Project Pyra.ProductLifecycle.WebApi (.NETFramework,Version=v4.5.2) will be compiled because expected outputs are missing
Compiling Pyra.ProductLifecycle.WebApi for .NETFramework,Version=v4.5.2

(a really long list of missing packages in here...)

Compilation failed.
    0 Warning(s)
    228 Error(s)

Time elapsed 00:00:00.3955319
brthor commented 8 years ago

attempt a dotnet restore those packages probably aren't in the default populated cache

mmisztal1980 commented 8 years ago

Riiiiight (1am here so bare with me).

dotnet restore followed by a

C:\agents\1\7\s\src\Pyra.RiskProfiler.Service\Pyra.ProductLifecycle.WebApi>dotnet build
Project Pyra.ProductLifecycle (.NETFramework,Version=v4.5.2) was previously compiled. Skipping compilation.
Project Pyra.ProductLifecycle.WebApi (.NETFramework,Version=v4.5.2) will be compiled because expected outputs are missing
Compiling Pyra.ProductLifecycle.WebApi for .NETFramework,Version=v4.5.2

Compilation succeeded.
    0 Warning(s)
    0 Error(s)

Time elapsed 00:00:02.2690183

... has yielded a positive result.

What does this mean? Can I take measures to avoid this in the future?

brthor commented 8 years ago

Riiiiight (1am here so bare with me).

no worries

This means something became corrupted in the %USERPROFILE/.nuget/.tools directory tree, are you able to share a zip of that directory tree?

For immediately staying unblocked, one approach we take in cli is to use a build-specific package cache. So for every build in VSTS we set the NUGET_PACKAGES environment variable to the root of the repo where VSTS puts it. I believe you can do this In VSTS itself although we manage it inside our build scripts.

This will prevent most issues like this, although the general issue needs to be fixed as well.

mmisztal1980 commented 8 years ago

Can you elaborate on how to setup the build-specific package cache? is it a matter of setting the env variable per build definition?

tools.zip

Please confirm that you've received the zip file.

brthor commented 8 years ago

zip is received.

It is a matter of setting the env variable per build definition. VSTS has this capability and provides some additional variables you can use to get the root directory of the build.

https://www.visualstudio.com/en-us/docs/build/define/variables

AGENT_BUILDDIRECTORY looks like what you're looking for

livarcocc commented 7 years ago

This seems to be have been resolved. Please re-open if you are still running into problems.

mmisztal1980 commented 7 years ago

I've had this happen to me again just now. Using preview2 tooling version.

tools.zip

TheRealPiotrP commented 7 years ago

@mmisztal1980 can you try this with Preview3? Since the toolset went through such a major overhaul there is not much opportunity to fix preview2 [project.json-based] issues.

mmisztal1980 commented 7 years ago

I'm afraid not now, it's going to be a while before our ogranization migrates to preview 3. I estimate this will happen sometime in Q12017. Is there anything else we can try?

On Tue, Nov 22, 2016 at 9:23 AM, Piotr Puszkiewicz <notifications@github.com

wrote:

@mmisztal1980 https://github.com/mmisztal1980 can you try this with Preview3? Since the toolset went through such a major overhaul there is not much opportunity to fix preview2 [project.json-based] issues.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dotnet/cli/issues/4085#issuecomment-262176417, or mute the thread https://github.com/notifications/unsubscribe-auth/ABS1AZt5n9Ufd9zSN4YCC-g8khSbkMirks5rAqadgaJpZM4JsfUM .

TheRealPiotrP commented 7 years ago

I think @brthor's advice was quite good... and in line with what we do for CLI builds. However, given the path you shared, I suspect that both agents share a single cache.

We:

  1. Use NUGET_PACKAGES in our build scripts
  2. locate the target directory of NUGET_PACKAGES inside the repo directory, so each build is guaranteed a private copy
  3. Through (2), each build sets its own path to that directory so there is no chance of overlap
  4. In CI, clean that directory between builds

(4) is something you can toggle. it's not strictly necessary and adds to the build times + network traffic, but improves stability particularly when package versions get re-used for multiple builds [e.g. when you want to claim something is 1.0.0-rc2 and not 1.0.0-rc2-12345 and then you take that one extra fix after the build was built... but that never happens :p ]

jsgoupil commented 7 years ago

I faced the exact same issue on Azure. Deleted the .tools folder, then re-ran the nuget restore then it started to compile.

jsgoupil commented 7 years ago

@piotrpMSFT I can re-create the same problem on a new machine/slot on Azure 100% of the time. My deployment script does
nuget restore path/to/sln.sln -packagesavemode nuspec

This creates a "Microsoft.AspNetCore.Server.IISIntegration.Tools" in /.nuget/.tools with "netstandardapp1.5" in it.

I run
nuget restore path/to/proj.xproj -packagesavemode nuspec

Then a "netcoreapp1.0" shows up in this package. Now I can build.

image

My SLN contains a XPROJ and a CSPROJ.

TheRealPiotrP commented 7 years ago

@jsgoupil If you're using nuget restore you'll want to open an issue on nuget/home repo. I'm afraid that CLI only ships with the nuget crossplat client and this one is fairly different from full-fledged desktop NuGet restore. Please open an issue there to track!

jsgoupil commented 7 years ago

@piotrpMSFT ... thanks for the input. After more research (because I just spent 30h on this), Azure is with NuGet 3.5, and when you download NuGet from the home page, it ships with 3.4

3.5 behaves properly, 3.4 doesn't.

TheRealPiotrP commented 7 years ago

Glad you shared that @jsgoupil as I'm sure it will help others. Is there anything left to track in this issue, or can we close?

jsgoupil commented 7 years ago

Yes you may. Thanks.