dotnet / sdk

Core functionality needed to create .NET Core projects, that is shared between Visual Studio and CLI
https://dot.net/core
MIT License

dotnet build fails on .sln with IOException #11281

Open benmccallum opened 4 years ago

benmccallum commented 4 years ago

Recently, and all of a sudden, we're consistently running into a race condition in our builds. Initially I believed this was caused by the fact we build passing the .sln file and were also passing a runtime identifier (rid), but even without that the problem still occurs quite regularly.

I've deleted my comments from https://github.com/dotnet/sdk/issues/863, as this is no longer an issue with just specifying the rid; something else is going on. I've reiterated them here.

Scenario: A Docker container build. I'm specifically building a base image used by all our individual microservice images to speed things up. As such, restoring and building by pointing at the .sln file makes the most sense. (It's not an exact science, but... that's Docker builds...)

My builder Dockerfile is doing:

# Restore NuGet packages
RUN dotnet restore Microservices.sln    # not doing this anymore: -r linux-musl-x64

# Copy across the rest of the source files
WORKDIR ..
COPY ./ ./

# Build entire sln
RUN dotnet build Microservices/Microservices.sln -c Release --no-restore # not doing this anymore: -r linux-musl-x64

My services Dockerfile is doing this:

# Test
RUN dotnet test ../Tests/${SERVICE_NAME}.Tests/${SERVICE_NAME}.Tests.csproj -c Release --no-restore

# Publish
RUN dotnet publish -c Release --no-restore -o /dist # not doing this anymore: -r linux-musl-x64

My docker-compose.yml is essentially specifying that each service builds itself based on the common microservices-builder image:

version: '3.7'

services:
  microservices-builder:
    image: microservices-builder
    build:
      context: ../..
      dockerfile: Microservices/Docker/builder/Dockerfile

  booking.service:
    image: booking.service
    depends_on:
      - "microservices-builder"
    build:
      context: ..
      dockerfile: BookingService/Dockerfile
    ports:
     - "33000:80"

  rating.service:
    ... etc

Expected: Build succeeds.

Actual: Build fails due to an IOException ("being used by another process") on a shared library's deps.json file. The failing step is the RUN dotnet build ... in the builder Dockerfile.

10>/usr/share/dotnet/sdk/3.1.201/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(194,5): error MSB4018: The "GenerateDepsFile" task failed unexpectedly. [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
10>/usr/share/dotnet/sdk/3.1.201/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(194,5): error MSB4018: System.IO.IOException: The process cannot access the file '/src/AutoGuru.Shared.Utilities/src/bin/Release/netstandard2.0/AutoGuru.Shared.Utilities.deps.json' because it is being used by another process. [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
10>/usr/share/dotnet/sdk/3.1.201/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(194,5): error MSB4018:    at System.IO.FileStream.Init(FileMode mode, FileShare share, String originalPath) [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
10>/usr/share/dotnet/sdk/3.1.201/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(194,5): error MSB4018:    at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options) [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
10>/usr/share/dotnet/sdk/3.1.201/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(194,5): error MSB4018:    at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize) [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
10>/usr/share/dotnet/sdk/3.1.201/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(194,5): error MSB4018:    at System.IO.File.Create(String path) [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
10>/usr/share/dotnet/sdk/3.1.201/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(194,5): error MSB4018:    at Microsoft.NET.Build.Tasks.GenerateDepsFile.WriteDepsFile(String depsFilePath) [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
10>/usr/share/dotnet/sdk/3.1.201/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(194,5): error MSB4018:    at Microsoft.NET.Build.Tasks.GenerateDepsFile.ExecuteCore() [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
10>/usr/share/dotnet/sdk/3.1.201/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(194,5): error MSB4018:    at Microsoft.NET.Build.Tasks.TaskBase.Execute() [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
10>/usr/share/dotnet/sdk/3.1.201/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(194,5): error MSB4018:    at Microsoft.Build.BackEnd.TaskExecutionHost.Microsoft.Build.BackEnd.ITaskExecutionHost.Execute() [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
10>/usr/share/dotnet/sdk/3.1.201/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(194,5): error MSB4018:    at Microsoft.Build.BackEnd.TaskBuilder.ExecuteInstantiatedTask(ITaskExecutionHost taskExecutionHost, TaskLoggingContext taskLoggingContext, TaskHost taskHost, ItemBucket bucket, TaskExecutionMode howToExecuteTask) [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]

What I've tried

  1. No longer specify the rid in the dotnet CLI commands; instead, specify it conditionally in the Directory.Build.props file when I know a Docker build is happening. (Initially I applied this only to project files ending in Service.csproj, but removing that restriction made no difference.)
    <PropertyGroup>
    <RuntimeIdentifier Condition="'$(AUTOGURU_IS_DOCKER_BUILD)' == 'True'">linux-musl-x64</RuntimeIdentifier>
    </PropertyGroup>
  2. Uncheck "Build" for the .dcproj in Release configuration, just so that's not doing anything as it doesn't need to.
  3. Attempt to disable a parallel build by passing --disable-parallel to dotnet build.

Even though the Dockerfile for my services is doing a specific dotnet test and a dotnet publish in the specific project's directory, the intent is really that these don't need to do much (--no-restore is passed, etc.) as the image it's based on from the first Dockerfile has done all the hard work and cached everything along the way. This is essential in docker builds to speed up things.

Any help would be greatly appreciated.

benmccallum commented 4 years ago

Noticed that my Directory.Build.props file wasn't covering all the shared projects (some were one level up), but putting that in the parent's one still didn't do anything.

It'd be good to get some guidance on what projects need to specify the RuntimeIdentifier and if there's harm in specifying it on shared netstandard projects.

wli3 commented 4 years ago

Could you investigate further which process is locking the file? In the meantime, you could try running dotnet build-server shutdown --msbuild --vbcscompiler --razor before the step that is blocked.

benmccallum commented 4 years ago

@wli3, I'll see if I can find out what's locking it when I get some time. Any hints on how to find that out in a Linux container? Will have to do some reading...

I'll also try the shutdown command when I have the time and am back looking at that branch. Cheers!
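One possible way to look for a process holding the file open inside a Linux container is sketched below. It is only a sketch under assumptions: the helper name holders_of is hypothetical, it relies on /proc and readlink being available, and it can only catch a handle that is still open at the moment of the scan, so a short-lived race between build nodes may well slip past it.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs that currently have a file open,
# found by scanning the /proc/<pid>/fd symlinks (Linux only).
# Processes whose fd directory we are not allowed to read are skipped silently.
holders_of() {
    target=$(readlink -f "$1") || return 1
    for fd in /proc/[0-9]*/fd/*; do
        # readlink fails quietly for descriptors that vanish mid-scan
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            pid=${fd#/proc/}      # strip the leading "/proc/"
            echo "${pid%%/*}"     # keep only the PID component
        fi
    done | sort -u
}
```

For example, holders_of /src/AutoGuru.Shared.Utilities/src/bin/Release/netstandard2.0/AutoGuru.Shared.Utilities.deps.json could be run in a second shell attached to the build container while the build is in progress.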

benmccallum commented 4 years ago

@anthony-keller-79, is it possible this is happening to us because of the Nerdbank git versioning stuff? That would be something that's changed at about the right timeframe, right?

anthony-keller commented 4 years ago

@benmccallum Potentially, we could try without it for a while and see if we get the issue. It can be excluded via build props I believe. We’re not using it for our services yet

rainersigwald commented 4 years ago

@benmccallum It looks like you have a race condition in your solution build. Can you capture and share a binary log (please note the caveats if you share the log) of your build? If you can't reproduce it inside the container, running the commands outside the container should be sufficient--I suspect this isn't an only-in-the-container problem. A log of the failing command is likely to be helpful even when the build passes.

I have a tool (https://github.com/rainersigwald/ParallelBuildDebuggingLogger) to help debug this, but it's very hard to use at the moment.

To unblock yourself while debugging, can you try adding -maxcpucount:1 to that command line?
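The two requests above (a binary log, and a single-node build as a workaround) might be combined roughly as follows. This is a sketch, not a prescribed command line: the function name build_with_binlog, the log file name build.binlog, and the DOTNET override variable are all illustrative assumptions; -bl and -maxcpucount are the MSBuild switches for the binary logger and the node count.

```shell
#!/bin/sh
# Hypothetical wrapper: build a solution with a binary log, optionally
# limiting MSBuild to a fixed number of nodes (1 serializes the build).
# DOTNET is an assumed override so the command can be dry-run with `echo`.
build_with_binlog() {
    sln=$1
    nodes=${2:-}
    set -- build "$sln" -c Release --no-restore -bl:build.binlog
    if [ -n "$nodes" ]; then
        set -- "$@" "-maxcpucount:$nodes"
    fi
    "${DOTNET:-dotnet}" "$@"
}
```

Calling build_with_binlog Microservices/Microservices.sln 1 would run the serialized build; omitting the second argument leaves the node count at its default so the race can still occur and be captured in the log.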

benmccallum commented 4 years ago

Hi @rainersigwald, I've had a look at your caveats on the binary log and that seems fine to me.

I can't think of anything secret in our project files/props files (@anthony-keller-79, chime in if you have concerns), and I have checked my ENV vars and there's nothing secret in there.

I think it'd still be prudent if I could provide access to a select number of people though rather than just uploading it in here. What's the easiest way to do that? Upload it into OneDrive and you give me some email address to provide access to?

benmccallum commented 4 years ago

I'm trying to repro with:

  1. Wipe all obj and bin folders (using a PS script)
  2. Run dotnet restore My.sln
  3. Run dotnet build My.sln --no-restore
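The three repro steps above could be scripted along these lines. This is a minimal sketch with assumed names: wipe_build_outputs and repro_once are hypothetical helpers (the original used a PowerShell script that is not shown), and DOTNET is an assumed override for dry runs. Note that the wipe removes every directory named bin or obj under the root, which may be too broad for some repositories.

```shell
#!/bin/sh
# Step 1: delete every bin/ and obj/ directory below a root.
# -prune stops find from descending into directories it is about to delete.
wipe_build_outputs() {
    root=${1:-.}
    find "$root" -type d \( -name bin -o -name obj \) -prune -exec rm -rf {} +
}

# Steps 2 and 3: restore, then build without restoring again.
# DOTNET is a hypothetical override so the steps can be dry-run with `echo`.
repro_once() {
    sln=$1
    "${DOTNET:-dotnet}" restore "$sln" &&
    "${DOTNET:-dotnet}" build "$sln" --no-restore
}
```

Because the failure is intermittent, running wipe_build_outputs . followed by repro_once My.sln in a loop until the build fails is one way to raise the odds of catching it.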

We definitely get it on 2.2, but I'm trying to repro on 3.1 because I'd like to push ahead with that branch. I was able to repro from cmdline outside of docker first go, but not since then.

I have a feeling it might've been because I'd just closed VS, which had started trying to restore dependencies after I wiped all my obj and bin folders with a script. Reasons that make me doubt that though are:

So like you mentioned I'm still leaning towards race condition. I'm also leaning towards it being caused by Nerdbank.GitVersioning which we added fairly recently and would be the only thing that would explain why we started getting this but haven't for a very long time.

I'll try a few more times, and hopefully will get lucky, else like you mentioned I'll just post even without. Swing me any Microsoft account email/s that should get access and I'll put them up in OneDrive. Cheers!

benmccallum commented 4 years ago

FYI, @wli3's suggestion to shut down the build server appears not to have worked. Just saw this on TeamCity (on our current dotnetcore 2.2 branch). Again the culprit is AutoGuru.Shared.Utilities.csproj. Interesting. I'll definitely try the max CPU count switch if it keeps causing me issues.

[00:51:31][Step 2/10] Step 19/21 : RUN dotnet build-server shutdown --msbuild --vbcscompiler --razor &&     dotnet build Microservices/AutoGuru.Microservices.sln -c Release --no-restore
[00:51:31][Step 2/10]  ---> Running in f6a200376132
[00:51:31][Step 2/10] Docker event: {"status":"create","id":"f6a2003761325365a607718151916292b83667097c3c176aacb36c0037b19efa","from":"sha256:572f2c8d3e7a3bc157258a128fee9a699fd1edff11df88e2621094c2476a66d8","Type":"container","Action":"create","Actor":{"ID":"f6a2003761325365a607718151916292b83667097c3c176aacb36c0037b19efa","Attributes":{"image":"sha256:572f2c8d3e7a3bc157258a128fee9a699fd1edff11df88e2621094c2476a66d8","name":"vigilant_mahavira"}},"scope":"local","time":1587739891,"timeNano":1587739891462893775}
[00:51:31][Step 2/10] Docker event: {"status":"attach","id":"f6a2003761325365a607718151916292b83667097c3c176aacb36c0037b19efa","from":"sha256:572f2c8d3e7a3bc157258a128fee9a699fd1edff11df88e2621094c2476a66d8","Type":"container","Action":"attach","Actor":{"ID":"f6a2003761325365a607718151916292b83667097c3c176aacb36c0037b19efa","Attributes":{"image":"sha256:572f2c8d3e7a3bc157258a128fee9a699fd1edff11df88e2621094c2476a66d8","name":"vigilant_mahavira"}},"scope":"local","time":1587739891,"timeNano":1587739891463057063}
[00:51:31][Step 2/10] Docker event: {"status":"start","id":"f6a2003761325365a607718151916292b83667097c3c176aacb36c0037b19efa","from":"sha256:572f2c8d3e7a3bc157258a128fee9a699fd1edff11df88e2621094c2476a66d8","Type":"container","Action":"start","Actor":{"ID":"f6a2003761325365a607718151916292b83667097c3c176aacb36c0037b19efa","Attributes":{"image":"sha256:572f2c8d3e7a3bc157258a128fee9a699fd1edff11df88e2621094c2476a66d8","name":"vigilant_mahavira"}},"scope":"local","time":1587739891,"timeNano":1587739891885039400}
[00:51:32][Step 2/10] Shutting down MSBuild server...
[00:51:32][Step 2/10] Shutting down VB/C# compiler server...
[00:51:32][Step 2/10] MSBuild server shut down successfully.
[00:51:32][Step 2/10] VB/C# compiler server shut down successfully.
[00:51:33][Step 2/10] Microsoft (R) Build Engine version 16.0.450+ga8dc7f1d34 for .NET Core
[00:51:33][Step 2/10] Copyright (C) Microsoft Corporation. All rights reserved.
[00:51:33][Step 2/10] 
[00:51:38][Step 2/10]   AutoGuru.Shared.Utilities -> /src/AutoGuru.Shared.Utilities/src/bin/Release/netstandard2.0/AutoGuru.Shared.Utilities.dll
[00:51:38][Step 2/10] /usr/share/dotnet/sdk/2.2.207/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(129,5): error MSB4018: The "GenerateDepsFile" task failed unexpectedly. [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
[00:51:38][Step 2/10] /usr/share/dotnet/sdk/2.2.207/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(129,5): error MSB4018: System.IO.IOException: The process cannot access the file '/src/AutoGuru.Shared.Utilities/src/bin/Release/netstandard2.0/AutoGuru.Shared.Utilities.deps.json' because it is being used by another process. [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
[00:51:38][Step 2/10] /usr/share/dotnet/sdk/2.2.207/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(129,5): error MSB4018:    at System.IO.FileStream.Init(FileMode mode, FileShare share) [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
[00:51:38][Step 2/10] /usr/share/dotnet/sdk/2.2.207/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(129,5): error MSB4018:    at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options) [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
[00:51:38][Step 2/10] /usr/share/dotnet/sdk/2.2.207/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(129,5): error MSB4018:    at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize) [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
[00:51:38][Step 2/10] /usr/share/dotnet/sdk/2.2.207/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(129,5): error MSB4018:    at System.IO.File.Create(String path) [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
[00:51:38][Step 2/10] /usr/share/dotnet/sdk/2.2.207/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(129,5): error MSB4018:    at Microsoft.NET.Build.Tasks.GenerateDepsFile.ExecuteCore() [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
[00:51:38][Step 2/10] /usr/share/dotnet/sdk/2.2.207/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(129,5): error MSB4018:    at Microsoft.NET.Build.Tasks.TaskBase.Execute() [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
[00:51:38][Step 2/10] /usr/share/dotnet/sdk/2.2.207/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(129,5): error MSB4018:    at Microsoft.Build.BackEnd.TaskExecutionHost.Microsoft.Build.BackEnd.ITaskExecutionHost.Execute() [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]
[00:51:38][Step 2/10] /usr/share/dotnet/sdk/2.2.207/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.targets(129,5): error MSB4018:    at Microsoft.Build.BackEnd.TaskBuilder.ExecuteInstantiatedTask(ITaskExecutionHost taskExecutionHost, TaskLoggingContext taskLoggingContext, TaskHost taskHost, ItemBucket bucket, TaskExecutionMode howToExecuteTask) [/src/AutoGuru.Shared.Utilities/src/AutoGuru.Shared.Utilities.csproj]

Second run worked. Makes you wonder if there's something in the preceding step holding on to a file, like the dotnet restore I do just before.

pratikvasa commented 3 years ago

I had the same issue on Ubuntu. I have installed dotnet SDK 3.1.401.

I tried a lot of things, like doing a restore separately and calling the build-server shutdown command. Nothing worked.

Then I tried @rainersigwald's suggestion, set the -maxcpucount:1 flag, and ran it. It worked.

I looked at my project structure: there are 7 projects in the solution, all referencing other projects in a tree-like manner. But there were additional, redundant references like this:

A.csproj -- references B,C,D
| = B.csproj -- references C,D
| | = C.csproj -- references D
| | | = D.csproj

I updated the references so that transitive references are no longer stated explicitly in the csproj files:

A.csproj -- references B
| = B.csproj -- references C
| | = C.csproj -- references D
| | | = D.csproj

And this fixed the issue.

Though I would have thought that the build command would handle that.

Also, I have no idea what would happen in the following scenario:

A.csproj -- references B
| = B.csproj -- references C,E
| | = C.csproj -- references D
| | = E.csproj -- references D
| | | = D.csproj 

Here project D is referenced by two independent projects. This will most likely give the same error as above.
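To see which projects in a solution tree reference a given project directly (for instance, to spot the B/C/D or C/E/D patterns described above), something like the following could be used. It is a rough sketch: direct_referrers is a hypothetical helper, and it simply greps for ProjectReference elements whose Include path ends in the given file name, so it ignores conditions, imports, and references split across lines. Whether a shared reference like D actually triggers the error is not established in this thread.

```shell
#!/bin/sh
# Hypothetical helper: list every .csproj under a root that contains a
# direct <ProjectReference> to the named project file (e.g. D.csproj).
# Matches both / and \ as path separators in the Include attribute.
direct_referrers() {
    root=$1
    name=$2
    pattern='<ProjectReference[^>]*Include="([^"]*[/\\])?'"$name"'"'
    find "$root" -type f -name '*.csproj' -exec grep -l -E "$pattern" {} + | sort
}
```

Running direct_referrers . D.csproj on the last layout above would be expected to list C.csproj and E.csproj; on the first layout it would list A, B, and C, which is where the redundant references show up.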