dotnet / sdk-container-builds

Libraries and build tooling to create container images from .NET projects using MSBuild
https://learn.microsoft.com/en-us/dotnet/core/docker/publish-as-container
MIT License
179 stars 36 forks source link

UnauthorizedAccessException when multiple container activated projects are in the solution. #344

Open Danielku15 opened 1 year ago

Danielku15 commented 1 year ago

Problem description:

It seems there is a problem with the download of blobs from the registry when building the container images when there are multiple projects in the solution and we do a publish of all projects in parallel.

A container image publish of all projects at the same time, will logically also trigger a parallel download of the base images and descriptors. As this path is fixed to <temp>\Containers\Content\<contenthash>.<extension> all projects will try to access the same file at the same time which leads to an exception like this:

    C:\...\Microsoft.NET.Build.Containers.targets(172,5): error MSB4018: The "CreateNewImage" task failed unexpectedly. 
14:21:27     C:\...\Microsoft.NET.Build.Containers.targets(172,5): error MSB4018: System.AggregateException: One or more errors occurred. (Access to the path is denied.) 
14:21:27     C:\...\Microsoft.NET.Build.Containers.targets(172,5): error MSB4018: ---> System.UnauthorizedAccessException: Access to the path is denied. 
14:21:27     C:\...\Microsoft.NET.Build.Containers.targets(172,5): error MSB4018: at System.IO.FileSystem.MoveFile(String sourceFullPath, String destFullPath, Boolean overwrite) 
14:21:27     C:\...\Microsoft.NET.Build.Containers.targets(172,5): error MSB4018: at Microsoft.NET.Build.Containers.Registry.DownloadBlob(String repository, Descriptor descriptor)
14:21:27     C:\...\Microsoft.NET.Build.Containers.targets(172,5): error MSB4018: at Microsoft.NET.Build.Containers.LocalDocker.WriteImageToStream(Image image, ImageReference sourceReference, ImageReference destinationReference, Stream imageStream) 
14:21:27     C:\...\Microsoft.NET.Build.Containers.targets(172,5): error MSB4018: at Microsoft.NET.Build.Containers.LocalDocker.Load(Image image, ImageReference sourceReference, ImageReference destinationReference)
14:21:27     C:\...\Microsoft.NET.Build.Containers.targets(172,5): error MSB4018: --- End of inner exception stack trace ---
14:21:27     C:\...\Microsoft.NET.Build.Containers.targets(172,5): error MSB4018: at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
14:21:27     C:\...\Microsoft.NET.Build.Containers.targets(172,5): error MSB4018: at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
14:21:27     C:\...\Microsoft.NET.Build.Containers.targets(172,5): error MSB4018: at System.Threading.Tasks.Task.Wait()
14:21:27     C:\...\Microsoft.NET.Build.Containers.targets(172,5): error MSB4018: at Microsoft.NET.Build.Containers.Tasks.CreateNewImage.Execute()
14:21:27     C:\...\Microsoft.NET.Build.Containers.targets(172,5): error MSB4018: at Microsoft.Build.BackEnd.TaskExecutionHost.Microsoft.Build.BackEnd.ITaskExecutionHost.Execute()
14:21:27     C:\...\Microsoft.NET.Build.Containers.targets(172,5): error MSB4018: at Microsoft.Build.BackEnd.TaskBuilder.ExecuteInstantiatedTask(ITaskExecutionHost taskExecutionHost, TaskLoggingContext taskLoggingContext, TaskHost taskHost, ItemBucket bucket, TaskExecutionMode howToExecuteTask)

This is very unfortunate because this way you cannot have multiple container activated projects in one solution and do a publish in one go.

Example setup Our setup looks similar like this (I simplified it here, the real setup covers some more aspects):

Directory.Build.targets:

<ItemGroup Condition=" '$(ContainerImageName)' != '' ">
    <PackageReference Include="Microsoft.Net.Build.Containers" Version="0.3.2" />
</ItemGroup>
<Target Name="ContainerPublish" Condition=" '$(ContainerImageName)' != ''">
    <Message Text="Publishing to image '$(ContainerImageName)' " />
    <Exec Command="dotnet publish -p:PublishProfile=DefaultContainer --configuration=Release" />
</Target>
<Target Name="ContainerPublish" Condition=" '$(ContainerImageName)' == ''">
    <!-- Empty target if project has no container output -->
</Target>

And we trigger the build like: dotnet msbuild -target:ContainerPublish Solution.sln The temp path is overriden in our CI system so the download will happen always.

Potential fix: The download already happens to a random temp path: https://github.com/dotnet/sdk-container-builds/blob/54e9e6180f2360a9a10b38de3eb7cc3763e91152/Microsoft.NET.Build.Containers/Registry.cs#L239

This move operation likely needs some more intelligence like: If the move fails due to an System.UnauthorizedAccessException, we could start checking again like in line 217 if the file already exists and then we're happy. Maybe it even needs to check if the file is already available for reading (as it might be being moved right now and not yet finished).

baronfel commented 1 year ago

Good find, we should probably have some locking here at the layer-download-from-registry level. Using a temp file as a lock to ensure only one process is downloading is a well known technique at this point, and possibly should be how we handle this, since we can't control people publishing in parallel.

On that note, I thought newer 7 and 8 SDKs made solution level publish an error?

Danielku15 commented 1 year ago

Good point, I should actually call publish on the correct "own" project file being built currently and not just dotnet publish. If you look closely you will notice that I am triggering a custom target on the solution, which will then should initiate a publish on each individual project. But the bug still stays valid with this adaption.

vlada-shubina commented 1 year ago

@Danielku15 @baronfel do you plan to work on this? alternatively I can look into it. Thanks

baronfel commented 1 year ago

I won't have time for a bit, so as far as I'm concerned it's up for grabs.

Danielku15 commented 1 year ago

I didn't directly plan to focus on it. I would need to check first some references on how to reliably work with such lock files to avoid race conditions. If any of you folks have a good reference I would see if I can quickly spin up some PR addressing this problem. Might take a few days as things are quite busy on my side currently.

baronfel commented 8 months ago

I think an easy way to coordinate this synchronization might be by a named mutex - the shared resources in question are

When we tried to download each layer/manifest, the MSBuild Node executing the Task would

@rainersigwald / @MichalPavlik thoughts here? other patterns we should consider? We can't directly rely on MSBuild engine caching for synchronization because the net472 pathway would be in a console app, not in an MSBuild Task.