dotnet / docker-tools

This is a repo to house some common tools for our various docker repos.
MIT License
122 stars 46 forks source link

Disable concurrent calls for querying manifests #1111

Closed mthalman closed 1 year ago

mthalman commented 1 year ago

As part of the publish process we push manifest list tags and then query them to get their digests. The code which queries for the digests does so by executing all the requests in parallel. When Image Builder was based on .NET 6, we would periodically see the following exception when executing that code:

Unhandled exception: System.ArgumentNullException: Value cannot be null. (Parameter 'obj')
at System.OrdinalIgnoreCaseComparer.GetHashCode(String obj)
at System.Collections.Generic.Dictionary`2.FindValue(TKey key)
at System.Collections.Generic.Dictionary`2.ContainsKey(TKey key)
at System.Net.Http.Headers.HttpHeaders.AddHeaders(HttpHeaders sourceHeaders)
at System.Net.Http.Headers.HttpRequestHeaders.AddHeaders(HttpHeaders sourceHeaders)
at System.Net.Http.HttpClient.PrepareRequestMessage(HttpRequestMessage request)
at System.Net.Http.HttpClient.CheckRequestBeforeSend(HttpRequestMessage request)
at System.Net.Http.HttpClient.SendAsync(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.SendAsync(HttpRequestMessage request)

But that happened very infrequently with .NET 6. Now that Image Builder has been upgraded to .NET 7, an exception occurs nearly all the time. But with .NET 7, there's a NRE that's occurring:

Unhandled exception: System.NullReferenceException: Object reference not set to an instance of an object.at System.Net.Http.HttpConnection.WriteAsciiStringAsync(String s, Boolean async)
at System.Net.Http.HttpConnection.WriteHeadersAsync(HttpHeaders headers, String cookiesFromContainer, Boolean async)
at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)

This appears to be a bug with .NET. I'm able to repro locally and debug through things but am unable to determine the root cause. I will try to get a standalone repro for this and get an issue logged in dotnet/runtime and point back here.

In the meantime, I've changed the logic to no longer execute these requests concurrently. That allows execution to successfully complete and doesn't impact the execution time too bad.

dotnet-issue-labeler[bot] commented 1 year ago

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

mthalman commented 1 year ago

Do you have a sense of the build time impact of these changes?

It's not too bad. I would estimate for a full build it'll end up adding less than a minute of extra time.