Open sksk571 opened 1 year ago
Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.
Author: | sksk571 |
---|---|
Assignees: | - |
Labels: | `area-System.Net.Http`, `tenet-performance` |
Milestone: | - |
after migration from .NET5 to .NET6 and later .NET7
We're aware of this issue on .NET 6, and it can be mitigated by setting the ConnectTimeout
(it's infinite by default).
new SocketsHttpHandler
{
ConnectTimeout = TimeSpan.FromSeconds(5)
}
and later .NET7
Are you seeing similar behavior on .NET 7? 7.0 has a mitigation in place that should prevent this issue from affecting many requests - #71785.
The events from a trace captured using System.Net.Http provider
If this situation happens again, could you also collect the following events?
System.Net.Sockets
System.Net.Security
Private.InternalDiagnostics.System.Net.Http
That would give us more information about what the connection pool was doing at the time.
Seeing this happening on .NET 7.0 is very weird, I wouldn't be surprised if this is a different issue, especially if it's true that:
Tracing some of the failed requests to connections using ActivityID shows that the corresponding connection was established and closed, but not used for some reason.
@sksk571 do you see the same scale of TaskCancelledException
-s and closed connections on .NET 7.0 as on .NET 6.0?
I would really like to see the additional traces collected on .NET 7.0, if setting ConnectTimeout
(on .NET 6.0!) doesn't help.
The issue definitely happens on .NET7, though maybe to a lesser extent than on .NET6. The workaround with setting ConnectTimeout in SocketsHttpHandler doesn't work for us, as the affected code is in a netstandard2.0 library where SocketsHttpHandler is not available.
I was able to collect the trace with the additional events, how can I share it privately? The trace is from production system so I'd avoid attaching it to the issue.
Can you send it to Miha.Zupan
and Anton.Firszov
@microsoft.com?
After exchanging information with @sksk571 it looks like there are sporadic stalled getaddrinfo
calls on their server. As a result of #49171, a single stalled call will hang any follow-up DNS resolution request which go to the same host. This leads to a massive accumulation of pending connection attempts in HttpConnectionPool
. These will only complete (fail) when the first stalled DNS resolution attempt completes. By this time the originating requests are timed out and end up throwing.
We can consider mitigating this by:
Dns.s_tasks
. In the log I received there is an enormous amount of cancellations due to PendingConnectionTimeoutOnRequestCompletion
(#71785), which currently don't affect the Dns queuing mechanism at all.Whether we are open to implement such mitigations depends whether such DNS misbehavior is common. It would be good to understand what's causing this on your infra @sksk571. Is the DNS server unresponsive? Is there any timeout configured in resolv.conf
? Does it help if you lower it?
Looks like a +1 on #19443
Triage: only one customer report so far, the root cause is environmental. Putting to "Future" for now, if we get more customers hitting this we can bump the priority.
We are experiencing similar behavior in a high-traffic ASP.NET Core application on .NET 7.0.101. The application runs on Kubernetes, and the issue typically occurs when new pods are created and begin processing a high volume of requests. The application makes numerous outbound HTTP requests to various cloud services while processing each inbound request.
The symptoms include:
http11-requests-queue-duration
event counter values reaching 40 seconds or more.current-dns-lookups
values of up to 2,000.dns-lookups-duration
values of 40 seconds or more.current-outgoing-connect-attempts
values of 100 or more.threadpool-queue-length
values of 100 or more.System.Threading.Tasks.TaskCanceledException: The operation was canceled.
---> System.TimeoutException: A connection could not be established within the configured ConnectTimeout.
--- End of inner exception stack trace ---
at System.Net.Http.HttpConnectionPool.CreateConnectTimeoutException(OperationCanceledException oce)
at System.Net.Http.HttpConnectionPool.AddHttp11ConnectionAsync(QueueItem queueItem)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(IAsyncStateMachineBox box, Boolean allowInlining)
at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
at System.Threading.Tasks.Task.TrySetCanceled(CancellationToken tokenToRecord, Object cancellationException)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetException(Exception exception, Task`1& taskField)
at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(IAsyncStateMachineBox box, Boolean allowInlining)
at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
at System.Threading.Tasks.Task.TrySetCanceled(CancellationToken tokenToRecord, Object cancellationException)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetException(Exception exception, Task`1& taskField)
at System.Runtime.CompilerServices.AsyncValueTaskMethodBuilder`1.SetException(Exception exception)
at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(IAsyncStateMachineBox box, Boolean allowInlining)
at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
at System.Threading.Tasks.Task.TrySetCanceled(CancellationToken tokenToRecord, Object cancellationException)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetException(Exception exception, Task`1& taskField)
at System.Net.Http.HttpConnectionPool.ConnectToTcpHostAsync(String host, Int32 port, HttpRequestMessage initialRequest, Boolean async, CancellationToken cancellationToken)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(IAsyncStateMachineBox box, Boolean allowInlining)
at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
at System.Threading.Tasks.Task.TrySetCanceled(CancellationToken tokenToRecord, Object cancellationException)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetException(Exception exception, Task`1& taskField)
at System.Net.Sockets.Socket.<ConnectAsync>g__WaitForConnectWithCancellation|281_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.InvokeContinuation(Action`1 continuation, Object state, Boolean forceAsync, Boolean requiresExecutionContextFlow)
at System.Net.Sockets.SocketAsyncEventArgs.<DnsConnectAsync>g__Core|112_0(MultiConnectSocketAsyncEventArgs internalArgs, Task`1 addressesTask, Int32 port, SocketType socketType, ProtocolType protocolType, CancellationToken cancellationToken)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
--- End of stack trace from previous location ---
at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.HttpConnectionWaiter`1.WaitForConnectionAsync(Boolean async, CancellationToken requestCancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
@matt-augustine I'm experimenting with a fix now. Would be great if you could try a custom build for your case for validation. Will reach out on Teams.
Moving to 9.0 as it hit 3 customers. They found workarounds, but it would be nice to avoid the problem for others in future.
I believe we were severely affected by this issue, which resulted in massive stability issues that were very difficult to debug. We were able to work around by using a custom DNS resolver and SocketsHttpHandler with ConnectCallback, but it seems this is wider spread than 3 customers and can cause massive damage for things like edge services that communicate via HTTP.
Edit by @antonfirsov: This is caused by stalled DNS resolution attempts, see https://github.com/dotnet/runtime/issues/81023#issuecomment-1405441025.
Description
We have a small web application that started sporadically throwing TaskCancelledException's after migration from .NET5 to .NET6 and later .NET7. The application consists of just one controller method that retrieves image from Amazon S3, resizes it and saves a resized copy back to S3. The application works fine under .NET5, but under .NET6 and .NET7 it sporadically throws thousands of TaskCancelledException errors from method HttpConnectionPool+HttpConnectionWaiter`1+
Configuration
Regression?
This is a regression in .NET6+ compared to .NET5.
Data
Dump files captured during one of the incidents contain thousands of async calls looking like
and
The events from a trace captured using System.Net.Http provider: dotnet_20230119_154543_2847c867.9.excel.xlsx
Analysis
Tracing some of the failed requests to connections using ActivityID shows that the corresponding connection was established and closed, but not used for some reason. This issue may be related.