Status: Closed. yuezengms closed this issue 4 years ago.
This is AWESOME! Thanks a lot @antonioortizpola, fingers crossed that we will be now able to quickly root-cause it and fix it in 3.0/2.2! 🙏
@wfurt, were you able to reproduce the scenario? just to know if the repo code worked for you, or if there is something more that I can help with
I'm still digging through it, @antonioortizpola. I got the services up and I could see:
root@9262db4be2d7:/app# netstat -natu
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.11:42827 0.0.0.0:* LISTEN
tcp 0 0 172.18.0.3:43294 5.153.231.4:80 TIME_WAIT
tcp 0 0 172.18.0.3:43286 5.153.231.4:80 TIME_WAIT
tcp 0 0 172.18.0.3:34212 151.101.52.204:80 TIME_WAIT
tcp 0 0 172.18.0.3:34218 151.101.52.204:80 TIME_WAIT
tcp 0 0 172.18.0.3:34214 151.101.52.204:80 TIME_WAIT
tcp6 0 0 :::80 :::* LISTEN
udp 0 0 127.0.0.11:34452 0.0.0.0:*
but if I wait a little bit, all the connections go away.
root@9262db4be2d7:/app# netstat -natu
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.11:42827 0.0.0.0:* LISTEN
tcp6 0 0 :::80 :::* LISTEN
udp 0 0 127.0.0.11:34452 0.0.0.0:*
I also could not find any direct usage of HttpClient. Everything seems to be wrapped in some high-level calls so I'll need to unwind that.
@wfurt, did you run the 3.0 version or the 2.1 version? That is the behavior I got from the 2.1 version. In the Core 3.0 version the sockets would never exit the CLOSE_WAIT state, even an hour after the test ended.
I am a little confused by the comment from @karelz. The fix should be applied to 3.0/2.2, since 2.1 does not have this socket issue, only the memory issue that should be fixed/merged in 2.2.
@jarlehal typo, fixed, thanks for pointing it out.
@wfurt has been doing a good job digging into this, and shared something suspicious he noticed: in a repro, when analyzing it with SOS, there ended up being a small number of Sockets on the heap but a large number of SafeSocketHandles. Based on that, I have a theory that this is due to https://github.com/dotnet/corefx/pull/32845 / https://github.com/dotnet/corefx/pull/32793. I don’t think it actually caused the problem so much as the bug it was fixing was masking this problem, which has existed for a long time.
SocketsHttpHandler creates a Socket for each connection. Each Socket creates a SafeSocketHandle (that’s its name in 3.0; prior to that it was internal and named SafeCloseSocket), a SafeHandle that wraps the underlying file descriptor (there’s actually a secondary SafeHandle in the middle, but that’s not relevant). On Unix, when the Socket is connected, it’s registered with the SocketAsyncEngine, which is the code responsible for running the event loop interacting with the epoll handle. Whenever the epoll wait shows that there’s work available to be done, the event loop maps the relevant file descriptor back to the appropriate SafeSocketHandle so that the relevant work can be performed and callbacks invoked. In order to do that mapping, the SocketAsyncEngine stores a ConcurrentDictionary<IntPtr, SafeSocketHandle>, and the engines themselves are stored in a static readonly SocketAsyncEngine[] array… the punchline here is that these SafeSocketHandles end up being strongly rooted by a static array.
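The rooting described above can be illustrated with a minimal sketch (a hypothetical FakeSocketAsyncEngine; the real SocketAsyncEngine is much more involved): anything stored in a static dictionary stays strongly reachable until it is explicitly removed, regardless of whether anything else still references it.

```csharp
using System;
using System.Collections.Concurrent;

// Simplified sketch (NOT the actual runtime code) of how a static registration
// strongly roots handles: as long as an entry remains in the static dictionary,
// the GC treats the stored handle as reachable, even after every other
// reference to its owning Socket has been dropped.
static class FakeSocketAsyncEngine
{
    // Static root: values stored here are never collected until removed.
    public static readonly ConcurrentDictionary<IntPtr, object> HandleMap =
        new ConcurrentDictionary<IntPtr, object>();

    public static void Register(IntPtr fd, object handle) => HandleMap[fd] = handle;

    public static bool Unregister(IntPtr fd) => HandleMap.TryRemove(fd, out _);
}
```

This is why the thread below concludes that something (a finalizer, or a weak reference) must undo the registration when the owning objects are dropped without being disposed.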
The other important piece of information is that there’s a Timer inside SocketsHttpHandler that runs periodically to check whether connections in the connection pool are still viable, and if they’re not, Dispose’s of them. The bug that the aforementioned issues fixed was that there was an unexpected cycle formed between the timer and the connection pool that ended up keeping everything alive indefinitely, resulting in a non-trivial memory leak. However, as a side effect of that leak, it meant that the timer would continue to run, and every time it fired, it would loop through all of the open connections and Dispose of the ones that were no longer viable. In the fullness of time, all of them would get Dispose’d. Disposing of the connection would dispose of the Socket which would Dispose of the SafeSocketHandle and remove it from the SocketAsyncEngine’s dictionary.
Now, with the aforementioned fixes, if code fails to Dispose of the HttpClient/SocketsHttpHandler when done with them and drops the references to them, the timer gets collected, as does the connection pool, as do all of the HttpConnection objects in the pool. None of those have finalizers, nor should they need them. But here’s the rub. Socket does have a finalizer, yet its finalizer ends up being a nop. Since the storing of the SafeSocketHandle into the static dictionary isn’t something that can be undone automatically by GC, we actually need a finalizer to remove that registration should everything get dropped. Since all those objects don’t have finalizers, and since Socket’s finalizer isn’t doing the unregistration, everything gets collected above the SafeSocketHandle, which then remains registered effectively forever, never being disposed of, and never closing its file descriptor.
I don’t know for certain whether this is the cause of this issue. It’s just a theory, and @wfurt is working through the repro, debugging, and testing out theories. If this doesn’t turn out to be the root cause here, I suspect it’s still a bug we need to fix. If it does turn out to be the root cause, I don’t think the fix is to revert the aforementioned fixes: they were valid, they just revealed this existing problem they had been masking by creating a different leak that in turn allowed the timer to dispose of these resources... plus, this issue would apply to all uses of Sockets that weren't disposed of, not just those used from SocketsHttpHandler. The actual fix would likely be to either use a weak reference when storing the SafeSocketHandle into the dictionary (which might be the right fix but could also potentially cause perf or otherwise unforeseen problems), or to ensure that a finalizer is put in place to undo that registration (most likely changing Socket’s finalizer accordingly on Unix).
In the meantime, assuming this is the cause, in addition to fixing it in System.Net.Sockets, code using HttpClient/HttpClientHandler/SocketsHttpHandler should also be Dispose'ing of those instances when done with them. If you just create a single HttpClient instance that's stored in a static, there's no real need to dispose of it, as everything will go away when the process ends. But if you're doing something that creates an instance, uses it for one or more requests, and then gets rid of it, it should be disposed at that point.
cc: @geoffkizer, @tmds
I'm making some progress. On the note above: if you add cenamOperationClient.Close() to GetSubscriberDetailsF() after the response is received to close the WCF client, there are no lingering sockets at all, @antonioortizpola.
With the old platform handlers the socket could be closed independently, but now the usual reference counting applies, and not disposing the HttpClient when it is no longer used can lead to delayed release of resources; for example, any live reference to the HttpResponseStream would keep the underlying socket open.
I think there is definitely an issue with 2.2+, but there can be more than one reason for the observed behavior.
We have the same issue. All ports created with IHttpClientFactory stay in the CLOSE_WAIT state.
All ports created with IHttpClientFactory stays in CLOSE_WAIT state
@glennc, @rynowak, is HttpClientFactory disposing of all handlers it creates?
@vasicvuk, are you disposing of all HttpResponseMessages you're given and response Streams you're given?
@wfurt, thanks a lot for your investigation! I will change the code so the service closes the connection.
I am glad the repository could help to replicate the problem and I hope it could help others.
Please correct me if I am wrong, but I think the main problems are:
For the first group, please make sure you are using HttpClient correctly; most probably that will fix the problem and improve your system.
For the second group, search for methods that could close the connection or for IDisposable interfaces, and run tests monitoring your sockets (for example with netstat -natu) to check whether that helps fix the problem, or check whether you can reuse your connections.
If the problem persists, tell us how you are using the client, socket, or library, and if possible create a simple repository with a reproduction case.
The repro was extremely useful, thanks @antonioortizpola.
In either case we should not leak OS resources, and right now we do in some cases with the 2.2 code. Disposing explicitly is best, as everything is released as soon as it is no longer needed. Otherwise the socket can stay open until the GC kicks in, and that may take some time depending on many variables.
I think the main problems are
There are two issues here:
Fixing either of those is technically sufficient to address this issue, although even when (1) is fixed, it's important for (2) to be done, as the fix for (1) is still non-deterministic and could take an unbounded amount of time to kick in.
@stephentoub Hi, we are using HttpClientFactory for our HttpResponseMessages. So as I understand it, the issue is that the socket is not disposed for some time if there is no explicit Dispose in the code. I guess a using block will fix this?
I guess that using block will fix this?
Yes, a using block causes Dispose to be called.
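To illustrate that guarantee (a minimal sketch with a hypothetical Resource type): the compiler expands a using block into try/finally with Dispose in the finally, so Dispose runs even if the body throws.

```csharp
using System;

// Hypothetical disposable type used only to observe that Dispose was called.
class Resource : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose() => Disposed = true;
}
```

Wrapping an instance in `using (var r = new Resource()) { ... }` therefore guarantees `r.Disposed` is true when the block exits, by any path.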
I submitted fix to 3.0 master. It would be great if anybody can grab daily build and verify that it solves their issue.
Since this is a somewhat generic error, there may be more than one issue under the covers. In either case, any feedback would be useful.
Kudos to @antonioortizpola, who was able to isolate a repro.
Will it be possible to get this fix in 2.2?
Possibly. It will be easier to get approval if we can confirm that this fix solves the observed issues, e.g. try 3.0 before and after.
For us, the exact issue symptoms are hard to isolate due to the application code.
This issue is causing big trouble for us in production apps.
We reviewed all HttpClient usages in our code and migrated to HttpClientFactory. But it still happens from time to time.
The interesting thing is that we have two similar apps (they use HttpClient the same way) on Azure Kubernetes and GKE (Google Kubernetes Engine), and the issue only happens on Azure.
Any plans to fix it in 2.2?
The 2.2 port is waiting for verification, @alxbog. We need enough evidence that dotnet/corefx#38499 fixes it, or we need a separate repro for 2.2. Until then it is unlikely we will get permission for 2.x changes.
If someone is willing to try a private build with the fix ported to 2.2/2.1, that would help us prove it is worth porting. If you are that person, let us know and we can provide private builds with the change ported for testing.
Can anyone confirm if this documentation is correct? I've seen some comments suggesting we should be disposing HttpClients now even though the docs pretty much say the opposite: https://docs.microsoft.com/en-us/aspnet/core/fundamentals/http-requests?view=aspnetcore-3.0#httpclient-and-lifetime-management
@phillijw note that closed issues are not monitored. The docs you linked are for IHttpClientFactory (part of ASP.NET code).
In general, you should not dispose a long-lived HttpClient, but if you want to avoid the stale DNS records problem, it is healthy to recycle your static instance on a regular basis (HttpClientFactory does this for you).
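One way to get that recycling without replacing the client, assuming .NET Core 2.1+ where SocketsHttpHandler is available, is to cap the pooled connection lifetime (a sketch; the specific timeouts are illustrative, not recommendations):

```csharp
using System;
using System.Net.Http;

static class ClientFactory
{
    // Sketch: cap how long a pooled connection may be reused so that DNS
    // changes are picked up, without recycling the HttpClient itself.
    public static HttpClient Create() => new HttpClient(new SocketsHttpHandler
    {
        PooledConnectionLifetime = TimeSpan.FromMinutes(2),   // recycle connections periodically
        PooledConnectionIdleTimeout = TimeSpan.FromMinutes(1) // drop idle connections sooner
    });
}
```

The returned client still follows the disposal guidance in this thread: keep one long-lived instance, and dispose it only if you actually retire it.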
Can anyone confirm if this documentation is correct? I've seen some comments suggesting we should be disposing HttpClients now even though the docs pretty much say the opposite
The short answer is: HttpClient is an IDisposable, and as with any IDisposable, you should Dispose of it whenever you're done with it.
The question then becomes "when should I be done with it?"
If you're creating your own HttpClient instance, e.g. new HttpClient() or new HttpClient(new SocketsHttpHandler()), then you should ideally be reusing the instance over and over, rather than creating a new one per request. That's because the underlying handler owns the connection pool, disposing of the handler will dispose of the connection pool, and the aforementioned constructors end up using the public HttpClient(HttpMessageHandler handler, bool disposeHandler) constructor with disposeHandler: true. You still want to Dispose of the HttpClient when you're done with it, but in the common case you shouldn't ever be "done" with it, as you just stash it into a static field and use it for all subsequent requests. If you do decide to be done with it at some point, such as if you want to replace it with a different instance for some reason, then you'd want to Dispose of it then. Again, in this way, it's no different from any other IDisposable: it owns resources, so dispose of it when you no longer need those resources.
However, the docs you link to are for IHttpClientFactory. It muddies the waters a little, because it maintains and manages its own set of HttpMessageHandler instances, and its design is to give you back a new HttpClient instance every time you ask for one. The intent is that you use that HttpClient for the lifetime of your request, at which point you're "done" with it, and per my previous comments it should then be disposed. That instance wraps one of these shared HttpMessageHandler instances, but was constructed with the disposeHandler: false argument:
https://github.com/aspnet/Extensions/blob/c2147ae6a07c5ebf6aa6ef2f8de86e0851fc13ca/src/HttpClientFactory/Http/src/DefaultHttpClientFactory.cs#L117-L134
such that disposing of the HttpClient won't dispose of the underlying shared handler.
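A simplified sketch of that arrangement (a hypothetical TinyClientFactory, not the real DefaultHttpClientFactory linked above):

```csharp
using System.Net.Http;

// Sketch of the IHttpClientFactory pattern: the handler is shared and
// long-lived, and each HttpClient handed out wraps it with
// disposeHandler: false, so disposing a client does NOT tear down the
// shared handler or its connection pool.
class TinyClientFactory
{
    private readonly HttpMessageHandler _sharedHandler = new SocketsHttpHandler();

    public HttpClient CreateClient() =>
        new HttpClient(_sharedHandler, disposeHandler: false);
}
```

Disposing one client from this factory leaves the shared handler intact, so subsequently created clients keep using the same pool.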
Thanks for the explanation @stephentoub. The clarity on disposeHandler is what I was really missing. I feel like the examples in the docs could be updated to discuss that point a bit more, or to show examples where the HttpClient DOES get disposed. For instance, https://docs.microsoft.com/en-us/aspnet/core/fundamentals/http-requests?view=aspnetcore-3.0#use-ihttpclientfactory-in-a-console-app does not dispose even though it fits your example of where it should be disposed, I think.
does not dispose even though it fits your example of where it should be disposed
Yes, the client in the GetPage method in that sample should be disposed. Thanks for pointing that out.
cc: @glennc, @rynowak
@karelz I experience this problem with .net core 2.2. Is it possible to get private libs to test the backport?
@wfurt can you please create a private build against 2.2 for @MrZoidberg to test?
UPDATE: Disregard this comment... the System.Net.Sockets.SocketException: Address already in use bug kept happening in both cases; I only solved it by moving my class from services.AddScoped to services.AddSingleton.
PREVIOUS COMMENT (again, disregard): I'm getting this error a lot now in a situation where I've switched from:
httpResponse = await _client.GetAsync(method);
to:
using (var req = create(HttpMethod.Get, finalUrl)) {
httpResponse = await _client.SendAsync(req);
}
protected virtual HttpRequestMessage create(HttpMethod method, string url, string postBody = null) {
var req = new HttpRequestMessage(method, url);
return req;
}
I'm disposing of httpResponse after deserializing the response in both cases. Any opinion on the code, @karelz @stephentoub? I'm on .NET Core 2.2 and would prefer not to upgrade to .NET Core 3.0, unless it would help you further debug and solve the problem.
The problem with the socket comes from the HttpClient, not the HttpRequestMessage. How are you generating the HttpClient? Are you using DI, HttpClientFactory?
Have you registered your HttpClient at bootstrap? I.e. something like this:
services.AddHttpClient<IMyService, MyService>(c =>
{
c.BaseAddress = new Uri("https://base_uri");
c.Timeout = TimeSpan.FromSeconds(30);
}).SetHandlerLifetime(TimeSpan.FromSeconds(30));
Then from your MyService, inject it in:
private readonly HttpClient _client;
public MyService(HttpClient client)
{
_client = client;
}
Implementation should be as simple as:
using (var response = await _client.GetAsync(url))
{
return await response.Content?.ReadAsStringAsync();
}
Since you're using the GET method, the GetAsync(url) function should be sufficient.
This lets .NET Core handle all your connection pooling for you so you don't have to worry about managing it yourself.
I'm creating the HttpClient by myself in both cases. However, if I use HttpRequestMessage and SendAsync, the System.Net.Sockets.SocketException: Address already in use problems start.
If I just use GetAsync, there are no problems.
@karelz @stephentoub, can you please review this and provide feedback? I'm hoping that if you investigate in this direction, you can potentially solve the problem for others as well.
However if I use it HttpRequestMessage and SendAsync - problems with System.Net.Sockets.SocketException: Address already in use start.
Can you share the code you use with SendAsync? Are you calling it with ResponseHeadersRead?
I'm hoping if you
GetAsync is just a wrapper around SendAsync.
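To illustrate the ResponseHeadersRead question above, here is a hedged sketch (a hypothetical StreamingExample class): with that completion option the body is still attached to the live connection, so disposing the response and its stream is what returns the socket to the pool.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

class StreamingExample
{
    // Reused long-lived instance, per the guidance earlier in the thread.
    private static readonly HttpClient s_client = new HttpClient();

    // With ResponseHeadersRead, SendAsync returns as soon as the headers
    // arrive; the connection stays checked out of the pool until the
    // response (and its content stream) are disposed, which is why the
    // using blocks below matter.
    public static async Task CopyBodyAsync(string url, Stream destination)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, url))
        using (var response = await s_client.SendAsync(
                   request, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();
            using (var body = await response.Content.ReadAsStreamAsync())
            {
                await body.CopyToAsync(destination);
            }
        }
    }
}
```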
I got the same error in production too (.NET Core 2.2, Ubuntu, k8s). We followed the sample code the ASP.NET docs suggest; here is the demo code.
In Startup.cs:
public void ConfigureServices(IServiceCollection services)
{
services.AddMvc();
// clients
services.AddHttpClient<IAddressClient, AddressClient>()
.AddHttpMessageHandler(handler => new TimingOutDelegatingHandler())
.AddHttpMessageHandler(handler => new RetryPolicyDelegatingHandler());
}
Here is AddressClient.cs:
public AddressClient(
HttpClient httpClient,
IOptions<UsersApiDomainConfig> usersApiConfig,
ILogger<AddressClient> logger)
: base(usersApiConfig)
{
_httpClient = httpClient;
_logger = logger;
}
Here is the use of the HttpClient:
public async Task SyncAddressAsync(AddressModel address)
{
    var url = GetUrl(Path);
    var response = await _httpClient.PutAsync(
        url,
        new StringContent(JsonConvert.SerializeObject(address),
            Encoding.UTF8,
            HttpClientConstants.ApplicationJson));
    // ...
}
I think we did as the doc suggests.
Thanks for your help @stephentoub and @OpenSourceAries ... in the end I solved my problem in different fashion since I was still getting errors... updated original post: https://github.com/dotnet/corefx/issues/37044#issuecomment-545038295
I'm also seeing this exception with the MongoDB driver in K8s, which isn't using HttpClient. So does this fix cover that, or is the scope larger than originally anticipated?
~Assumption: Duplicate of dotnet/runtime#27274 which was fixed by dotnet/corefx#32046 - goal: Port it (once confirmed it is truly duplicate).~ This is HttpClient/TCP spin off. UdpClient is covered fully by dotnet/runtime#27274.
Issue Title
"System.Net.Sockets.SocketException: Address already in use" on Linux
General
Our .NET Core (v2.2.0) services are running in an Azure Kubernetes Linux environment. Recently we experienced a lot of "System.Net.Http.HttpRequestException: Address already in use" errors while calling dependencies, e.g. Active Directory, CosmosDB, and other services. Once the issue started, we kept getting the same errors and had to restart the service to get rid of them. Our HTTP clients use DNS addresses, not specific IPs and ports. The following is the call stack from one example. What can cause such issues and how can we fix them?