dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Address "System.Net.Sockets.SocketException: Address already in use" on K8S/Linux using HttpClient/TCP #29327

Closed yuezengms closed 4 years ago

yuezengms commented 5 years ago

~Assumption: Duplicate of dotnet/runtime#27274 which was fixed by dotnet/corefx#32046 - goal: Port it (once confirmed it is truly duplicate).~ This is HttpClient/TCP spin off. UdpClient is covered fully by dotnet/runtime#27274.

Issue Title

"System.Net.Sockets.SocketException: Address already in use" on Linux

General

Our .NET Core (v2.2.0) services are running on an Azure Kubernetes Linux environment. Recently we experienced a lot of "System.Net.Http.HttpRequestException: Address already in use" errors while calling dependencies, e.g. Active Directory, Cosmos DB, and other services. Once the issue started, we kept getting the same errors and had to restart the service to get rid of them. Our HTTP clients use DNS addresses, not specific IPs and ports. The following is the call stack from one example. What can cause such issues and how can we fix it?

System.Net.Http.HttpRequestException: Address already in use ---> System.Net.Sockets.SocketException: Address already in use
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.WaitForCreatedConnectionAsync(ValueTask`1 creationTask)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.DiagnosticsHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Http.HttpClientWrapper.GetResponseAsync()
   at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Http.AdalHttpClient.GetResponseAsync[T](Boolean respondToDeviceAuthChallenge)
   at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Http.AdalHttpClient.GetResponseAsync[T]()
   at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Flows.AcquireTokenHandlerBase.SendHttpMessageAsync(IRequestParameters requestParameters)
   at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Flows.AcquireTokenHandlerBase.SendTokenRequestAsync()
   at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Flows.AcquireTokenHandlerBase.CheckAndAcquireTokenUsingBrokerAsync()
   at Microsoft.IdentityModel.Clients.ActiveDirectory.Internal.Flows.AcquireTokenHandlerBase.RunAsync()
   at Microsoft.IdentityModel.Clients.ActiveDirectory.AuthenticationContext.AcquireTokenForClientCommonAsync(String resource, ClientKey clientKey)
   at Microsoft.IdentityModel.Clients.ActiveDirectory.AuthenticationContext.AcquireTokenAsync(String resource, ClientCredential clientCredential)
leecow commented 5 years ago

@karelz , @davidsh - can you have a look?

karelz commented 5 years ago

Duplicate of https://github.com/dotnet/corefx/issues/32027

karelz commented 5 years ago

Can you please try .NET Core 3.0? It was fixed there ...

yuezengms commented 5 years ago

I'll try upgrading. Thanks!

yanrez commented 5 years ago

@karelz do you plan to backport this fix to 2.2 ?

karelz commented 5 years ago

@yanrez not unless there is some good business justification - a higher number of affected customers. Is that the case? (This is only the 2nd ask so far.) Also, it would help to validate that this is truly the root cause, e.g. by trying a 3.0 preview/daily build.

yanrez commented 5 years ago

It is happening in some of the clusters. It doesn't seem to repro consistently, but some pods go into this state and stay in it until being terminated. I understand 3.0 is still a few months away (I don't know the actual timeline though), so my question about 2.2 was based on the assumption that a hotfix for 2.2 could come earlier than the 3.0 release. We will look into upgrading and see if it solves the issue.

karelz commented 5 years ago

BTW: It might be good for you to register as MS employees - at least by linking your accounts: https://github.com/dotnet/core/blob/master/Documentation/microsoft-team.md ... that allows other FTEs to see you are MSFT ;)

karelz commented 5 years ago

@yanrez what is the service? How large is it roughly? How often does it happen? ... That may support the business justification (even if you were not MSFT ;)). If you can verify on 3.0 that would be great. Either way, we may need to verify on an early 2.1/2.2 build as part of "test signoff" to make sure we are fixing the real root cause here.

yanrez commented 5 years ago

I will follow up offline, but in some of the regions we see it happening more often - taking down several pods in our k8s cluster per day. It's very annoying at the moment, costing us a few dev-hours a day to act on it and mitigate. We are also looking into automated mitigation: a liveness probe wired into a check for these exceptions that signals k8s to kill the pod. Unfortunately, that is also a non-trivial amount of dev work to build and deploy. Since we can't exactly predict the frequency of the issue, the risk is that the liveness probe might still impact our availability and cause us to miss our SLA.

arsenhovhannisyan7713 commented 5 years ago

I have the same exception on AKS (v1.11.4), container microsoft/dotnet:2.2-aspnetcore-runtime, region West Europe.

antoinne85 commented 5 years ago

Just chiming in to say that my business is also experiencing this issue: AKS (v1.9.6) Region: Central US Image: microsoft/dotnet:2.2-aspnetcore-runtime

yanrez commented 5 years ago

We applied an automated mitigation that counts these exceptions and reports a negative signal to the k8s liveness check. It helped us mitigate the impact. We haven't yet verified whether the latest builds of .NET Core 3 resolve the issue.

antoinne85 commented 5 years ago

FWIW, we implemented the same liveness check but then subsequently managed to fix the issue altogether in our deployment.

For us, we had a service client that was using HttpClient internally. The class was getting instantiated for each incoming request by the DI container (resulting in a new HttpClient for each incoming request). We changed the way the client was registered such that it is only instantiated once for the entire application and the issue was resolved.

eventhorizon-cli commented 5 years ago

I have experienced this problem since yesterday; everything using sockets throws "System.Net.Sockets.SocketException: Address already in use" - MySQL connections, Redis, HttpClient.

kirides commented 5 years ago

For us, we had a service client that was using HttpClient internally. The class was getting instantiated for each incoming request by the DI container (resulting in a new HttpClient for each incoming request). We changed the way the client was registered such that it is only instantiated once for the entire application and the issue was resolved.

@antoinne85 this is a common mistake people make with HttpClient, and one of the reasons IHttpClientFactory was added in .NET Core 2.1 (another being that singleton HttpClients don't respect DNS changes by default).

See https://docs.microsoft.com/en-us/dotnet/standard/microservices-architecture/implement-resilient-applications/use-httpclientfactory-to-implement-resilient-http-requests

The original and well-known HttpClient class can be easily used, but in some cases, it isn't being properly used by many developers.

As a first issue, while this class is disposable, using it with the using statement is not the best choice because even when you dispose HttpClient object, the underlying socket is not immediately released and can cause a serious issue named ‘sockets exhaustion’. For more information about this issue, see You're using HttpClient wrong and it's destabilizing your software blog post.
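As a hedged sketch of the registration change described above (the class name `CatalogClient` and the URL are illustrative, not taken from this thread), the typed-client pattern via `IHttpClientFactory` looks roughly like this, assuming ASP.NET Core 2.1+ with the Microsoft.Extensions.Http package:

```csharp
// A sketch, not the exact fix from the thread: register one typed client with
// IHttpClientFactory instead of a DI registration that news up an HttpClient
// per incoming request. "CatalogClient" is a made-up example name.
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

public class CatalogClient
{
    private readonly HttpClient _http;

    // The factory injects a pooled HttpClient; never "new HttpClient()" here.
    public CatalogClient(HttpClient http) => _http = http;

    public Task<string> GetItemsAsync() => _http.GetStringAsync("api/items");
}

public static class HttpClientRegistration
{
    public static void Configure(IServiceCollection services)
    {
        // Handler instances (and their sockets) are pooled and recycled by
        // IHttpClientFactory behind the scenes, avoiding socket exhaustion.
        services.AddHttpClient<CatalogClient>(client =>
        {
            client.BaseAddress = new Uri("https://example.com/");
            client.Timeout = TimeSpan.FromSeconds(30);
        });
    }
}
```

Consumers then take `CatalogClient` as a constructor dependency as usual; the factory manages handler lifetime so sockets are reused rather than opened per request.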

EvilBeaver commented 5 years ago

Same problem with NEST elasticsearch client on Linux under Core 2.2. Backporting fix to 2.2 would be nice

antonioortizpola commented 5 years ago

Same here @Kirides, @karelz, we are hitting this issue too on a site with a lot of traffic (10-15 requests per second, maybe more). It worked fine on 2.1; the issue has been happening since we updated to 2.2. It is happening with our HttpClients, even though we are already using IHttpClientFactory.

We have to restart our docker containers to fix the problem, and it happens at least once a day. I am also afraid to update to 3.0, since this is a production site and the official release is not ready yet.

EvilBeaver commented 5 years ago

We have to restart our docker containers to fix the problem

I'm doing the same. I set the Docker container restart mode to "restart=always", and in my app I catch this SocketException. If it's caught, I kill the app and the Docker engine restarts it. Works fine, but for complex apps it should be fixed properly at the .NET level.

LukePulverenti commented 5 years ago

@karelz I have an app with a large number of users affected by this. Any Dlna media app will be affected by this. A back-port would be much appreciated. Thanks.

karelz commented 5 years ago

@LukePulverenti did you validate that your problem has the same root cause and is fixed in .NET Core 3.0? (and that it is not just the same symptom) There seem to be enough +1s to justify a backport; we just need to be sure it is the right fix ... the first step would be to validate on 3.0. Then we can cherry-pick and ask for private validation on a 2.2/2.1 build.

antonioortizpola commented 5 years ago

It is becoming really frustrating. Our code on 2.1 has a memory leak; we fixed it with 2.2, but we cannot update our servers because of this. I really do not want to update to 3.0, since it is not production-ready yet, the migration is not straightforward, and we have some dependencies we are not sure will work as-is on Core 3 (like StructureMap, which we are changing to Lamar).

@karelz I will try to update my project and let you know. The problem is that I would need to update to 3.0 and publish to prod, since this error only shows up after some hours (sometimes 2, sometimes more than a day) of high traffic, so I need to be really careful.

karelz commented 5 years ago

@antonioortizpola understood. I was hoping someone has a "repro" in an environment where trying out and deploying 3.0 for a few days would be ok. Alternatively, if someone is capable of building a private version out of the 2.2/2.1 servicing branch with the cherry-picked fix, that would be preferred even for us. It is just a bit more involved on the prep side.

antonioortizpola commented 5 years ago

@karelz Ok, after some hard work I managed to update to Core 3. I was excited since our project is a gRPC server, and I tried the gRPC template. Sadly, after just around 8 hours running, we hit the same issue.

The project is very simple: it receives a gRPC request and makes an HTTP call or a WSDL call to an external service. These services have varying response times, from 200 milliseconds to timeouts after one minute. It then returns the response object as-is; no complex processing, no database connections or anything weird.

When the error starts happening, all the HTTP clients start showing the errors, both the direct ones and the ones coming from a WSDL definition.

(screenshot: HttpClient throwing the exception)

(screenshot: WSDL client throwing the same exception)

The csproj is

<Project Sdk="Microsoft.NET.Sdk.Web">

    <PropertyGroup>
        <TargetFramework>netcoreapp3.0</TargetFramework>
        <DockerDefaultTargetOS>Linux</DockerDefaultTargetOS>
    </PropertyGroup>

    <ItemGroup>
        <PackageReference Include="Grpc.AspNetCore.Server" Version="0.1.19-pre1" />
        <PackageReference Include="Microsoft.AspNet.WebApi.Client" Version="5.2.7" />
        <PackageReference Include="Microsoft.VisualStudio.Azure.Containers.Tools.Targets" Version="1.4.10" />
        <PackageReference Include="System.ServiceModel.Http" Version="4.5.3" />
    </ItemGroup>

</Project>

If it helps, we are running the project on Amazon Linux on an EC2 instance with Docker; the Dockerfile is

FROM mcr.microsoft.com/dotnet/core/aspnet:3.0-stretch-slim AS base
WORKDIR /app
EXPOSE 80

FROM mcr.microsoft.com/dotnet/core/sdk:3.0-stretch AS build
WORKDIR /src
COPY ["vtae.myProject.gateway/vtae.myProject.gateway.csproj", "vtae.myProject.gateway/"]
COPY ["vtae.myProject.gateway.proto/vtae.myProject.gateway.proto.csproj", "vtae.myProject.gateway.proto/"]
RUN dotnet restore "vtae.myProject.gateway/vtae.myProject.gateway.csproj"
COPY . .
WORKDIR "/src/vtae.myProject.gateway"
RUN dotnet build "vtae.myProject.gateway.csproj" -c Release -o /app

FROM build AS publish
RUN dotnet publish "vtae.myProject.gateway.csproj" -c Release -o /app

FROM base AS final
WORKDIR /app
COPY --from=publish /app .
ENTRYPOINT ["dotnet", "vtae.myProject.gateway.dll"]

There was no increase in CPU after the error, but no request succeeded after the first error showed up. Again, this was not happening on 2.1, but it is happening on 2.2 and 3.0.

All my HTTP clients are typed clients. I do not know if this dependency affects anything:

<PackageReference Include="System.ServiceModel.Http" Version="4.5.3" />

But I am using response.Content.ReadAsAsync<SomeClass>(); and _httpClient.PostAsJsonAsync(_serviceUrl, someRequestObject);

I would also like to know a way to stop the app from the app itself, so I can catch the exception and stop the server to let Docker restart the container. I do not like the idea of just calling Environment.Exit, but I could not find a better way to do it in Core 3.

EDIT

Ok, I ended up restarting the app, first adding a reference in Program.cs (a little dirty, but I guess it is temporary until a fix is found).

public class Program
{
    public static IHost SystemHost { get; private set; }

    public static void Main(string[] args)
    {
        SystemHost = CreateHostBuilder(args).Build();
        SystemHost.Run();
    }

    public static IHostBuilder CreateHostBuilder(string[] args) =>
        Host.CreateDefaultBuilder(args)
            .ConfigureWebHostDefaults(webBuilder =>
            {
                webBuilder
                    .UseStartup<Startup>()
                    .ConfigureKestrel((context, options) => { options.Limits.MinRequestBodyDataRate = null; });
            });
}

Then in my interceptor I catch the exception with a Contains check. This is because if the error comes from a plain HttpClient it is thrown as HttpRequestException, but if it comes from a WSDL service it is thrown as CommunicationException.

public async Task<T> ScopedLoggingExceptionWsdlActionService<T>(Func<TService, Task<T>> action)
{
    try
    {
        return await _scopedExecutorService.ScopedActionService(async service => await action(service));
    }
    catch (CommunicationException e)
    {
        await HandleAddressAlreadyInUseBug(e);
        var errorMessage = $"There was a communication error calling the wsdl service in '{typeof(TService)}' action '{action}'";
        _logger.LogError(e, errorMessage);
        throw new RpcException(new Status(StatusCode.Unavailable, errorMessage + ". Error message: " + e.Message));
    }
    catch (Exception e)
    {
        var errorMessage = $"There was an error calling the service '{typeof(TService)}' action '{action}'";
        _logger.LogError(e, errorMessage);
        throw new RpcException(new Status(StatusCode.Unknown, errorMessage + ". Error message: " + e.Message));
    }
}

// TODO: Remove this after https://github.com/dotnet/core/issues/2253 is fixed    
private async Task HandleAddressAlreadyInUseBug(Exception e)
{
    if (string.IsNullOrWhiteSpace(e.Message) || !e.Message.Contains("Address already in use"))
        return;
    var errorMessage = "Hitting bug 'Address already in use', stopping server to force restart. More info at https://github.com/dotnet/core/issues/2253";
    _logger.LogCritical(e, errorMessage);
    await Program.SystemHost.StopAsync();
    throw new RpcException(new Status(StatusCode.ResourceExhausted, errorMessage + ". Error message: " + e.Message));
}

sapleu commented 5 years ago

Having the same issue on microsoft/dotnet:2.2-runtime-deps using Elasticsearch NEST 5.6.6. Very annoying issue. We can't go back to 2.1 since we invested a lot of time upgrading from 2.1 to 2.2, and upgrading to the 3.0 preview is not an option.

+1 to include this fix into next 2.2 release.

antonioortizpola commented 5 years ago

@sapleu do not update to 3.0 to fix this problem; as https://github.com/dotnet/core/issues/2253#issuecomment-482918706 states, this still happens on Core 3.

rrudduck commented 5 years ago

We just got hit by this as well. It's very rare, but I'm (somewhat) glad to see it's a known issue.

rbrugnollo commented 5 years ago

Got hit by this issue 2 days ago as well. It doesn't happen very often but as soon as first 'Address already in use' shows up, we can't make any other calls until the system is restarted.

karelz commented 5 years ago

Still waiting for someone who has an environment where it happens with some frequency (aka a production repro) and who can try deploying a private patch built out of the 2.1 or 2.2 branch. Do we have someone like that? Without that, this issue is sadly blocked ...

tmds commented 5 years ago

Assumption: Duplicate of dotnet/runtime#27274 which was fixed by dotnet/corefx#32046 - goal: Port it (once confirmed it is truly duplicate).

This assumption is not correct. The fix is for UDP, the issues reported here are for HTTP (which is TCP).

Getting "Address already in use" on a TCP connect is weird. If the local end isn't bound, it should pick a port that is not in use. You may be running out of port numbers. Running netstat can help you find out what sockets are around and who owns them.
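To make that suggestion concrete (a Linux-only sketch, not a command from the thread): you can tally sockets per TCP state directly from /proc, which works even in slim containers where netstat is not installed. The state codes are hex: 01 = ESTABLISHED, 06 = TIME_WAIT, 08 = CLOSE_WAIT.

```shell
# Count sockets per TCP state; field 4 of each /proc/net/tcp row is the
# connection state as a two-digit hex code (01=ESTABLISHED, 08=CLOSE_WAIT).
awk 'NR > 1 { states[$4]++ } END { for (s in states) print s, states[s] }' /proc/net/tcp
```

A steadily growing count for state 08 (CLOSE_WAIT) over a few hours would support the port-exhaustion theory; repeat with /proc/net/tcp6 if the app talks over IPv6.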

antonioortizpola commented 5 years ago

When I run netstat I do not see anything weird; the ports look pretty much the same as with 2.1.

Still waiting for someone who has an environment where it happens with some frequency (aka a production repro) and who can try deploying a private patch built out of the 2.1 or 2.2 branch. Do we have someone like that? Without that, this issue is sadly blocked ...

@karelz, I already updated to 3.0 and the problem still exists; the error shows up in 8-12 hours. Is there anything else I can do to help with the problem?

I know that Bing is running on Core 2.1; have you updated to 2.2 yourselves? This problem is becoming really frustrating. I do not understand how a simple project that just calls some HTTP services is causing this issue. It is really causing trust issues in the team; now I want to update for security fixes, but I am not sure something internal and hidden is going to break in the next release.

tmds commented 5 years ago

When I run netstat I do not see anything weird; the ports look pretty much the same as with 2.1.

Did you run this after a few hours? How does it change over time?

antonioortizpola commented 5 years ago

@tmds Yes. We have a load balancer in AWS, so we put the 2.1 version on one side and Core 3 on the other. After around 4-8 hours running, the server with 3.0 (or the 2.2 version; I also tried that) started crashing. I did a netstat -a on both servers; there were many connections open, but it looked very much the same as on 2.1 (which was still working with no problems).

If it is really necessary I can do the test again and send some screen captures. Sadly this won't be easy, since we already ported the project to .NET Core 3 and much of the new code is not in the other versions.

Netstat in a server with core 2.1

Netstat in a server with core 3

This was captured with both servers working (there was no error at capture time). I will remove my workaround that restarts the server and take a capture while the error is happening, in case I missed something, because some days ago I did that test and the outputs looked the same.

Also, I tried running netstat again and again, but I did not catch anything weird. I must admit I do not know if I am using the netstat command right, so if I am missing something please tell me and I can try again.

tmds commented 5 years ago

I'd run netstat -at to show all TCP connections. Run it once at the start, and then again when your application has been running for a couple of hours. The netstat output you provided doesn't have any HTTP connections, so I guess you captured it at the start.

You can see the local port range that your system is choosing from like this:

$ cat /proc/sys/net/ipv4/ip_local_port_range
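Building on that command (Linux-only, and the exact numbers vary by distro - a sketch, not output from the thread): the two values printed are the bounds of the range the kernel assigns client ports from, so their difference bounds how many concurrent outbound connections one local IP can hold to a single destination ip:port.

```shell
# Read the ephemeral (local) port range and compute how many ports it holds.
# A common default is "32768 60999", i.e. 28232 usable client ports.
read lo hi < /proc/sys/net/ipv4/ip_local_port_range
echo "$((hi - lo + 1)) local ports available ($lo-$hi)"
```

Once leaked or lingering sockets consume that many local ports, any further connect() fails with EADDRINUSE - the exception this thread is about.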
robjwalker commented 5 years ago

We're also seeing this issue in 2.2 on Ubuntu 18.04 VMs. Netstat seems to show a very large number of connections (outbound HTTPS) in CLOSE_WAIT. Restarting the app fixes the issue, but the connections start climbing again.

It takes several days for us to see the issue, so I haven't yet been able to observe the number of connections when we hit the error, but I would assume we're hitting the limit of ~31k and that's what's causing it.

We're seeing it in two different apps which make very different outbound HTTP connections to different endpoints.

tmds commented 5 years ago

We're also seeing this issue in 2.2 on Ubuntu 18.04 VMs. Netstat seems to show a very large number of connections (outbound HTTPS) in CLOSE_WAIT.

@karelz @davidsh @stephentoub @wfurt what may be the issue: the HTTP server closes the TCP connection, but that doesn't result in a close of the socket used by HttpClient. Over a long time, these unclosed sockets cause you to run out of local ports.

stephentoub commented 5 years ago

what may be the issue: the HTTP server closes the TCP connection, but that doesn't result in a close of the socket used by HttpClient. Over a long time, these unclosed sockets cause you to run out of local ports.

In theory that could be contributing to the issue if all of the connections were to different hosts. It's much less likely to be the issue if the number of hosts being targeted is limited; in that case, when the client goes back to the connection pool to grab a connection, it'll see that the connection has been closed by the server and properly dispose of it before creating a new connection. Further, the pool also has a timer that fires every X seconds to clean out such connections, so they shouldn't be building up in the pool.

tmds commented 5 years ago

@robjwalker how many CLOSE_WAIT connections do you see for the same host? If you watch netstat over a short period of time (e.g. 2 minutes), do you see CLOSE_WAIT connections change state to something else?

davidsh commented 5 years ago

what may be the issue: the HTTP server closes the TCP connection, but that doesn't result in a close of the socket used by HttpClient.

A proper HTTP server will always send "Connection: close" just before it closes the TCP connection. That will alert clients (browsers or HttpClient) that they should also close their side of the TCP connection.

If a server doesn't do that, then a client doesn't know that the socket was closed on the other side unless it tries to do a send() or receive() on the socket.

HTTP stacks like SocketsHttpHandler will test a potentially idle connection (which might have been closed by the server) by testing the socket before declaring that the connection is usable. If not usable, then the socket will be closed by the client. SocketsHttpHandler will also close connections on its own without testing if its "idle timeout" has expired.
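The idle-timeout behavior described here maps to settings exposed on SocketsHttpHandler since .NET Core 2.1. As a hedged sketch (the timeout values below are arbitrary illustrations, not recommendations from this thread):

```csharp
using System;
using System.Net.Http;

public static class PooledClient
{
    // A sketch: trim pooled-connection lifetimes so connections half-closed
    // by the server (or a load balancer in between) are discarded by the
    // client instead of piling up and exhausting local ports.
    public static HttpClient Create() =>
        new HttpClient(new SocketsHttpHandler
        {
            // Drop pooled connections that have sat idle for a minute.
            PooledConnectionIdleTimeout = TimeSpan.FromMinutes(1),
            // Recycle every connection after 5 minutes regardless of use;
            // this also lets a long-lived client observe DNS changes.
            PooledConnectionLifetime = TimeSpan.FromMinutes(5),
        });
}
```

The returned HttpClient should itself be reused for the application's lifetime; the handler's pool does the per-connection housekeeping.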

robjwalker commented 5 years ago

@robjwalker how many CLOSE_WAIT connections do you see for the same host? If you watch netstat over a short period of time (e.g. 2 minutes), do you see CLOSE_WAIT connections change state to something else?

Over the course of 24 hours, we saw it build to approximately 14,000 CLOSE_WAIT states. They don't seem to ever change once in that state. A different app seems to generate about 3500 CLOSE_WAIT states in the same time period. Probably because it is connecting outbound less. In both cases all connections from each app are to one IP, but the two apps are connecting to different IPs (if that makes sense.) One is an endpoint under our control, the other is Google Pub/Sub.

Our dev team is looking into it; they are wondering if we are "creating multiple clients" and/or mismanaging HttpClient. (I'm just quoting them at this point - I'm an Ops engineer, not a developer.)

tmds commented 5 years ago

A proper HTTP server will always send "Connection: close" just before it closes the TCP connection.

Load balancers in between will just close the connection when they want.

If a server doesn't do that, that a client doesn't know that the socket was closed on the other side unless they try to do a send() or receive() on the socket.

You could poll (that is: use poll/epoll/...) to get notified that the peer closes the connection (timeout or active checking is also fine).

@stephentoub @davidsh the observations from @robjwalker seem to indicate that the expected socket close (when re-using connection, on timeout) is not taking place.

wfurt commented 5 years ago

This is where TCP keep-alive helps. On the client side, an idle or maximum timeout should kick in as well.

antonioortizpola commented 5 years ago

@tmds Ok, yes, I can confirm: it is not fixed in Core 3.

https://gist.github.com/antonioortizpola/78f4a57170841fb221b117fcb7a5ec45

For us it takes around 4 hours to run out of sockets. The workaround of catching the exception and restarting the app has somewhat mitigated the problem, but we lose some requests when it happens.

BTW, tested on .NET Core 3 preview 3 and 4, with both stretch-slim and alpine; all the same.

robjwalker commented 5 years ago

Just a quick update, our development team have fixed one of our apps that was suffering from this issue. I'm afraid I don't have a huge amount of detail, just that they found a place in our code where "HttpClient wasn't being shared".

Sorry I don't have more details. I'm not sure if this means we're not suffering from the same bug as others, or that we've just worked around it.

tmds commented 5 years ago

@robjwalker let us know how netstat looks with the new version after a few hours.

robjwalker commented 5 years ago

It's been running for around 24 hours now, and netstat is very clean. Only one connection in CLOSE_WAIT, which appears unrelated.

karelz commented 5 years ago

So, to sum it up: a bunch of folks confirmed that the 3.0 fix we made does NOT help their scenarios. At least one case turned out to be an application issue.

I will close this issue (as its original intent, porting a fix to 2.1, is not reasonable at this point). I'd like to ask whoever is willing to dig deeper to file a new issue against 3.0 with some details and be prepared for back-and-forth on the investigation. A repro or something would be really lovely. Verification of HttpClient reuse should happen prior to filing such an issue.

Let me know if I missed anything.

antonioortizpola commented 5 years ago

@robjwalker It would be good to know how you are using your HttpClients, since we are using typed HttpClients for our REST endpoints.

However, our WSDL services are being used like this:

public async Task<BalanceQueryResponse> GetBalance(BscsServiceRequest balanceQueryRequest)
{
    var bscsClient = new InterfaceBSCSClient(
        InterfaceBSCSClient.EndpointConfiguration.InterfaceBSCSPort, _bscsConfig.BscsEndpoint); // WSDL Client
    var timedWsdlRequestWithLog = new TimedWsdlRequestWithLog(_logger, ServiceName);

    var response = await timedWsdlRequestWithLog.ExecuteAndLogRequestDuration(
        bscsClient.Endpoint,async () =>
            await bscsClient.balanceQueryAsync(balanceQueryRequest.SService, _bscsConfig.SAccount)
    );
    return new BalanceQueryResponse() { Result = response.@return };
}

This service is registered as transient, but I do not know if I should wrap the WSDL client in a using block, since it implements IDisposable; all the examples use the client without a using block. I also do not know if this could affect the client internals; according to this comment, I should not be using using on an HTTP request.

We did not think much of this because we never had any issue of this kind on 2.1, but maybe with the update our bad practices started to cause problems.

@karelz any comments on this? Or should I create a new issue to get clarity on that? Also, it would be good to know what changed from 2.1 to 2.2 that caused this issue, to have a better idea of what to avoid.

rbrugnollo commented 5 years ago

@antonioortizpola I'm having the same issues as you: 2.1 works but 2.2 doesn't. I agree that I may be using bad practices that didn't cause big issues like this before, but I don't know what those are.

@karelz it's not clear to me what you mean by "HttpClient reuse". What exactly do you mean by reusing an HttpClient - that I can't make 2 or more calls using the same instance?

antonioortizpola commented 5 years ago

@rbrugnollo according to the docs, you should not be creating HttpClient instances directly; you should be using IHttpClientFactory or one of the other client styles (named, typed, or generated).

Our team is using typed clients for the REST requests, so there should not be a problem there. However, thinking more deeply, with the WSDL clients we do not have access to the HttpClient directly. I do not know if that could be related to the socket exhaustion problem, in which case I would not know how to fix or work around it, short of dropping all my WSDL clients and using direct requests; but that is too much work and would basically mean dropping support for WSDL.