galvesribeiro opened 7 years ago
cc: @davidsh, @cipop, @geoffkizer
My opinion, for what it's worth: leave HttpClient alone, aside from the fully managed awesomeness that is already in progress (which mostly solves 1).
However, there should be a second class that shares the guts (sockets, HTTP parsing, etc.) with HttpClient but is designed out of the box for service-to-service or high-throughput use.
HttpClient behaves like a client-side LOB app out of the box (low connections-per-server limits, holding sockets, etc.), and by trying to make a single class to rule them all I feel the API gets restricted.
FYI, I think this is the same issue that SslStream struggles with (clients have to have long-term backwards compatibility, lower resource use, etc.).
Good idea... Something like `HttpChannel`, which essentially uses an external HTTP protocol implementation (to enable things like HTTP/2, WebSocket, WebRTC) and is one per socket. On the other hand, `HttpClient` uses it, and can keep a pool of channels and automagically manage it.
However, there is still the DNS issue, which is something outside of the managed world... How do we deal with that?
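A rough sketch of how that channel/pool split might look (everything here, including the `IHttpProtocol` name, is hypothetical; only the one-channel-per-socket shape comes from the comment above):

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical: a pluggable protocol (HTTP/1.1, HTTP/2, ...) over a transport.
public interface IHttpProtocol
{
    Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, Stream transport);
}

// Hypothetical: exactly one channel per socket; HttpClient would pool these.
public sealed class HttpChannel : IDisposable
{
    private readonly Socket _socket;
    private readonly IHttpProtocol _protocol;

    public HttpChannel(Socket socket, IHttpProtocol protocol)
    {
        _socket = socket;
        _protocol = protocol;
    }

    public Task<HttpResponseMessage> SendAsync(HttpRequestMessage request) =>
        _protocol.SendAsync(request, new NetworkStream(_socket, ownsSocket: false));

    public void Dispose() => _socket.Dispose();
}
```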
That is exactly what I am thinking, because in a server environment it would be much better to be able to get lower-level and control the connections (this could lead to a "server focused" pool). The DNS problem is tough; it's also a security consideration. If you look at the motions Java has gone through, there are issues like DNS poisoning that can cause reliability problems and DoS attacks.
The proposals to create a pool of HttpClient instances are nonsensical; HttpClient already implements pooling internally, so you'd get the same effect with a single shared instance.
It does sound like there are DNS issues to address, but frequently closing connections (i.e. always using new instances of HttpClient) is overly aggressive. We need a way to periodically refresh DNS and limit the lifetime of connections after we detect DNS changes for that host.
frequently closing connections (i.e. always using new instances of HttpClient) is overly aggressive
Why? If you are in a server-to-server scenario, as @Drawaes alluded to, it would be interesting to give the service developer the ability to manage the connection lifecycle.
We need a way to periodically refresh DNS and limit the lifetime of connections after we detect DNS changes for that host.
Almost like some sort of TTL... 😁
Seriously though, if we can internally cache the DNS records for their actual TTLs, we'd get the expected behavior, at least from my point of view. I haven't reviewed the managed plans that deeply; is this even a possibility, or are we only getting the result (and no TTL) in those instances?
@Tratcher I am not really seeing the downside to exposing the individual connections that make up a pool. I hadn't thought of it from that direction; I was instead thinking of providing a second pooling provider that is more "server" focused. But actually, now that I think about it, exposing the individual connections allows more scenarios to be covered by "power" users.
@davidfowl I can't find it right now, but this is not too dissimilar to an idea you were commenting on a while back about a "kestrel client" similar to what other frameworks provide.
@galvesribeiro connection management is one of the hardest problems in the client stack; offloading that to the user isn't doing them many favors. Server-to-server can use a somewhat simplified model, but even then you're not going to want to restrict it to a single connection bottleneck, so you still need to track and assign connections. Doing this poorly can severely compromise the performance of your application.
There's also quite a bit of state that gets shared across connections in the client pool, e.g. cookies, cache, DNS, auth, TLS sessions. Managing that sharing would also get cumbersome.
Agreed on the state. But the problem for service-to-service often is the state. In a "microservice" environment you often have services calling others, all driven from one incoming request.
I agree it is tough to write this by yourself, so I think a "shared primitive connection" layer, with the socket and parsing code that is used in HttpClient, plus another "service"-oriented HTTP client, would be ideal.
`using (var client = new HttpClient()) { ... }`
is one of the most frequent and impactful mistakes I see made. Developers have been drilled into doing this for all IDisposables, but they fail to appreciate the complexity of the resources being managed by that client and the consequences of churning them. The docs have helped explain the issue after the fact, but the same mistakes continue to be made.
Proposal: Make a central static HttpMessageHandler and change HttpClient's default constructor to use it rather than newing one up. After all, it's the handler that owns the pool, not the HttpClient instance. This handler would only have settings and defaults appropriate to the shared scenario, and would no-op in its Dispose method. This static handler could internally be implemented using any of the existing handler stacks, or the new managed one.
Developers could customize all of the settings on their HttpClient instance without impacting other parts of the app (e.g. timeouts, default headers, base address). If you need to insert middleware then you wrap it around the static handler rather than a new instance. If you need to customize handler settings for a specific section of your app then you new up your own handler and client as you would today.
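A minimal sketch of what that could look like (the `SharedHandler` name and the wrapper class are my invention; the proposal itself only specifies a shared static handler with a no-op `Dispose`):

```csharp
using System;
using System.Net.Http;

// Hypothetical: a process-wide handler that owns the connection pool and
// no-ops Dispose, so churning HttpClient instances becomes harmless.
public static class SharedHandler
{
    public static readonly HttpMessageHandler Instance =
        new NonDisposingHandler(new HttpClientHandler());

    private sealed class NonDisposingHandler : DelegatingHandler
    {
        public NonDisposingHandler(HttpMessageHandler inner) : base(inner) { }

        // Swallow Dispose: the shared pool must outlive any single HttpClient.
        protected override void Dispose(bool disposing) { }
    }
}

public static class Usage
{
    public static HttpClient Create() =>
        // What the proposed default ctor would do implicitly:
        new HttpClient(SharedHandler.Instance, disposeHandler: false)
        {
            Timeout = TimeSpan.FromSeconds(5) // per-client settings stay isolated
        };
}
```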
I agree with the above approach. I think what might be useful here is to provide use cases; then they can be assessed for their commonality, as well as for whether they can be addressed by modifying the current API or through other solutions, rather than talking in the abstract. @Tratcher's comment above is perfect in that respect.
I have the use case of a front-end service (serving a JS UI) that calls a number of backend services. This is a common pattern I have seen, but I am not saying this is how everyone does it. We would like to limit the outbound connections calling backend services, say to 500 connections, but there are 5 different services we call. With the current HttpClient we only have the option of making the connections-per-server limit 100, and not being able to serve 500 calls if they are all to one of the services, or making it 500 and risking hitting 2500 connections.
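For reference, the only knob available today (on .NET Core's `HttpClientHandler`; the full-framework equivalent is `ServicePointManager.DefaultConnectionLimit`) is per-server, which is exactly the dilemma described:

```csharp
using System.Net.Http;

// One handler shared across calls to all 5 backend services.
var handler = new HttpClientHandler
{
    // Per-server limit: 100 means at most 100 connections to EACH backend
    // (so up to 500 total across 5 backends), but a burst of 500 calls to a
    // single backend stalls at 100. Raising it to 500 serves that burst, yet
    // the worst case across all 5 backends becomes 2500 connections.
    // There is no aggregate cap across hosts.
    MaxConnectionsPerServer = 100
};
var client = new HttpClient(handler);
```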
`using (var client = new HttpClient()) { ... }` is one of the most frequent and impactful mistakes I see made. Developers have been drilled into doing this for all IDisposables, but they fail to appreciate the complexity of the resources being managed by that client and the consequences of churning them.
I think these two sentences are so smack bang on the mark. I don't have any suggested solutions to all of this but I can fully, 100% attest to what @Tratcher summed up.
When I read that (now famous) DotNet Monsters article, my 'WTF' could probably be heard across the entire country :( That said, I felt way better when every person I asked didn't know about this either, and we all thought we were 'smart' because we were putting this in a `using` statement and being nice-and-good little girls and boys.
I see commentary along the lines of "docs, more docs, better docs, etc." to help educate and mitigate this issue, but I just don't feel like that's... gonna really help.
So yeah -> totally 100% super-strongly agree with @Tratcher but I just don't have a suggestion to help make it better. 🤔 But this issue resonates really strongly with me and some other peers.
We all know the side effects of `using()` on `HttpClient`, so why keep it implementing `IDisposable` in the first place?
Also, statics... I've seen terrible scenarios that were always driven by having a shared thing statically... We had a huge refactoring on Orleans recently just to avoid a single static client class... I would rather avoid that... Other scenarios will eventually come, and the evolution of `HttpClient` would be easier if we used proper abstractions.
State sharing is always a pain, I agree. But making it static will only give a false sense of safety.
I would stick with @Drawaes' original idea and have a lower-level type for server-to-server comms and a more user-friendly client type. Having both in the same type will always limit the possibilities.
We all know the side effects of `using()` on `HttpClient`, so why keep it implementing `IDisposable` in the first place?
Because `IDisposable` isn't about an object supposedly being short-lived at all. It's about whether an object has unmanaged resources that should be disposed. And if `HttpClient` has those (as it appears to), it makes perfect sense to implement `IDisposable`.
It's just that `IDisposable` often implies that one should only keep the object open for a short time, although that's not actually stated by the contract.
@poke and that is the point... Short-lived or not, users assume that they should call `Dispose()` once they are done with that object. `using()` is just syntactic sugar to define a scope and call it automatically.
Anyway, more of a philosophical comment. :)
@stephentoub
That could of course be done. But that alone is an incomplete solution... how does someone release the resources in the pool explicitly? How does someone express isolation of pools? Etc.
SqlConnection has a limited (clumsy?) way to do that through connection string flags. ServicePoint just plain sucks.
Just let us implement/instantiate DNS and TCP connection managers and then pass them into the HTTP client or HTTP request through an overloaded ctor. Default to global singletons otherwise.
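Something like the following hypothetical shape (none of these types exist; they only illustrate the suggestion):

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical API sketch: let callers supply the resource managers.
public interface IDnsResolver
{
    Task<IPAddress[]> ResolveAsync(string host, CancellationToken ct);
}

public interface IConnectionPool : IDisposable
{
    ValueTask<Stream> RentAsync(DnsEndPoint endPoint, CancellationToken ct);
    void Return(Stream connection);
}

public class HttpClientSketch
{
    // Overloaded ctor: explicit managers for full control...
    public HttpClientSketch(IDnsResolver dns, IConnectionPool pool) { /* ... */ }

    // ...or the default ctor falls back to global singletons.
    public HttpClientSketch() /* : this(DefaultDns, DefaultPool) */ { }
}
```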
`using (var client = new HttpClient()) { ... }` is one of the most frequent and impactful mistakes I see made. Developers have been drilled into doing this for all IDisposables, but they fail to appreciate the complexity of the resources being managed by that client and the consequences of churning them. The docs have helped explain the issue after the fact, but the same mistakes continue to be made.
The problem with HttpClient is that it is doing way too many things internally instead of delegating them to other classes. It should not be responsible for managing a DNS cache or TCP connection pools.
@olviko HttpClient is an extremely lightweight class. It does some management of cancellation tokens, some header setting and buffer draining, but other than that it passes everything on to HttpClientHandler or WebRequestHandler. In turn, HttpClientHandler lets ServicePointManager do all the connection management, and HttpWebRequest does the actual request sending.
The IDisposable thing is unfortunate, but it's there to clean up cancellation tokens and request/response bodies that exist during the lifetime of sending a request or returning a response. When not in the process of making a request, HttpClient doesn't hold onto any unmanaged resources.
@Tratcher We tried to convince the HttpClient team years ago to make the HttpClientHandler somehow globally changeable, but we were unsuccessful at the time. I think it would open up a whole range of useful scenarios.
@davidfowl HttpClient was designed to be long-lived for numerous reasons unrelated to connection management. HttpClient has a BaseAddress property, a DefaultRequestHeaders collection, and a pipeline of message handlers. Having to set these up on every request would be a waste of time and would push people to create factory classes, defeating the purpose of having HttpClient as a class.
Setting up a single HttpClient instance for each target API that you want to call is ideal because it allows setting up all the common behavior once and ensures that a connection group is managed just for that HttpClient.
The benefits of this design are more obvious when building desktop and mobile apps, but I do believe there is still some value to the model in a server environment.
But doesn't the ServicePoint apply to every HTTP handler? E.g. max connections per server is one property for every HTTP handler, so in the server world, if I wanted to say "max connections to the server for API x should be 5" and "max connections to the server for API y should be 20", can I do that today? (It's possible I am missing something.)
There is a scenario I ran into with an internal customer of WCF using an HTTP binding which would be impossible to provide a fix for if we had a global singleton HttpClient instance. A service was running on multiple hosts behind a load balancer. One of the hosts ended up with a bad deployment of their service code and was returning 500 responses for every request. A mid-tier server had a pool of connections to the load balancer, and one of those connections was established to the failed server. The failed server always returned a valid HTTP response, so the connection wouldn't get dropped by the connection pooling, as technically the response was valid at the protocol level. On top of that, as the response was very fast, that connection would get used a lot, so most requests would fail.
The solution was to provide a way to abandon the connection pool (this was using HttpWebRequest, so I made a mechanism where you could instruct WCF to use a different ConnectionGroupName, effectively abandoning the old pool and starting a new one). The bad server had been detected and removed from the load balancer, but the load balancer doesn't close existing connections to a server removed from the pool. Switching to a new connection pool allowed moving away from the bad server. If there is a single instance of HttpClient / HttpClientHandler, then you severely restrict your ability to control your connections.
This as well... when you have layer 3 load balancers, 500s etc. can be a pain.
I do wonder if there is some crossover between wanting an "HTTP parser" plus control over the connections on the client side and the discussion over at Serverside Primatives.
Because at the end of the day, a low-level exposed HTTP parser and some connection primitives (sockets etc.) seem to be the requirement on both sides.
@Drawaes, using HttpClient directly, the 500s from behind a load balancer are easy to deal with: dispose the old HttpClient and instantiate a new one. There is an overload of the HttpClient ctor which takes an HttpClientHandler and a bool saying whether the HttpClientHandler should be disposed. With this, you can have a static instance of your HttpClientHandler and create and dispose of HttpClient instances without upsetting your connection pool.
does that kill the single connection in the connection pool?
If you specify false on the constructor, it won't kill the connection pool in HttpClientHandler.
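That overload is real (`HttpClient(HttpMessageHandler handler, bool disposeHandler)`), and it also gives an escape hatch for the load-balancer scenario above: keep one long-lived handler for the pool, churn clients freely, and dispose/replace the handler itself when you need to abandon a poisoned pool. A sketch:

```csharp
using System.Net.Http;
using System.Threading;

public static class Backend
{
    // The handler owns the connection pool; keep it long-lived.
    private static HttpClientHandler s_handler = new HttpClientHandler();

    // Cheap per-call client: disposeHandler: false leaves the pool intact.
    public static HttpClient CreateClient() =>
        new HttpClient(s_handler, disposeHandler: false);

    // Escape hatch for a poisoned pool (e.g. a connection pinned to a bad
    // server behind a load balancer): drop the old pool, start fresh.
    public static void ResetPool()
    {
        var old = Interlocked.Exchange(ref s_handler, new HttpClientHandler());
        old.Dispose();
    }
}
```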
Another thing that would be worth addressing in any redesign is that if one wants to authenticate with a client certificate, it must be done in the `HttpClientHandler`, and thus cannot be changed on a per-request basis.
Most of the time this is fine, as a client cert is intended to represent the client. However, there are some cases where one may need to use different client certs for different requests, even to the same destination. Currently this requires at least one `HttpClient` instance per certificate.
To figure out which instance to use, one may need a `Dictionary` or `ConcurrentDictionary` to manage the instances. 😕
It would be easier if one could attach a certificate directly to an `HttpRequestMessage`.
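The workaround being described looks roughly like this (a sketch assuming .NET Core's `HttpClientHandler.ClientCertificates`; keying by thumbprint is just one choice):

```csharp
using System.Collections.Concurrent;
using System.Net.Http;
using System.Security.Cryptography.X509Certificates;

public static class PerCertClients
{
    // One HttpClient per client certificate, because the cert can only be
    // set on the handler, not on an individual HttpRequestMessage.
    private static readonly ConcurrentDictionary<string, HttpClient> s_clients =
        new ConcurrentDictionary<string, HttpClient>();

    public static HttpClient For(X509Certificate2 cert) =>
        s_clients.GetOrAdd(cert.Thumbprint, _ =>
        {
            var handler = new HttpClientHandler();
            handler.ClientCertificates.Add(cert);
            return new HttpClient(handler);
        });
}
```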
I'd love to see HttpClient work on a model similar to the new transport/protocols model on the server side (https://github.com/aspnet/KestrelHttpServer/issues/1980).
The way I see it, HttpClient should only deal with things like remembering and passing along cookies; HTTP parsing should be done by a layer below that, and the transport/connection management by a layer below that.
@darrelmiller
Setting up a single HttpClient instance for each target API that you want to call is ideal because it allows setting up all the common behavior once and ensures that a connection group is managed just for that HttpClient.
I have hit this problem a few times when developing Web API apps that called other web services.
When using Ninject and Castle Windsor it was possible to create multiple HttpClients at startup, configure them, and pass the appropriate HttpClient into the right controllers.
But with the built-in DI in .NET Core it doesn't seem possible to do this.
I think this is a common use case.
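One way to approximate that Ninject/Castle Windsor setup with Microsoft.Extensions.DependencyInjection is to give each target API its own typed wrapper around a configured HttpClient (a sketch; the service name and URL are made up):

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical typed wrapper: one per target API, so DI can tell them apart.
public class OrdersApiClient
{
    public HttpClient Http { get; }
    public OrdersApiClient(HttpClient http) => Http = http;
}

public static class Registration
{
    public static void Configure(IServiceCollection services)
    {
        // Configure once at startup; controllers take OrdersApiClient
        // as a ctor dependency instead of a bare HttpClient.
        services.AddSingleton(sp => new OrdersApiClient(
            new HttpClient { BaseAddress = new Uri("https://orders.internal/") }));
    }
}
```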
The current `HttpClient` might be bad for concurrency: it takes a lock before sending the request, and the reason is its instance behavior. `.BaseAddress` and `.DefaultRequestHeaders` are not thread-safe when used statically, and I don't think an instance that acts like a static is good.
As for the solution of using a static instance: the problem is that there is no way to share the instance across various packages (imagine each package using its own HttpClient; 200 packages would use 200 HttpClients, none of them disposed, and with no certainty about memory usage).
So I think the new HttpClient should be static, and should provide a non-static disposable class over those static methods, allowing us to use `.BaseAddress` and `.DefaultRequestHeaders` without needing thread safety.
I don't think the lock is an issue; these are extremely short-lived. A static public HttpClient would be evil (and you would need many more locks); a static internal TCP connection pool used by HttpClient is a different matter. I agree with @Drawaes above: leave it, and create a new high-performance client that is harder to use, with a constructor which accepts a TCP pool etc.
Worth noting quite a few people in the Rust community hate the new high-performance Tokio because it's much harder to use.
@bryanjhogan Why can't you inject HttpClient instances with ASP.NET DI?
@John0King I don't see why the OperationStarted flag, which is used to prevent you from changing BaseAddress and DefaultRequestHeaders after making a request, is bad for concurrent access. It doesn't affect the request/response processing. Changing BaseAddress/DefaultRequestHeaders per request defeats the point of the properties; if you don't want to use them, you don't have to.
And I don't think anyone was suggesting that using HttpClient as a static instance is a good idea. Having a shared instance per API is possible without using statics.
@darrelmiller
Having a shared instance per API is possible without using statics.
How? How do you handle cookies / credentials / default headers?
As for concurrent performance, I'm not very sure; I only saw the source code and it using a lock (I think it may be better for concurrency without the lock). Sometimes I miss the old HttpWebRequest: it doesn't need to share the instance, so it also doesn't need a lock.
@John0King I don't see a lock here: https://github.com/dotnet/corefx/blob/master/src/System.Net.Http/src/System/Net/Http/HttpClient.cs. Can you show me where you are seeing a lock in the source?
Default headers can be stored in the HttpClient instance, and cookies and certificates can be added to the HttpClientHandler.
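Concretely, that split looks like this (the certificate line assumes .NET Core's `HttpClientHandler`, and the file path is illustrative):

```csharp
using System.Net;
using System.Net.Http;
using System.Security.Cryptography.X509Certificates;

// Connection-scoped state lives on the handler:
var handler = new HttpClientHandler
{
    CookieContainer = new CookieContainer(),
    UseCookies = true
};
handler.ClientCertificates.Add(
    new X509Certificate2("client.pfx", "password")); // illustrative path

// Request defaults live on the client instance:
var client = new HttpClient(handler);
client.DefaultRequestHeaders.Add("Accept", "application/json");
```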
@darrelmiller 😆 I must be remembering it incorrectly; I can't find it.
Please consider exposing the backing `HttpMessageHandler` through the `HttpClient` API, if only in a limited capacity, perhaps through a slimmed-down facade or interface. I understand the handler's settings are mostly immutable after the point of creation, but things like the `CookieContainer`'s items within the handler are not. With `IHttpClientFactory` and `.AddHttpClient()` now being touted as best practice for managing clients, there is no clear way to manage cookies except per request via headers. Sometimes you may want to add default cookies but don't have them at the point of creation, e.g. maybe you cache them somewhere.
This can obviously be done during the factory delegate, but it's really hard for me to imagine pulling all of this customized code into the composition root in Program.cs. There are plenty of reasons to want to encapsulate this logic in the application component using the `HttpClient`.
At the very least, perhaps just expose the `CookieContainer` with solely a getter.
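For what it's worth, the closest workaround today is to hand the factory a primary handler whose CookieContainer you keep a reference to elsewhere, via the real ConfigurePrimaryHttpMessageHandler hook (a sketch):

```csharp
using System.Net;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;

public static class CookieClientSetup
{
    // Kept visible to application code so cookies can be added after
    // creation, which is the encapsulation the comment above is asking for.
    public static readonly CookieContainer Cookies = new CookieContainer();

    public static void Configure(IServiceCollection services)
    {
        services.AddHttpClient("with-cookies")
            .ConfigurePrimaryHttpMessageHandler(() => new HttpClientHandler
            {
                CookieContainer = Cookies,
                UseCookies = true
            });
    }
}
```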
Following the discussion here https://github.com/dotnet/designs/issues/9, there are several issues with the current `HttpClient` implementation. The idea with this issue is to discuss the design of a proposed new implementation and API surface for `HttpClient` as part of the original issue. Some points discussed on that issue:
- There are two different implementations of `HttpClient` today: one on Windows (based on WinHttp) and another in the *NIX world (based on libcurl).
- The misleading `IDisposable` implementation (the TCP socket still leaks for a while, by default 240s, due to the `TIME_WAIT` state, which makes you run out of available socket connections under high load).
- Pooling connections the same way as `SqlConnection`. However, it would probably hit worse issues with the DNS cache, since that is managed by the OS networking stack and this implementation tries to be 100% managed on top of the new Foundation layer proposed in the previous issue.