crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

Redesign HTTP::Client to implement missing features #6011

Open straight-shoota opened 6 years ago

straight-shoota commented 6 years ago

HTTP::Client is missing some quite essential features, and while there are individual issues about some of them, it is better to discuss the necessary design changes in a central place.

Some missing features:

The first group is about adding more flexibility to use different types of connections or enhance their usage. This requires separating the connection-establishment logic from HTTP::Client.

#6001 is an incomplete experiment in that direction.

The concept of using a configurable transport is employed by many other implementations of HTTP clients.

It makes great sense to have HTTP::Client act as a high-level interface that delegates the heavy lifting to a replaceable low-level API. Users should be able to configure HTTP::Client to use a specific transport.

The default transport should allow connecting to any TCP server and maintain a connection pool to reuse idle sockets, even beyond the life-span of a single client instance. Other transports could connect to a proxy or a Unix socket, or just mock a connection for testing purposes.

An open question is how the interface between HTTP::Client and the transport should be designed. In the experimental PR #6001 the transport returns an IO for a specific request, but the client still writes the request and reads the response. I'm leaning towards making the interface as simple as Request -> Response. This would obviously move most of the implementation of HTTP::Client to the transport, but I don't think moving the internal heavy lifting to the lower level is a bad thing.

The Request -> Response interface is also more versatile because it allows the transport to manipulate the request before writing and the response after reading. This is necessary for proxy requests, for example, because they need the full URL as the resource. An additional benefit is that requests and responses don't necessarily need to be sent through an IO: a transport could, for example, implement an in-memory client cache or directly create responses for non-HTTP protocols such as file.

Such a simple interface would also make it easy to chain transports for more flexibility and might be a good place to inject middleware.
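To make the idea concrete, here is a hypothetical sketch of what a Request -> Response transport interface could look like. All names (`Transport`, `MockTransport`, `DefaultHeaders`) are illustrative, not an existing stdlib API; only `HTTP::Request` and `HTTP::Client::Response` come from the stdlib:

```crystal
require "http"

# Hypothetical sketch: the entire exchange is reduced to
# Request -> Response, with no assumptions about the underlying IO.
abstract class Transport
  abstract def call(request : HTTP::Request) : HTTP::Client::Response
end

# A transport that serves canned responses, e.g. for testing.
class MockTransport < Transport
  def call(request : HTTP::Request) : HTTP::Client::Response
    HTTP::Client::Response.new(200, body: "mocked #{request.path}")
  end
end

# Middleware is then just a transport wrapping another transport:
# this one sets a default header before delegating.
class DefaultHeaders < Transport
  def initialize(@next : Transport)
  end

  def call(request : HTTP::Request) : HTTP::Client::Response
    request.headers["User-Agent"] ||= "my-client/0.1"
    @next.call(request)
  end
end

transport = DefaultHeaders.new(MockTransport.new)
response = transport.call(HTTP::Request.new("GET", "/index.html"))
puts response.body
```

Note how nothing in the interface mentions sockets, so a proxy, a Unix-socket connection, a connection pool, or a cache can all hide behind the same `call`.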

This brings us to the second group of missing, more high-level features. Some HTTP clients such as Faraday allow injecting custom middleware into the request-response chain, which can implement, for example, authentication or following redirects.

I am convinced that essential features such as following redirects, basic auth etc. should not require a middleware handler, but should be implemented directly in either the low- or high-level API. They need to be optional and configurable, but I wouldn't like to see an HTTP::Client setup using a herd of middleware handlers just to provide basic HTTP. Middleware should only be used for features that sit on top (for example OAuth) and can't reasonably be included in the default client implementation.

Related: #1330 (socket factory as transport)

jwoertink commented 6 years ago

Yes, that proxy support would be amazing!

rishavs commented 5 years ago

@straight-shoota Just a suggestion: could you use checkboxes instead of bullet points in your OP? It would make it easier to follow the progress on individual points, instead of having to check each item individually.

BTW, I really love this list of proposed improvements. Proxy and HTTP/2 support are the biggest ones IMO. For basic auth, I am more used to the idea of using middleware, à la Node, but I would be fine with handling it in a separate layer.

straight-shoota commented 5 years ago

@rishavs We're currently talking about the rough design. The implementation of individual features is still a long way off.

straight-shoota commented 5 years ago

But I'd like to see this discussion moving forward.

In my opinion, the exposed API of HTTP::Client needs to be dead simple but built on a complex backend. It should be as easy as running curl or accessing a website in a browser. In almost all cases, the application doesn't care about how an HTTP response is acquired. Does it create a new TCP connection or reuse an existing one? Is it plain HTTP/1.1 or multiplexed HTTP/2 (at some point, even HTTP/3)? HTTP::Client should simply use the best available method supported by the server. This is obviously a very sophisticated goal and will take a long time to reach, most likely not until after 1.0.

Maybe this complex implementation doesn't even need to be the default - or even included in the stdlib at all. But the stdlib HTTP::Client API should be designed in such a way that the transport adapter is easily exchangeable. Its interface should be simple (Request -> Response) and not assume anything about connection details. This makes it easy to improve the implementation step by step. The start would be a more or less simple, one-shot connection implementation for TCP/IP. It might even keep the connection alive and reuse it if the next request is for the same host; otherwise, a new connection would be established. Proxy and Unix-socket support are further, relatively easy improvements. A big step would later be a reusable connection pool and HTTP/2. Eventually, even HTTP/3.

PercussiveElbow commented 5 years ago

Has there been any movement recently on UNIX socket support?

straight-shoota commented 5 years ago

Not yet, unfortunately. But I'd like to move on with this. Possibly for 0.29.0.

m-o-e commented 4 years ago

But I'd like to see this discussion moving forward.

+1 from me (a happy crystal user who started missing persistent HTTP connections today 😞).

It should be as easy as running curl

Perhaps it could make sense to rebuild HTTP::Client on top of libcurl?

I don't know how the core team would feel about introducing an external dependency for something as central as HTTP, but in my experience I end up reaching for the curl binding in most languages most of the time anyway (e.g. Typhoeus or PycURL), because the stdlib client usually lacks features and/or performance.

straight-shoota commented 4 years ago

@m-o-e I think this has been brought up in the discussion before, and it probably won't work to integrate libcurl into Crystal's event loop.

m-o-e commented 4 years ago

@straight-shoota Ah okay, understood. Sorry for the duplicate!

yorci commented 4 years ago

It would be great to have outgoing IP binding in HTTP::Client, which TCPSocket already has. It's really useful when working with APIs whose requests-per-hour limits are based on IP.

rishavs commented 3 years ago

Are any of these part of the 1.0 plan?

asterite commented 3 years ago

@rishavs no

erdnaxeli commented 3 years ago

In almost all cases, the application doesn't care about how a HTTP response is acquired. Does it create a new TCP connection or re-use an existing one?

For this specific point (reusing connections) I think the application should be aware of what is happening. Sockets are a limited resource, and apps should control how they use them. Coming from Python (where the most used HTTP lib is requests), I think a concept of session would be good:

session = HTTP::Session.new
session.get("https://crystal-lang.org")

Inside a session, calls reuse connections (if possible), but simple calls with HTTP::Client should not.

straight-shoota commented 3 years ago

Most apps really don't need to care about that. It's good to have a low-level API for managing connections manually. But for typical use cases you just want to send HTTP requests from different parts of an application without having to drag a session object around. HTTP connection management can easily be abstracted away.

erdnaxeli commented 3 years ago

But if we do that I am worried about an app leaking sockets without control over them…

I went to look at how others do it: Ruby doesn't do it by default, Python doesn't either, Go does it by default (you must instantiate a client to change it).

So OK, why not: the default behavior would be to persist connections, and if you want to change it you would instantiate a client object and change its parameters.

didactic-drunk commented 3 years ago

Leaking sockets is a real problem. See https://github.com/taylorfinnell/awscr-s3/issues/77. In this case the library could use a pool because all requests go to the same endpoint.

The same isn't true for the hundreds of one-off cases where a developer has a list of URLs and tries:

urls.map { |url| HTTP::Client... }...

Often it will work in testing with a small list of URLs, then fail sporadically in production.

I think connection reuse should be opt-in rather than opt-out. Whether through a session or another API I don't know, but in real use I'm hitting socket issues from third-party libraries with no way to control them.

straight-shoota commented 3 years ago

If you run one-off requests with HTTP::Client's class methods, the connection should be properly closed automatically. That's expected behaviour. If you happen to observe socket leakage, it's a bug.
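For reference, the distinction between one-off and reused connections in the current stdlib API can be sketched like this (note: these lines perform real network requests against crystal-lang.org):

```crystal
require "http/client"

# One-shot class method: a connection is opened and closed
# automatically for this single request.
response = HTTP::Client.get "https://crystal-lang.org"
puts response.status_code

# Explicit client instance: requests to the same host reuse one TCP
# connection. The block form closes the client (and its socket)
# automatically, so nothing leaks.
HTTP::Client.new("crystal-lang.org", tls: true) do |client|
  puts client.get("/").status_code
  puts client.get("/").status_code # reuses the same connection
end
```

So today, connection reuse is already opt-in via an explicit client instance; the open design question is whether and how the stdlib should pool connections behind the class methods as well.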

crysbot commented 4 months ago

This issue has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/avoiding-closuring-a-proc-thats-an-instance-variable/6834/6

crysbot commented 1 month ago

This issue has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/pooling-http-connections-automatically/7030/1