HoneyryderChuck / httpx

(Mirror) An HTTP client library for ruby
https://gitlab.com/os85/httpx
Apache License 2.0

Performance degrading with multiple requests #18

Closed Freaky closed 10 months ago

Freaky commented 11 months ago

As previously discussed, httpx performance is observed to degrade - pretty linearly - as requests are made. For example, this is a fragment of a benchmark run against a Rust Hyper server endpoint, each line being a batch of 5,000 requests:

httpx      26.242188   0.515625  26.757812 ( 10.709051)
httpx      22.445312   0.257812  22.703125 ( 15.107915)
httpx      22.562500   0.351562  22.914062 ( 21.772903)
httpx      29.093750   0.289062  29.382812 ( 28.842361)
httpx      38.132812   0.500000  38.632812 ( 38.192884)
httpx      44.257812   0.328125  44.585938 ( 44.105139)
httpx      50.718750   0.406250  51.125000 ( 50.627243)

I also observed this under your own benchmark when adding support for running each test multiple times. Here's what ruby-prof kicks up with 1,000 requests - a fairly obvious hotspot:

 %self      total      self      wait     child     calls  name                           location
 27.34      5.655     2.038     0.000     3.617  2039667  *Array#delete_if
 11.63      1.132     0.867     0.000     0.265  2095014   HTTPX::Callbacks#callbacks     /home/freaky/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/httpx-1.0.2/lib/httpx/callbacks.rb:33
 10.78      5.401     0.804     0.000     4.597  2037002  *HTTPX::Callbacks#emit          /home/freaky/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/httpx-1.0.2/lib/httpx/callbacks.rb:23
  7.11      0.724     0.530     0.000     0.194  2003000   HTTPX::TCP#socket              /home/freaky/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/httpx-1.0.2/lib/httpx/io/tcp.rb:43
  3.67      0.274     0.274     0.000     0.000  2003001   HTTPX::Connection#origin
  3.46      0.267     0.258     0.000     0.009  2153288   Hash#[] 

Which appears to intensify with 10,000 requests:

 %self      total      self      wait     child     calls  name                           location
 36.84    517.122   196.842     0.000   320.280 200406486  *Array#delete_if
 15.14    102.826    80.910     0.000    21.916 200950014   HTTPX::Callbacks#callbacks     /home/freaky/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/httpx-1.0.2/lib/httpx/callbacks.rb:33
 13.96    514.587    74.611     0.000   439.975 200370002  *HTTPX::Callbacks#emit          /home/freaky/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/httpx-1.0.2/lib/httpx/callbacks.rb:23
  9.83     71.753    52.516     0.000    19.236 200030000   HTTPX::TCP#socket              /home/freaky/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/httpx-1.0.2/lib/httpx/io/tcp.rb:43
  5.21     27.851    27.851     0.000     0.000 200030001   HTTPX::Connection#origin
  4.09     21.940    21.847     0.000     0.093 201512288   Hash#[]
  3.91     20.912    20.912     0.000     0.000 200030000   HTTPX::Connection#io
  3.71     19.817    19.817     0.000     0.000 201983103   Symbol#==
  3.60     19.244    19.244     0.000     0.000 200086474   IO#to_io
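
For reference, the profiles above were collected with ruby-prof along roughly these lines (a sketch only; the actual harness and endpoint in the benchmark repo differ):

require "httpx"
require "ruby-prof"

# profile a batch of one-shot requests against the local test endpoint
result = RubyProf.profile do
  1_000.times { HTTPX.get("http://127.0.0.1:3000/echo") }
end

# flat profile sorted by self time, similar to the tables above
RubyProf::FlatPrinter.new(result).print($stdout, min_percent: 1)
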
Freaky commented 11 months ago

10,000 requests results in 10,000 entries in @callbacks[:close]. I don't see anything that would delete that callback - is it supposed to return :delete at some point?
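
For context, the shape of callbacks.rb seems to be roughly this - handlers are stored per event, emit walks them with delete_if, and anything returning :delete is dropped (a simplified sketch of the idea, not the actual httpx code):

module Callbacks
  def on(type, &block)
    callbacks(type) << block
  end

  def once(type, &block)
    # wrap the handler so it asks to be removed after its first run
    on(type) do |*args|
      block.call(*args)
      :delete
    end
  end

  def emit(type, *args)
    # handlers returning :delete are dropped; everything else stays registered
    callbacks(type).delete_if { |cb| cb.call(*args) == :delete }
  end

  def callbacks(type = nil)
    @callbacks ||= Hash.new { |h, k| h[k] = [] }
    type ? @callbacks[type] : @callbacks
  end
end

Under that sketch, a plain on(:close) handler never returns :delete, so it stays registered for the life of the session - which would match the pile-up above.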

I have no idea what I'm doing or this would be a PR, but this seems to fix it:

diff --git lib/httpx/session.rb lib/httpx/session.rb
index 7117c433..a2421ae7 100644
--- lib/httpx/session.rb
+++ lib/httpx/session.rb
@@ -208,7 +208,7 @@ module HTTPX
       connection.on(:open) do
         emit(:connection_opened, connection.origin, connection.io.socket)
         # only run close callback if it opened
-        connection.on(:close) { emit(:connection_closed, connection.origin, connection.io.socket) }
+        connection.once(:close) { emit(:connection_closed, connection.origin, connection.io.socket) }
       end
       catch(:coalesced) do
         pool.init_connection(connection, options)

Before, MRI 3.2.2:

httpx   5.120260   0.079005   5.199265 (  5.205712)
httpx  13.016592   0.070573  13.087165 ( 13.099635)
httpx  19.630352   0.181065  19.811417 ( 19.848961)
httpx  26.415183   0.157230  26.572413 ( 26.675403)
httpx  33.317256   0.118182  33.435438 ( 33.558774)
httpx  42.153148   0.126083  42.279231 ( 42.459329)
httpx  49.364482   0.157536  49.522018 ( 49.752755)
httpx  54.784374   0.070742  54.855116 ( 55.091680)
4:07.81 real, 4:05.35 user, 1.299 sys;  page: 0 hard/44563 soft, swap: 0, I/O: 0/0

After:

httpx   1.329108   0.141186   1.470294 (  1.473378)
httpx   1.406783   0.070919   1.477702 (  1.480754)
httpx   1.401861   0.071079   1.472940 (  1.476934)
httpx   1.407888   0.102430   1.510318 (  1.516127)
httpx   1.330992   0.103024   1.434016 (  1.438439)
httpx   1.392380   0.070524   1.462904 (  1.467915)
httpx   1.385495   0.078645   1.464140 (  1.467363)
httpx   1.336024   0.086349   1.422373 (  1.425594)
13.796 real, 12.444 user, 1.090 sys;  page: 0 hard/48732 soft, swap: 0, I/O: 0/0
HoneyryderChuck commented 11 months ago

Hey @Freaky ,

I didn't forget about it :) I've already identified a couple of things. But you seem to have mainly stumbled into a regression from v1, due to the change in the default timeouts. I have a patch locally which addresses it (and the benchmark you shared is now presenting a more stable outcome), but I still need to fix a few things as a result, before sharing it back. I'll make sure to tag you when I have a changeset to share.

Freaky commented 11 months ago

Cheers, looking forward to it :)

HoneyryderChuck commented 11 months ago

FYI https://gitlab.com/os85/httpx/-/merge_requests/282 (still WIP)

Feel free to comment here if you don't have a gitlab account

Freaky commented 11 months ago

Thanks.

I guess it's a fair cop wrt the benchmark opening a new connection for each request - httpx = HTTPX.plugin(:persistent) would probably have been the fairest comparison, since manticore has its background connection pool and curb has its thread-local reused Curl::Easy. Still, it's a bit naughty to penalise a client that's more conservative by default (no bad thing).

As an alternative, penalise everyone with a Connection: close request header to include the full connection lifecycle in the benchmark.
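
For illustration, with net/http that would look something like this (just a sketch; the other clients take per-request headers in their own ways):

require "net/http"

uri = URI("http://127.0.0.1:3000/echo")
req = Net::HTTP::Get.new(uri)
# ask the server to tear the connection down after responding,
# so every iteration pays the full connect/close cost
req["Connection"] = "close"

Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }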

As an aside, the wiki for the persistent plugin is a bit confusing:

session = HTTPX.plugin(:persistent)
session.get("https://nghttp2.org")
#=> opens a secure connection to "nghttp2.org"
#=> sends request, gets response back

HTTPX.wrap do |session|
  # will reuse the same connection
  session.get("https://nghttp2.org")
  # connection is still alive
end

Why is HTTPX inheriting behaviour from session? Do they share the same global pool of connections? What happens if you mix up HTTPX.get and session.get, do you have one side closing the other side's persistent connections?

HoneyryderChuck commented 11 months ago

httpx = HTTPX.plugin(:persistent) would probably have been the fairest comparison, since manticore has its background connection pool and curb has its thread-local reused Curl::Easy, it's a bit naughty to penalise a client that's more conservative by default (no bad thing).

Not sure tbh. The benchmark sends requests sequentially, so I'd expect curl or manticore to reuse the same connection for all requests. If it doesn't, then that could explain some of the differences I'm still seeing.

I'd not penalize by making everyone open a connection though: the benchmark would suffer from TCP handshake overhead and a lot of "syscall" interruption time. Assuming what I said earlier can be verified, then it's fair alright; and if it's not, perhaps those parameters can be tweaked.

Why is HTTPX inheriting behaviour from session?

HTTPX.get uses the "default session". session = HTTPX.plugin(:persistent) creates a session instance. HTTPX.wrap do |session| creates and yields a session instance which will persist connections until the end of the block. They'll share the connection pool, which is thread-local, and yes, one side closing the other's connections may happen if they're targeting the same origins, but that'd be a fairly unconventional use of the library.
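
To put the three modes side by side (illustrative only):

# implicit default session: each call manages its own connection
HTTPX.get("https://nghttp2.org")

# explicit session with the persistent plugin: connections are kept
# alive and reused across calls on this session object
http = HTTPX.plugin(:persistent)
http.get("https://nghttp2.org")
http.get("https://nghttp2.org")

# block-scoped session: connections persist until the block exits
HTTPX.wrap do |session|
  session.get("https://nghttp2.org")
  session.get("https://nghttp2.org")
end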

Perhaps I didn't explain the differences in the wiki, will see about doing so.

Freaky commented 10 months ago

The benchmark sends requests sequentially, so I'd expect curl or manticore to reuse the same connection for all requests. If it doesn't, then that could explain some of the differences I'm still seeing.

Manticore has a per-client thread-safe connection pool with keep-alive on by default. Curb has a per-thread Curl::Easy which also has keep-alive on by default.

I'd not penalize by making everone open a connection though: benchmark would suffer from TCP handshake overhead and a lot of "syscall" interruption time

Well, the goal isn't to make the numbers as big as possible - it's to measure differences between implementations and across different use-cases. This issue exists because I had httpx reopening connections, so it was certainly worth benchmarking!

They'll share the connection pool, which is thread-local, and yes, one-side closing the other's connections may happen if they're targeting the same origins, but that'd be fairly unconventional use of the library.

Perhaps I didn't explain the differences in the wiki, will see about doing so.

Thanks. I looked at the page for the Persistent plugin and also the Connections page, but didn't spot anything about a thread-local shared connection pool.

I see now that it is mentioned in the FAQ, which feels a little buried for such a fundamental property of how the client works.

HoneyryderChuck commented 10 months ago

Merged the MR, closing this for now. FWIW jruby is better but not great, and I found a bug in 9.4 which is blocking an improvement. Will be chasing this.

Freaky commented 10 months ago

The original issue doesn't appear to have been resolved? httpx 1.1.0 still shows a linear performance dropoff when not using persistent connections.

HoneyryderChuck commented 10 months ago

You mean in jruby, right? The patch will only work on jruby 9.4.5.0; I caught a bug in the meantime.

Freaky commented 10 months ago

No. httpx 1.1.0, MRI 3.3.0-preview2, the original test with the Benchmark section in a loop:

httpx   4.683300   0.087324   4.770624 (  4.773371)
httpx  11.685795   0.133692  11.819487 ( 11.830194)
httpx  18.240558   0.150064  18.390622 ( 18.427206)
httpx  25.831907   0.149363  25.981270 ( 26.046805)
httpx  30.493996   0.110203  30.604199 ( 30.711520)
httpx  37.882518   0.125889  38.008407 ( 38.139977)
httpx  45.471706   0.134016  45.605722 ( 45.791244)
httpx  51.762863   0.110546  51.873409 ( 52.092960)
HoneyryderChuck commented 10 months ago

Hmm, not sure we're testing the same thing. FWIW this is the 5000 benchmark result when looping 5 times only for httpx:

            user     system      total        real
httpx   1.684243   0.158399   1.842642 (  2.071473)
httpx   1.773611   0.179743   1.953354 (  2.160621)
httpx   1.636213   0.153375   1.789588 (  1.875232)
httpx   1.679417   0.161824   1.841241 (  1.947303)
httpx   1.660254   0.156803   1.817057 (  1.905187)

What are the conditions under which you're benchmarking? And are you sure you're not testing the previous version by mistake?

Freaky commented 10 months ago

Pretty sure - I do print the httpx version at the start of the test.

I tried a different machine, with a fresh install of 3.3.0-preview2 and a fresh checkout of the benchmark with just a bundle update:

Environment: FreeBSD 13.2-RELEASE-p3 GENERIC
 Ruby:       3.3.0
 Curb:       8.4.0
 HTTPX:      1.1.0
 Test: echo of 96 bytes
 URI:      http://127.0.0.1:3000/echo
 Requests: 5000
--------------------------------------

Rehearsal -----------------------------------------
httpx  15.870388   1.547504  17.417892 ( 20.691150)
curb    0.363493   0.303831   0.667324 (  1.328791)
------------------------------- total: 18.085216sec

            user     system      total        real
httpx  38.024425   2.279698  40.304123 ( 44.686257)
curb    0.401881   0.223986   0.625867 (  1.045749)
HoneyryderChuck commented 10 months ago

Can you paste the benchmark here? Also, did you verify that httpx is using the block version which enables persistence?

Freaky commented 10 months ago

https://github.com/Freaky/ruby-http-duel

No, the problem is specifically with using one-shot connections.

At least, that's what appears to show the problem best - I reduced the number of requests per benchmark to 5, put them in a wrap {} block, and left it running in a loop. The first runs took about 0.001s; after 5 minutes they were taking ten times longer, and after 15 minutes twenty times longer (0.02s).
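
Roughly, the loop looked like this (a sketch of the setup just described, not the exact script):

require "benchmark"
require "httpx"

loop do
  puts(Benchmark.measure do
    # 5 requests per iteration, sharing one connection via wrap
    HTTPX.wrap do |session|
      5.times { session.get("http://127.0.0.1:3000/echo") }
    end
  end)
end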

HoneyryderChuck commented 10 months ago

Do you think you can investigate? I'll be a bit short on time in the coming weeks, and have some backlog already. Fwiw I consider the original issue fixed, considering the "apples to apples" comparison from the benchmark. If there is overhead accrued in the one-shot case, this is a separate issue. For one, the numbers you're presenting above are only for httpx, while the same type of evaluation hasn't been performed for other clients (they're still using a persistent connection).

What I'd be interested in knowing, if I had the time, is whether this overhead is replicated in any other client (net-http, curb or manticore). And if it's not, I'd also like to know how the number of tcp connections evolves over the loop (you can track that with lsof). If the httpx difference is significant, this may have to do with other clients setting socket flags to reuse resources (I know net-http sets nodelay, others may be setting reuseport, httpx sets none fyi), which you can introspect using dtrace.
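
For reference, setting those flags by hand on a Ruby socket looks roughly like this (illustrative; SO_REUSEPORT availability depends on the platform):

require "socket"

sock = Socket.new(:INET, :STREAM)
# disable Nagle's algorithm (what net-http does on its connections)
sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)
# allow local address/port reuse, where the platform supports it
sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_REUSEPORT, 1) if Socket.const_defined?(:SO_REUSEPORT)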

A leak in httpx can't be ruled out, so one thing you can do after, or in parallel, is using something like memory_profiler to profile the loop block (don't forget to run gc manually before) and monitor whether a specific data structure is surviving the request lifecycle.
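
Something along these lines with memory_profiler (a sketch, using the local echo endpoint from the benchmark):

require "memory_profiler"
require "httpx"

GC.start # collect pre-existing garbage so the report reflects the block only

report = MemoryProfiler.report do
  HTTPX.wrap do |session|
    100.times { session.get("http://127.0.0.1:3000/echo") }
  end
end

# retained objects here are candidates for structures outliving the requests
report.pretty_print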

If you're willing to, I'd suggest moving the conversation into the discussions board. Once the problem is narrowed down, I can create tickets.

Freaky commented 10 months ago

Fwiw I consider the original issue fixed, considering the "apples to apples" comparison from the benchmark. If there is overhead accrued in the one-shot case, this is a separate issue.

No. This is the original reported issue. Judging by your MR, the very first thing you did was to change the benchmark to use persistent connections - after which you found issues with timeouts that appear unrelated to the reported problem.

What I'd be interested in knowing if I had the time is whether this overhead is replicated in any other client

I disabled keepalive in both and - while they obviously got slower - they did so consistently.

A leak in httpx can't be ruled out

I already demonstrated :close callbacks piling up, and a patch that appeared to fix it. This went unremarked on, and appears to still be the problem. Here's the updated benchmark with my patch, MRI 3.3.0-preview2:

httpx   1.480269   0.054569   1.534838 (  1.539390)
curb    0.174878   0.080200   0.255078 (  0.286807)
httpx   1.479014   0.062384   1.541398 (  1.544643)
curb    0.172591   0.070618   0.243209 (  0.275432)
httpx   1.417401   0.062980   1.480381 (  1.483941)
curb    0.163956   0.079343   0.243299 (  0.274583)
httpx   1.481548   0.070705   1.552253 (  1.554351)
curb    0.145617   0.101089   0.246706 (  0.278096)

JRuby HEAD:

httpx      25.015625   0.609375  25.625000 (  7.550858)
manticore   2.828125   0.125000   2.953125 (  1.028891)
httpx      16.304688   0.335938  16.640625 (  4.734560)
manticore   3.546875   0.078125   3.625000 (  0.602771)
httpx       5.539062   0.281250   5.820312 (  3.841073)
manticore   0.492188   0.078125   0.570312 (  0.349855)
httpx       3.476562   0.210938   3.687500 (  3.506946)
manticore   0.195312   0.046875   0.242188 (  0.285633)
httpx       3.648438   0.195312   3.843750 (  3.662177)
manticore   0.242188   0.062500   0.304688 (  0.295113)

The tightly looped persistent connection case also appears to be resolved, with performance at the 10-minute mark being about the same as at the 1-second mark.

If you're willing to, I'd suggest moving the conversation into the discussions board.

I don't see a benefit to further fragmenting this discussion.

HoneyryderChuck commented 10 months ago

The original report was a benchmark comparing httpx using one connection per request to other clients using persistent connections, i.e. apples to oranges. I chose oranges, reproduced the degradation, and fixed it. That does not mean there isn't a problem with the apples, but it's a separate issue. To be honest though, I'm not interested in the semantics; we already agree it's an issue.

I already demonstrated :close callbacks piling up...

I see, so you already figured it out :) Apologies if it looked like I dismissed it; sometimes info slips, and afterwards it's hard to go back to comments far up in the thread. Can I suggest that you open a merge request with your patch? It'd provide the space for discussing that separate issue as well. Feel free to do it on the github or gitlab repos (if you don't have an account for the latter, you can open the MR via email: go here and click "Email a new Merge request" to see how).

Freaky commented 10 months ago

MR 289 created over on Gitlab. Might as well use my account over there at least once!

HoneyryderChuck commented 2 months ago

@Freaky it took some time, but I was finally able to make a round of updates to my http benchmarks.

I know it's been a while and you may have lost interest in the debate, but I'd kindly invite you to add or update benchmarks along the lines of what you were doing in your own repository, if you see room for improvement.

Freaky commented 1 month ago

@HoneyryderChuck Cheers for the heads-up - this is definitely something I want to revisit when I get the chance!

I'm at @serpapi now and we make a pretty substantial number of HTTP requests from Ruby clients. HTTPX is definitely worth some investigation for future use :)