geventhttpclient / geventhttpclient

A high performance, concurrent http client library for python with gevent

OpenSSL cert loading performance issue #219

Closed: cyberw closed this issue 5 months ago

cyberw commented 5 months ago

Python 3.12 with OpenSSL 3.0 has a certificate-loading performance issue on Windows. It was reported for Locust (https://github.com/locustio/locust/issues/2555) as well as for CPython directly (https://github.com/python/cpython/issues/95031), and the root cause is probably in OpenSSL (https://github.com/openssl/openssl/issues/18814).

It isn't specific to geventhttpclient, but it may require a workaround on our side, perhaps along the lines of what was suggested for requests: https://github.com/psf/requests/pull/6667

ml31415 commented 5 months ago

I'm not sure this applies to us. If I read it correctly, creating the SSL context is the step that can take some time. But we already create the context once per pool, not once per connection: https://github.com/geventhttpclient/geventhttpclient/blob/50a9482edda5216929ac2ad09c1e7f9062372b2c/geventhttpclient/connectionpool.py#L248

I don't know how the "50 peak users" mentioned there are set up. But if all of them share one UserAgent, and therefore one HTTPClient and its connection pool, there should be no problem. If every user gets its own UserAgent, well, then you might have to fix that in Locust.
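A minimal sketch of the per-pool pattern described above, using only the standard library `ssl` module. `PoolSketch` is a hypothetical stand-in, not geventhttpclient's actual `ConnectionPool`: it builds one `SSLContext` when the pool is created and hands that same object to every connection, so the certificate-loading cost is paid once per pool rather than once per connection.

```python
import ssl
from typing import List


class PoolSketch:
    """Hypothetical per-host pool (not the real geventhttpclient class)."""

    contexts_created = 0  # class-level counter, only to demonstrate reuse

    def __init__(self, host: str) -> None:
        self.host = host
        # The expensive step: create_default_context() loads the system CA
        # certificates. It runs once, when the pool is created.
        self.ssl_context = ssl.create_default_context()
        PoolSketch.contexts_created += 1

    def context_for_new_connection(self) -> ssl.SSLContext:
        # Each new connection reuses the pool's context instead of building one.
        return self.ssl_context


# All 30 simulated users share one pool: one context in total.
shared = PoolSketch("github.com")
shared_contexts: List[ssl.SSLContext] = [
    shared.context_for_new_connection() for _ in range(30)
]

# One pool per simulated user: one context per user.
per_user = [PoolSketch("github.com") for _ in range(30)]
```

With one shared pool only a single context is built; with one pool per user, 30 are built, which is the case that becomes slow when context creation itself is slow.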

cyberw commented 5 months ago

Well... we kind of want it to be realistic (one SSL handshake and one HTTP connection per Locust User). Here's a slimmed-down example, translated from the one in the requests PR to geventhttpclient:

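```python
from gevent import monkey

# Patch the standard library before anything else imports socket/ssl.
monkey.patch_all()

from time import perf_counter

import gevent.pool  # "import gevent" alone does not load gevent.pool
from geventhttpclient.client import HTTPClientPool


def do_request() -> None:
    # A fresh HTTPClientPool per call, so nothing is shared between calls
    # and every call pays for its own client setup.
    HTTPClientPool().get_client("https://github.com")


pool = gevent.pool.Pool()
for _ in range(30):
    pool.spawn(do_request)

start = perf_counter()
pool.join()  # the spawned greenlets actually run here
end = perf_counter()

print(end - start)
```
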
cyberw commented 5 months ago

The above code takes 0.2s on Python 3.11 and around 7s on 3.12 (on Windows). The root issue probably isn't geventhttpclient's fault, but perhaps it can be optimized somehow...

On macOS (and probably Linux) the difference is not as bad: it goes from 0.1s to 0.2s.
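To check that the slowdown comes from building SSL contexts rather than from geventhttpclient itself, one could time `ssl.create_default_context()` in isolation. This is a rough sketch, not a rigorous benchmark; absolute numbers will vary with the OS, Python version, and bundled OpenSSL.

```python
import ssl
import time
from typing import Tuple


def time_context_creation(n: int = 30) -> Tuple[float, float]:
    """Return (seconds for n fresh contexts, seconds for one context reused n times)."""
    start = time.perf_counter()
    for _ in range(n):
        # A fresh context each time: the CA certificates are loaded n times.
        ssl.create_default_context()
    fresh = time.perf_counter() - start

    start = time.perf_counter()
    shared = ssl.create_default_context()  # certificates loaded once
    for _ in range(n):
        _ = shared  # reusing the existing object costs next to nothing
    reused = time.perf_counter() - start

    return fresh, reused


if __name__ == "__main__":
    fresh, reused = time_context_creation()
    print(f"{fresh:.3f}s for fresh contexts, {reused:.3f}s for one reused context")
```

If the "fresh" figure alone accounts for most of the 7s seen on Windows, the cost is in context creation, not in the HTTP client.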

ml31415 commented 5 months ago

Yeah, but again: geventhttpclient already does its best to reduce overhead by having only one context per host per session. It would be very weird to have some kind of global dictionary acting as a host->context cache.
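For comparison, the global host->context cache mentioned above could be as small as a memoized factory. `cached_context_for_host` is a hypothetical helper, not part of geventhttpclient or any other library; it only illustrates the shape of the idea being argued against.

```python
import functools
import ssl


@functools.lru_cache(maxsize=None)
def cached_context_for_host(host: str) -> ssl.SSLContext:
    # Hypothetical process-wide cache: the first call for a host builds a
    # context (loading CA certificates); later calls return the same object.
    return ssl.create_default_context()
```

Note that the cache lives at module level, so it is process-wide state that outlives any single client, and `create_default_context()` does not itself depend on the host.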

The most reasonable thing that comes to mind would be some kind of shallow copy of an HTTPClient and UserAgent: one that copies an agent, including the hosts of its client pool and their contexts, but without the open sockets. It would be a rather weird thing to have, and you might just put it in Locust instead of in geventhttpclient.
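The shallow-copy idea could look roughly like the sketch below. `ClientSketch`, `_connections`, and `clone_without_sockets` are hypothetical names, not geventhttpclient's API. The clone shares the already-built `SSLContext`, so no certificates are reloaded, but starts with its own empty connection list, so each Locust User would still open its own connections and do its own handshakes.

```python
import copy
import ssl
from typing import List


class ClientSketch:
    """Hypothetical client holding an SSL context and its open connections."""

    def __init__(self, host: str) -> None:
        self.host = host
        self.ssl_context = ssl.create_default_context()  # expensive: loads CA certs
        self._connections: List[object] = []  # stand-in for open sockets

    def clone_without_sockets(self) -> "ClientSketch":
        # copy.copy() makes a shallow copy: attributes such as ssl_context
        # are shared with the original rather than rebuilt.
        clone = copy.copy(self)
        # Give the clone its own, empty connection list so no sockets are shared.
        clone._connections = []
        return clone


template = ClientSketch("github.com")
template._connections.append("open-socket-placeholder")
user_client = template.clone_without_sockets()
```

Building one template client up front and cloning it per User keeps the expensive context creation out of the per-User startup path.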

cyberw commented 5 months ago

I can try implementing that. Thanks. OK to close if you want.