encode / httpx

A next generation HTTP client for Python. 🦋
https://www.python-httpx.org/
BSD 3-Clause "New" or "Revised" License
13.22k stars 839 forks source link

An open stream object is being garbage collected; call "stream.close()" explicitly. #550

Closed JulienPalard closed 4 years ago

JulienPalard commented 4 years ago

Currently, using cpython 3.8 and httpx and HEAD (a05ba2e9148c3f74c80c68a727b7e52b3d751c8c), with this script:

import httpx
print(len(httpx.get(f"https://docs.python.org/fr/3.8").text))

I'm getting:

10671
An open stream object is being garbage collected; call "stream.close()" explicitly.
source_traceback: Object created at (most recent call last):
  File "/home/mdk/clones/httpx/httpx/api.py", line 125, in get
    return request(
  File "/home/mdk/clones/httpx/httpx/api.py", line 84, in request
    return client.request(
  File "/home/mdk/clones/httpx/httpx/client.py", line 848, in request
    return self.send(
  File "/home/mdk/clones/httpx/httpx/client.py", line 887, in send
    async_response = concurrency_backend.run(coroutine, *args, **kwargs)
  File "/home/mdk/clones/httpx/httpx/concurrency/asyncio.py", line 321, in run
    return self.loop.run_until_complete(coroutine(*args, **kwargs))
  File "/home/mdk/.local/lib/python3.8/asyncio/base_events.py", line 595, in run_until_complete
    self.run_forever()
  File "/home/mdk/.local/lib/python3.8/asyncio/base_events.py", line 563, in run_forever
    self._run_once()
  File "/home/mdk/.local/lib/python3.8/asyncio/base_events.py", line 1836, in _run_once
    handle._run()
  File "/home/mdk/.local/lib/python3.8/asyncio/events.py", line 81, in _run
    self._context.run(self._callback, *self._args)
  File "/home/mdk/.local/lib/python3.8/asyncio/streams.py", line 50, in open_connection
    reader = StreamReader(limit=limit, loop=loop)

It looks like two connections are created, the second one during the redirection to /fr/3.8/.

Tried with HTTPX_LOG_LEVEL=trace:

TRACE [2019-11-24 23:31:51] httpx.dispatch.connection_pool - acquire_connection origin=Origin(scheme='https' host='docs.python.org' port=443)
TRACE [2019-11-24 23:31:51] httpx.dispatch.connection_pool - new_connection connection=HTTPConnection(origin=Origin(scheme='https' host='docs.python.org' port=443))
TRACE [2019-11-24 23:31:51] httpx.config - load_ssl_context verify=True cert=None trust_env=True http_versions=HTTPVersionConfig(['HTTP/1.1'])
TRACE [2019-11-24 23:31:51] httpx.config - load_verify_locations cafile=/home/mdk/.venvs/3.8/lib/python3.8/site-packages/certifi/cacert.pem
TRACE [2019-11-24 23:31:51] httpx.dispatch.connection - start_connect tcp host='docs.python.org' port=443 timeout=TimeoutConfig(timeout=5.0)
TRACE [2019-11-24 23:31:51] httpx.dispatch.connection - connected http_version='HTTP/1.1'
TRACE [2019-11-24 23:31:51] httpx.dispatch.http11 - send_headers method='GET' target='/fr/3.8' headers=Headers({'host': 'docs.python.org', 'user-agent': 'python-httpx/0.7.8', 'accept': '*/*', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive'})
TRACE [2019-11-24 23:31:51] httpx.dispatch.http11 - receive_event event=NEED_DATA
TRACE [2019-11-24 23:31:51] httpx.dispatch.http11 - receive_event event=Response(status_code=301, headers=[(b'connection', b'keep-alive'), (b'content-length', b'178'), (b'server', b'nginx'), (b'content-type', b'text/html'), (b'location', b'https://docs.python.org/fr/3.8/'), (b'x-clacks-overhead', b'GNU Terry Pratchett'), (b'strict-transport-security', b'max-age=315360000; includeSubDomains; preload'), (b'via', b'1.1 varnish'), (b'accept-ranges', b'bytes'), (b'date', b'Sun, 24 Nov 2019 22:31:51 GMT'), (b'via', b'1.1 varnish'), (b'age', b'119188'), (b'x-served-by', b'cache-jfk8149-JFK, cache-cdg20743-CDG'), (b'x-cache', b'HIT, HIT'), (b'x-cache-hits', b'1, 1'), (b'x-timer', b'S1574634712.870413,VS0,VE0')], http_version=b'1.1', reason=b'Moved Permanently')
DEBUG [2019-11-24 23:31:51] httpx.client - HTTP Request: GET https://docs.python.org/fr/3.8 "HTTP/1.1 301 Moved Permanently"
TRACE [2019-11-24 23:31:51] httpx.dispatch.connection_pool - acquire_connection origin=Origin(scheme='https' host='docs.python.org' port=443)
TRACE [2019-11-24 23:31:51] httpx.dispatch.connection_pool - new_connection connection=HTTPConnection(origin=Origin(scheme='https' host='docs.python.org' port=443))
TRACE [2019-11-24 23:31:51] httpx.config - load_ssl_context verify=True cert=None trust_env=True http_versions=HTTPVersionConfig(['HTTP/1.1'])
TRACE [2019-11-24 23:31:51] httpx.config - load_verify_locations cafile=/home/mdk/.venvs/3.8/lib/python3.8/site-packages/certifi/cacert.pem
TRACE [2019-11-24 23:31:51] httpx.dispatch.connection - start_connect tcp host='docs.python.org' port=443 timeout=TimeoutConfig(timeout=5.0)
TRACE [2019-11-24 23:31:51] httpx.dispatch.connection - connected http_version='HTTP/1.1'
TRACE [2019-11-24 23:31:51] httpx.dispatch.http11 - send_headers method='GET' target='/fr/3.8/' headers=Headers({'host': 'docs.python.org', 'user-agent': 'python-httpx/0.7.8', 'accept': '*/*', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive'})
TRACE [2019-11-24 23:31:51] httpx.dispatch.http11 - receive_event event=NEED_DATA
TRACE [2019-11-24 23:31:51] httpx.dispatch.http11 - receive_event event=Response(status_code=200, headers=[(b'connection', b'keep-alive'), (b'content-length', b'2564'), (b'server', b'nginx'), (b'content-type', b'text/html'), (b'last-modified', b'Sun, 24 Nov 2019 05:48:09 GMT'), (b'etag', b'"5dda1999-29af"'), (b'x-clacks-overhead', b'GNU Terry Pratchett'), (b'strict-transport-security', b'max-age=315360000; includeSubDomains; preload'), (b'content-encoding', b'gzip'), (b'via', b'1.1 varnish'), (b'accept-ranges', b'bytes'), (b'date', b'Sun, 24 Nov 2019 22:31:51 GMT'), (b'via', b'1.1 varnish'), (b'age', b'49946'), (b'x-served-by', b'cache-jfk8146-JFK, cache-cdg20746-CDG'), (b'x-cache', b'HIT, HIT'), (b'x-cache-hits', b'1, 1'), (b'x-timer', b'S1574634712.938068,VS0,VE0'), (b'vary', b'Accept-Encoding')], http_version=b'1.1', reason=b'OK')
DEBUG [2019-11-24 23:31:51] httpx.client - HTTP Request: GET https://docs.python.org/fr/3.8/ "HTTP/1.1 200 OK"
TRACE [2019-11-24 23:31:51] httpx.dispatch.http11 - receive_event event=NEED_DATA
TRACE [2019-11-24 23:31:51] httpx.dispatch.http11 - receive_event event=Data(<2564 bytes>)
TRACE [2019-11-24 23:31:51] httpx.dispatch.http11 - receive_event event=EndOfMessage(headers=[])
TRACE [2019-11-24 23:31:51] httpx.dispatch.http11 - response_closed our_state=DONE their_state=DONE
TRACE [2019-11-24 23:31:51] httpx.dispatch.connection_pool - release_connection connection=HTTPConnection(origin=Origin(scheme='https' host='docs.python.org' port=443))
TRACE [2019-11-24 23:31:51] httpx.dispatch.connection - close_connection
TRACE [2019-11-24 23:31:51] httpx.dispatch.http11 - send_event event=ConnectionClosed()
10671
An open stream object is being garbage collected; call "stream.close()" explicitly.

Looks like the first connection is not released, so it's probably near the redirect code, maybe closing the connection right before loosing the reference to the response may be the fix, but it feels "a bit wrong" to have to care about this in such a remote place, and it feels really wrong to have to close the connection to release it to the pool, it's a keepalive connection it should be kept open and reused.

diff --git a/httpx/middleware/redirect.py b/httpx/middleware/redirect.py
index e093bf4..48342d1 100644
--- a/httpx/middleware/redirect.py
+++ b/httpx/middleware/redirect.py
@@ -38,6 +38,8 @@ class RedirectMiddleware(BaseMiddleware):
         next_request = self.build_redirect_request(request, response)

         if self.allow_redirects:
+            if response.on_close is not None:
+                await response.on_close()
             return await self(next_request, get_response)

         response.call_next = functools.partial(self, next_request, get_response)

FYI if I apply the patch I see:

DEBUG [2019-11-24 23:36:38] httpx.client - HTTP Request: GET https://docs.python.org/fr/3.8 "HTTP/1.1 301 Moved Permanently"
REDIRECT: Releasing
TRACE [2019-11-24 23:36:38] httpx.dispatch.http11 - response_closed our_state=DONE their_state=SEND_BODY
TRACE [2019-11-24 23:36:38] httpx.dispatch.http11 - send_event event=ConnectionClosed()
TRACE [2019-11-24 23:36:38] httpx.dispatch.connection_pool - release_connection connection=HTTPConnection(origin=Origin(scheme='https' host='docs.python.org' port=443))
TRACE [2019-11-24 23:36:38] httpx.dispatch.connection_pool - acquire_connection origin=Origin(scheme='https' host='docs.python.org' port=443)
TRACE [2019-11-24 23:36:38] httpx.dispatch.connection_pool - new_connection connection=HTTPConnection(origin=Origin(scheme='https' host='docs.python.org' port=443))

Any hint on how to do it more cleanly?

What about an asynchronous context manager to take connection from the pool instead of passing a callback to the release method?

tomchristie commented 4 years ago

This appears resolved in 0.8 to me. Our redirect middleware wasn't closing the redirect response, which we are now doing.

metakot commented 3 years ago

Still an issue on Ubuntu 18.04.5, python 3.8.0 and httpx 0.16.1


import asyncio
import httpx

async def test():
    async with httpx.AsyncClient() as client:
        resp = await client.get('https://httpbin.org/get')

asyncio.run(test())

An open stream object is being garbage collected; call "stream.close()" explicitly.