PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0
16.07k stars 1.57k forks source link

Connection leak in the test suite (from `pluggy>=1.4.0` warnings) #11820

Closed chrisguidry closed 8 months ago

chrisguidry commented 8 months ago
tests/server/api/test_block_documents.py::TestCreateBlockDocument::test_create_block_document_with_nonstandard_characters[my-block] - pluggy.PluggyTeardownRaisedWarning: A plugin raised an exception during an old-style hookwrapper teardown.
Plugin: unraisableexception, Hook: pytest_runtest_setup
PytestUnraisableExceptionWarning: Exception ignored in: <function Connection.__del__ at 0x7fe03fc0d4e0>

This has been popping up in random tests since last week, and it is because pluggy is giving us a new warning: https://github.com/pytest-dev/pluggy/compare/1.3.0...1.4.0

- `#463 <https://github.com/pytest-dev/pluggy/issues/463>`_: A warning :class:`~pluggy.PluggyTeardownRaisedWarning` is now issued when an old-style hookwrapper raises an exception during teardown.
  See the warning documentation for more details.

I'll be temporarily suppressing that warning, and this issue can be for tracking down and fixing that leaked connection.

It appears that the error is always in python 3.8 and always with the postgres DB backend, which may be a hint.

Examples of runs that failed with this error:

Job ID: 21162420883
Job Name: python:3.8, postgres:14, --exclude-services
Started at: 2024-02-02T17:01:33Z
Link: https://github.com/PrefectHQ/prefect/actions/runs/7759102494/job/21162420883
2024-02-02T17:11:01.8665082Z FAILED tests/server/test_app.py::test_app_exposes_ui_settings - pluggy.PluggyTeardownRaisedWarning: A plugin raised an exception during an old-style hookwrapper teardown.
Job ID: 21156157922
Job Name: python:3.8, postgres:13, --exclude-services
Started at: 2024-02-02T14:18:43Z
Link: https://github.com/PrefectHQ/prefect/actions/runs/7757198583/job/21156157922
2024-02-02T14:28:17.5462675Z FAILED tests/server/test_app.py::test_app_exposes_ui_settings - pluggy.PluggyTeardownRaisedWarning: A plugin raised an exception during an old-style hookwrapper teardown.
Job ID: 21112206170
Job Name: pydantic v1, python:3.8, postgres:14, --exclude-services
Started at: 2024-02-01T14:17:16Z
Link: https://github.com/PrefectHQ/prefect/actions/runs/7742635501/job/21112206170
2024-02-01T14:26:38.8923204Z FAILED tests/server/test_app.py::test_app_exposes_ui_settings - pluggy.PluggyTeardownRaisedWarning: A plugin raised an exception during an old-style hookwrapper teardown.
Job ID: 20991318569
Job Name: python:3.8, postgres:13, --exclude-services
Started at: 2024-01-29T21:25:03Z
Link: https://github.com/PrefectHQ/prefect/actions/runs/7702627286/job/20991318569
2024-01-29T21:34:31.4565524Z FAILED tests/server/test_app.py::test_app_exposes_ui_settings - pluggy.PluggyTeardownRaisedWarning: A plugin raised an exception during an old-style hookwrapper teardown.
Job ID: 20874540746
Job Name: pydantic v1, python:3.8, postgres:14, --exclude-services
Started at: 2024-01-25T19:23:40Z
Link: https://github.com/PrefectHQ/prefect/actions/runs/7659435084/job/20874540746
2024-01-25T19:33:08.7407781Z FAILED tests/server/test_app.py::test_app_exposes_ui_settings - pluggy.PluggyTeardownRaisedWarning: A plugin raised an exception during an old-style hookwrapper teardown.
chrisguidry commented 8 months ago

Thinking now that @abrookins may have fixed this, going to try again with pluggy unpinned

chrisguidry commented 8 months ago

https://github.com/PrefectHQ/prefect/actions/runs/7817232549/job/21324622376?pr=11916

Some better output from a recent failure:

ERROR tests/input/test_run_input.py::test_receive_works_sync - pluggy.PluggyTeardownRaisedWarning: A plugin raised an exception during an old-style hookwrapper teardown.
Plugin: unraisableexception, Hook: pytest_runtest_setup
PytestUnraisableExceptionWarning: Exception ignored in: <function _ConnectionRecord.checkout.<locals>.<lambda> at 0x7fd7dbd8b7e0>

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.12.1/x64/lib/python3.12/site-packages/sqlalchemy/pool/base.py", line 732, in <lambda>
    lambda ref: _finalize_fairy(
                ^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.1/x64/lib/python3.12/site-packages/sqlalchemy/pool/base.py", line 1034, in _finalize_fairy
    util.warn(message)
  File "/opt/hostedtoolcache/Python/3.12.1/x64/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py", line 1822, in warn
    _warnings_warn(msg, exc.SAWarning)
  File "/opt/hostedtoolcache/Python/3.12.1/x64/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py", line 1897, in _warnings_warn
    warnings.warn(message, category, stacklevel=stacklevel + 1)
sqlalchemy.exc.SAWarning: The garbage collector is trying to clean up non-checked-in connection <AdaptedConnection <asyncpg.connection.Connection object at 0x7fd8245a8f50>>, which will be terminated.  Please ensure that SQLAlchemy pooled connections are returned to the pool explicitly, either by calling ``close()`` or by using appropriate context managers to manage their lifecycle.

For more information see https://pluggy.readthedocs.io/en/stable/api_reference.html#pluggy.PluggyTeardownRaisedWarning
chrisguidry commented 8 months ago

Even after closing out remaining engines at the end of the test suite, there is still at least one open connection to the DB as the test session exits.

https://github.com/PrefectHQ/prefect/actions/runs/7829266132/attempts/1

ERROR tests/runner/test_webserver.py::TestWebserverSettings::test_webserver_settings_are_respected - pytest.PytestUnraisableExceptionWarning: Exception ignored in: <socket.socket fd=-1, family=AddressFamily.AF_INET6, type=SocketKind.SOCK_STREAM, proto=6>

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/abc.py", line 102, in __subclasscheck__
    return _abc_subclasscheck(cls, subclass)
ResourceWarning: unclosed <socket.socket fd=27, family=AddressFamily.AF_INET6, type=SocketKind.SOCK_STREAM, proto=6, laddr=('::1', 49296, 0, 0), raddr=('::1', 5432, 0, 0)>
chrisguidry commented 8 months ago

I think the new approach will require instrumenting the connection pool to track all checkouts and their corresponding checkins, so that we can close any remaining connections. As a start, attach to the relevant checkin/out events and capture the connection and also the traceback of where it was checked out. This may help find the ultimate source of this bug, but otherwise we can use this tracking to close all remaining open connections.