aio-libs / aiopg

aiopg is a library for accessing a PostgreSQL database from the asyncio
http://aiopg.readthedocs.io
BSD 2-Clause "Simplified" License
1.39k stars 159 forks source link

Make sure connection._ready() does not stall on bad file descriptor #885

Open seamusabshere opened 2 years ago

seamusabshere commented 2 years ago

With @emk

We've observed "[Errno 2] No such file or directory" coming from add_writer(). When this happens, nobody notifies _waiter that there has been an exception and this hangs the program.

We haven't managed to create a local repo - whenever we (for example) kill -9 a postgres process while an aiopg client is writing to it, we get the intended psycopg2.OperationalError("Connection closed").

Here's the backtrace of the error seen in the wild:

Exception in callback Connection._ready(<weakref at 0...x7efddf120890>) at /root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/aiopg/connection.py:779 handle: <Handle Connection._ready(<weakref at 0...x7efddf120890>) at /root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/aiopg/connection.py:779> Traceback (most recent call last): [... app code ...] File "/usr/lib/python3.7/asyncio/base_events.py", line 566, in run_until_complete self.run_forever() File "/usr/lib/python3.7/asyncio/base_events.py", line 534, in run_forever self._run_once() File "/usr/lib/python3.7/asyncio/base_events.py", line 1771, in _run_once handle._run() File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run self._context.run(self._callback, self._args) File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/aiopg/connection.py", line 838, in _ready self._fileno, self._ready, weak_self # type: ignore File "/usr/lib/python3.7/asyncio/selector_events.py", line 334, in add_writer return self._add_writer(fd, callback, args) File "/usr/lib/python3.7/asyncio/selector_events.py", line 294, in _add_writer (reader, handle)) File "/usr/lib/python3.7/selectors.py", line 389, in modify self._selector.modify(key.fd, selector_events) FileNotFoundError: [Errno 2] No such file or directory

What do these changes do?

Are there changes in behavior for the user?

Related issue number

Checklist

codecov[bot] commented 2 years ago

Codecov Report

Merging #885 (c77349e) into master (3b1902c) will decrease coverage by 0.11%. The diff coverage is 60.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #885      +/-   ##
==========================================
- Coverage   93.32%   93.21%   -0.12%     
==========================================
  Files          12       12              
  Lines        1574     1577       +3     
  Branches      187      187              
==========================================
+ Hits         1469     1470       +1     
- Misses         73       75       +2     
  Partials       32       32              
Impacted Files Coverage Δ
aiopg/connection.py 95.34% <60.00%> (-0.33%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 3b1902c...c77349e. Read the comment docs.

seamusabshere commented 2 years ago

@asvetlov updated, thanks for your suggestion!

seamusabshere commented 2 years ago

@asvetlov as i said in the description, i've found this to be impossible to test - can we get it merged without?

note that we have been running this in production since the original PR

emk commented 2 years ago

Thank you to everyone for following up on this!

I can confirm that this patch appears to have entirely fixed a bug where aiopg applications would hang indefinitely. We encountered the original bug at least daily (across various machines), and we haven't seen it once that I know of since deploying this fix.

I'm not quite sure how to test it without mocking core Python network code to artificially fail.

Here are the outstanding TODO items: