aio-libs / aiopg

aiopg is a library for accessing a PostgreSQL database from the asyncio
http://aiopg.readthedocs.io
BSD 2-Clause "Simplified" License

FileNotFoundError when connecting to postgres if fd is closed and then reopened #837

Open brianmaissy opened 3 years ago

brianmaissy commented 3 years ago

I recently upgraded from Python 3.6 to 3.8 and encountered a strange bug:

  File ".../venv/lib/python3.8/site-packages/aiopg/connection.py", line 151, in _ready
    self._loop.add_writer(self._fileno, self._ready, weak_self)
  File "/usr/lib/python3.8/asyncio/selector_events.py", line 337, in add_writer
    return self._add_writer(fd, callback, *args)
  File "/usr/lib/python3.8/asyncio/selector_events.py", line 296, in _add_writer
    self._selector.modify(fd, mask | selectors.EVENT_WRITE,
  File "/usr/lib/python3.8/selectors.py", line 389, in modify
    self._selector.modify(key.fd, selector_events)
FileNotFoundError: [Errno 2] No such file or directory

The crash occurred when connecting to a database with await aiopg.sa.create_engine(...). I could only reproduce it under certain circumstances: for example, it did not happen if I disabled SSL, or if I used a .pgpass file instead of passing a password to create_engine().

Here's what the strace looked like at the time of the crash:

socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 10
setsockopt(10, SOL_TCP, TCP_NODELAY, [1], 4) = 0
fcntl(10, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(10, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
fcntl(10, F_SETFD, FD_CLOEXEC)          = 0
setsockopt(10, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
connect(10, {sa_family=AF_INET, sin_port=htons(5432), sin_addr=inet_addr("X.X.X.X")}, 16) = -1 EINPROGRESS (Operation now in progress)
epoll_ctl(6, EPOLL_CTL_MOD, 10, {EPOLLIN|EPOLLOUT, {u32=10, u64=10}}) = -1 ENOENT (No such file or directory)

I found that, under certain conditions, libpq closes the socket and then opens a new one, so the fd underlying the aiopg connection refers to a new socket that happens to have the same fd number. It turns out the libpq documentation explicitly reserves the right to do this:

Use PQsocket(conn) to obtain the descriptor of the socket underlying the database connection. (Caution: do not assume that the socket remains the same across PQconnectPoll calls.)

In Python 3.6, modify() on the epoll selector was implemented as unregister() followed by register(). A patch in Python 3.7 changed _PollLikeSelector.modify to call epoll.modify() directly. Previously, replacing the socket with a new one that had the same fd number was harmless, because the unregister/register pair simply added the new socket. Now, because the kernel removed the old socket from the epoll set when it was closed and the new socket was never added, the modify issues EPOLL_CTL_MOD without a preceding EPOLL_CTL_ADD, and epoll_ctl returns ENOENT.
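The kernel behaviour behind the ENOENT can be reproduced without PostgreSQL. The sketch below is Linux only and stale_fd_demo is a hypothetical name: it registers a socket with a raw select.epoll object, closes it, creates a new socket that normally receives the same fd number, and then tries EPOLL_CTL_MOD (what Python 3.7+ does) followed by EPOLL_CTL_ADD (effectively what the 3.6 unregister/register path ends up doing).

```python
import select
import socket


def stale_fd_demo():
    """Reproduce EPOLL_CTL_MOD failing with ENOENT after an fd number is reused.

    Linux only (select.epoll). Returns a short description of what happened.
    """
    ep = select.epoll()
    old = socket.socket()
    fd = old.fileno()
    ep.register(fd, select.EPOLLIN)   # EPOLL_CTL_ADD for the original socket
    old.close()                       # closing it silently removes it from the epoll set
    new = socket.socket()             # normally gets the lowest free fd, i.e. the same number
    try:
        if new.fileno() != fd:
            return "fd not reused"
        try:
            # EPOLL_CTL_MOD, which is what Python 3.7+ selectors issue
            ep.modify(fd, select.EPOLLIN | select.EPOLLOUT)
        except FileNotFoundError:
            # Same number, but this socket was never added to the epoll set.
            ep.register(fd, select.EPOLLIN | select.EPOLLOUT)  # EPOLL_CTL_ADD succeeds
            return "ENOENT, re-added"
        return "modified"
    finally:
        new.close()
        ep.close()
```

In a single-threaded process on Linux, print(stale_fd_demo()) should report the ENOENT case; the "fd not reused" branch only guards against another thread grabbing the number first.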

The bottom line is that libpq considers it fine to replace the socket silently, and Python does not. The best place to resolve this contradiction might be aiopg. A possible workaround would be to detect that the socket has been replaced, then remove the fd from the event loop and re-add it.
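A minimal sketch of that workaround at the selectors level, assuming the loop's selector is an EpollSelector (Linux): try modify() first and, only if the kernel answers ENOENT, re-register the fd so the new socket gets a fresh EPOLL_CTL_ADD. modify_or_reregister is a hypothetical helper, not part of aiopg or asyncio. Hooking it into asyncio would still require the loop's selector to use it, since add_writer() and remove_writer() call selector.modify() internally, as the tracebacks in this thread show.

```python
import selectors


def modify_or_reregister(sel, fileobj, events, data=None):
    """Hypothetical helper: change the events watched for fileobj and, if the
    kernel reports ENOENT (the fd was closed and its number reused by a new
    socket, as libpq may do), register the new socket from scratch."""
    try:
        return sel.modify(fileobj, events, data)
    except FileNotFoundError:
        # Depending on the Python version, modify() may already have dropped
        # its own record of fileobj; unregister defensively in case it did not.
        # EpollSelector.unregister() ignores the kernel's ENOENT for a stale fd.
        try:
            sel.unregister(fileobj)
        except KeyError:
            pass
        # Fresh EPOLL_CTL_ADD for whatever socket now owns this fd number.
        return sel.register(fileobj, events, data)
```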

byrgazov commented 3 years ago

I have a similar problem, found on Arch Linux with PostgreSQL stopped (not installed).

# -*- coding: utf-8 -*-

__requires__ = ['aiopg[sa]==1.2.1', 'cffi==1.14.5']

import sys, os
import asyncio
import aiopg.sa
import aiopg.connection

# The PostgreSQL server is not running!
DATABASE_URL = 'postgresql://localhost:12345/fake'

print('* uname:', ' '.join(os.uname()))
print('* python:', sys.version.replace('\n', ' ').replace('  ', ' ').strip())
print('* asiopg:', aiopg.version.replace('\n', ' ').replace('  ', ' ').strip())
print('* psycopg2:', aiopg.connection.psycopg2.__version__)

try:
    import cffi
except ImportError:
    print('* glibc: CFFI required!')
else:
    ffi = cffi.FFI()
    ffi.cdef('const char *gnu_get_libc_version(void);')
    C = ffi.dlopen(None)
    print('* glibc:', ffi.string(C.gnu_get_libc_version()).decode())

async def test(url):
    print('Connection:', url)
    async with aiopg.sa.create_engine(url) as engine:
        print('Engine:', engine)

loop = asyncio.get_event_loop()
loop.run_until_complete(test(DATABASE_URL))

As I understand it, the problem occurs with Python >= 3.7 and glibc 2.33.

Arch Linux

* uname: Linux a13 5.12.9-arch1-1 #1 SMP PREEMPT Thu, 03 Jun 2021 11:36:13 +0000 x86_64
* python: 3.9.5 (default, May 24 2021, 12:50:35) [GCC 11.1.0]
* asiopg: 1.2.1, Python 3.9.5 (default, May 24 2021, 12:50:35) [GCC 11.1.0]
* psycopg2: 2.8.6 (dt dec pq3 ext lo64)
* glibc: 2.33
Connection: postgresql://localhost:12345/fake
Traceback (most recent call last):
  File "/home/bw/.local/lib/python3.9/site-packages/aiopg/connection.py", line 104, in _ready
    state = self._conn.poll()
psycopg2.OperationalError: could not connect to server: Connection refused
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 12345?
could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 12345?

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
...
  File "/home/bw/.local/lib/python3.9/site-packages/aiopg/connection.py", line 124, in _ready
    self._loop.remove_writer(self._fileno)
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 351, in remove_writer
    return self._remove_writer(fd)
  File "/usr/lib/python3.9/asyncio/selector_events.py", line 325, in _remove_writer
    self._selector.modify(fd, mask, (reader, None))
  File "/usr/lib/python3.9/selectors.py", line 390, in modify
    self._selector.modify(key.fd, selector_events)
FileNotFoundError: [Errno 2] No such file or directory

* uname: Linux a13 5.12.9-arch1-1 #1 SMP PREEMPT Thu, 03 Jun 2021 11:36:13 +0000 x86_64
* python: 3.7.10 (default, May 14 2021, 23:54:07) [GCC 10.2.0]
* asiopg: 1.2.1, Python 3.7.10 (default, May 14 2021, 23:54:07) [GCC 10.2.0]
* psycopg2: 2.8.6 (dt dec pq3 ext lo64)
* glibc: 2.33
...
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 12345?
...
FileNotFoundError: [Errno 2] No such file or directory

* uname: Linux a13 5.12.9-arch1-1 #1 SMP PREEMPT Thu, 03 Jun 2021 11:36:13 +0000 x86_64
* python: 3.6.13 (default, Jun 7 2021, 17:51:57) [GCC 11.1.0]
* asiopg: 1.2.1, Python 3.6.13 (default, Jun 7 2021, 17:51:57) [GCC 11.1.0]
* psycopg2: 2.8.6 (dt dec pq3 ext lo64)
* glibc: 2.33
...
  File "/home/bw/src/.venv-3.6/lib/python3.6/site-packages/aiopg/connection.py", line 104, in _ready
    state = self._conn.poll()
psycopg2.OperationalError: could not connect to server: Connection refused
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 12345?
could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 12345?

Devuan GNU/Linux 3 (beowulf)

* uname: Linux d69 5.3.18-lp152.57-default #1 SMP Fri Dec 4 07:27:58 UTC 2020 (7be5551) x86_64
* python: 3.7.3 (default, Jul 25 2020, 13:03:44) [GCC 8.3.0]
* asiopg: 1.2.1, Python 3.7.3 (default, Jul 25 2020, 13:03:44) [GCC 8.3.0]
* psycopg2: 2.8.6 (dt dec pq3 ext lo64)
* glibc: 2.28
...
  File "/home/bw/.local/lib/python3.7/site-packages/aiopg/connection.py", line 104, in _ready
    state = self._conn.poll()
psycopg2.OperationalError: could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 12345?

CRUX 3.6.1

* uname: Linux crux 5.4.80-bw2 #2 SMP Sat Jun 5 18:11:50 UTC 2021 x86_64
* python: 3.9.0 (default, Dec 6 2020, 03:55:43) [GCC 10.2.0]
* asiopg: 1.2.1, Python 3.9.0 (default, Dec 6 2020, 03:55:43) [GCC 10.2.0]
* psycopg2: 2.8.6 (dt dec pq3 ext lo64)
* glibc: 2.32
...
  File "/home/bw/.local/lib/python3.9/site-packages/aiopg/connection.py", line 104, in _ready
    state = self._conn.poll()
psycopg2.OperationalError: could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 12345?

Etc.
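In the Arch Linux runs above where the full psycopg2 error is shown, libpq tried two addresses for localhost (::1, then 127.0.0.1), so it opened a second socket after the first attempt was refused; the Python 3.6 run did the same but did not crash, and the Devuan and CRUX runs list only 127.0.0.1. That fits the socket-replacement explanation in the first post, though it is an observation from these logs rather than a confirmed diagnosis. To check how many addresses a host name resolves to, a small sketch using socket.getaddrinfo (resolved_addresses is a hypothetical helper):

```python
import socket


def resolved_addresses(host, port):
    """Return the distinct IP addresses getaddrinfo() yields for host, in order.

    libpq tries each address in turn and opens a fresh socket per attempt, so
    more than one entry means a refused first attempt leads to a second socket.
    """
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

On many Linux systems resolved_addresses("localhost", 12345) lists both ::1 and 127.0.0.1, which matches the two attempts in the failing logs.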

and-semakin commented 3 years ago

One workaround is to monkey-patch selectors 🐒

# This import must be at the top of the file because the monkey patch has to be
# applied before any other code runs.
# We want to revert this change: https://github.com/python/cpython/pull/1030
# Additional context is here: https://github.com/aio-libs/aiopg/issues/837
import selectors  # isort:skip # noqa: F401

selectors._PollLikeSelector.modify = (  # type: ignore
    selectors._BaseSelectorImpl.modify  # type: ignore
)  # noqa: E402
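A narrower variant of the same idea, sketched here as an assumption and not something aiopg or asyncio ships: instead of patching every selector in the process, subclass selectors.EpollSelector (Linux only) with a modify() that always unregisters and re-registers, using only the public get_key()/unregister()/register() methods, and hand it to a single asyncio.SelectorEventLoop. ReRegisteringEpollSelector and make_loop are hypothetical names.

```python
import asyncio
import selectors


class ReRegisteringEpollSelector(selectors.EpollSelector):
    """Hypothetical selector: an EpollSelector whose modify() always
    unregisters and re-registers, like Python 3.6 did, instead of issuing
    EPOLL_CTL_MOD. Scoped to one event loop rather than the whole process."""

    def modify(self, fileobj, events, data=None):
        key = self.get_key(fileobj)  # raises KeyError if fileobj is not registered
        if events == key.events and data == key.data:
            return key  # nothing to change
        # unregister() tolerates an fd the kernel already dropped on close();
        # register() then issues a fresh EPOLL_CTL_ADD for the current socket.
        self.unregister(fileobj)
        return self.register(fileobj, events, data)


def make_loop():
    """Create an event loop that uses the re-registering selector."""
    return asyncio.SelectorEventLoop(ReRegisteringEpollSelector())
```

Drive it with loop = make_loop(), then loop.run_until_complete(main()) and loop.close(); on Python 3.12+ asyncio.run(main(), loop_factory=make_loop) does the same. The cost is one extra epoll_ctl call per interest change, which Python 3.6 paid anyway.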

arssher commented 2 years ago

FWIW, I also ran into this and, while experimenting, wrote a draft patch that works: https://github.com/arssher/aiopg/tree/handle_changed_socket. It is not polished enough to propose as a PR, though.

valderman commented 2 years ago

Any progress on this? This completely breaks support for hot standby scenarios.

Reliably reproducible by setting up localhost:5431 as a read-only replica of localhost:5432 and using the following minimal example:

import aiopg
import asyncio

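# libpq connects to localhost:5431 first; on finding a standby there it closes
# that socket and opens a new one for localhost:5432.
# (target_session_attrs=primary requires libpq 14 or newer.)
connstr = 'postgres://postgres@localhost:5431,localhost:5432/?target_session_attrs=primary'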

async def go():
    await aiopg.connect(connstr)

asyncio.run(go())

carlosribas commented 1 year ago

I was getting this same error:

    self._context.run(self._callback, *self._args)
  File "/srv/reference/venv/lib/python3.9/site-packages/aiopg/connection.py", line 837, in _ready
    self._loop.add_writer(
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 341, in add_writer
    self._add_writer(fd, callback, *args)
  File "/usr/local/lib/python3.9/asyncio/selector_events.py", line 299, in _add_writer
    self._selector.modify(fd, mask | selectors.EVENT_WRITE,
  File "/usr/local/lib/python3.9/selectors.py", line 390, in modify
    self._selector.modify(key.fd, selector_events)
FileNotFoundError: [Errno 2] No such file or directory

Since this issue is the top Google result for this error, I thought I'd share what caused mine, in case it helps others: I had changed the DB password but forgot to update my .env file.

Because the error message says FileNotFoundError, it took me a few hours to realize I was connecting with the wrong password.

pablodcar commented 1 year ago

@valderman Have you found any solution to this? Failover does not work for me either.