Open thanatos opened 7 months ago
I can replicate this with vanilla Postgres & `aiopg`, so Cockroach can be removed from play here.
The difference is that PG listens, by default, to "localhost", and will bind to `::1` on that interface. `::1` is the first address `aiopg` will attempt, and the connection is made.
My CockroachDB instance, OTOH, is bound to `127.0.0.1` specifically. `aiopg` appears to try `::1` first (same as above with PG); this connection fails, as one would expect. `aiopg` correctly goes on to try `127.0.0.1`, and that connection is made successfully, but `aiopg` hangs for some reason.
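The fallback order described above can be reproduced without any database. The following is a minimal sketch, assuming a plain blocking TCP client; `connect_with_fallback` is a hypothetical helper written for illustration and is not part of `aiopg`, `psycopg2`, or libpq. It binds a listener to `127.0.0.1` only, then tries `::1` before `127.0.0.1`, the same order seen in this issue.

```python
import socket


def connect_with_fallback(candidates, timeout=1.0):
    """Try each (family, sockaddr) in order; return the first connected socket
    together with the list of failed attempts."""
    errors = []
    for family, sockaddr in candidates:
        try:
            sock = socket.socket(family, socket.SOCK_STREAM)
        except OSError as exc:  # e.g. IPv6 is disabled on this host
            errors.append((sockaddr, exc))
            continue
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
        except OSError as exc:  # refused, unreachable, or timed out
            errors.append((sockaddr, exc))
            sock.close()
            continue
        return sock, errors
    raise OSError(f"all candidates failed: {errors}")


# A listener bound to 127.0.0.1 only, like the CockroachDB instance above.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

# IPv6 loopback first, then IPv4 loopback.
candidates = [
    (socket.AF_INET6, ("::1", port)),
    (socket.AF_INET, ("127.0.0.1", port)),
]
sock, errors = connect_with_fallback(candidates)
peer = sock.getpeername()
print("connected to", peer, "after", len(errors), "failed attempt(s)")

sock.close()
server.close()
```

Note that each attempt uses a brand-new socket. In a blocking client that is harmless; it only matters once something else is tracking the old socket's file descriptor.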
We can replicate this with vanilla PG by binding to `127.0.0.1` explicitly, like Cockroach was:
(in `postgresql.conf`)
listen_addresses = '127.0.0.1'
And then some slight tweaks to the example:
Now knowing this, we can get back to Cockroach: I had been passing `--listen-addr localhost`; this (apparently; this seems to be a bug in Cockroach) causes it to bind (oddly) to `127.0.0.1`. If I explicitly tell it to bind with `--listen-addr '[::1]'`, then it binds to `::1`, and `aiopg` doesn't have to try to fall back, so we avoid the bug.
But it's "falling back to attempting to connect over IPv4" that is the critical bit, not Cockroach. And we hit the same issue, now with PG.
My working theory on this issue is that `aiopg` isn't managing the async events it is waiting on properly.
We need to call `<pg conn>.poll()`, during which `psycopg2` will evaluate the state of the socket; it returns a value indicating what event we need to wait on.
I think the "problem" is the `if not self._writing` check. Normally, we're attempting to make sure that we've not already got a `loop.add_writer` call pending on this socket. In that regard, the `if` is sensible. What I think is happening is that during `poll()`, PG is going to close & reopen the socket (to switch to an IPv4 connection), and when it does this, we lose our `add_writer` callback. (I assume b/c underneath, it's just configuring an epollfd, and that epollfd knows the socket is closed and drops the events.) However, the `if` prevents us from re-establishing the handler.
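To make the theory concrete, here is a minimal Linux-only sketch using `select.epoll` directly. It is not `aiopg`'s code: the `writing` flag and `ensure_writer` helper are hypothetical stand-ins for the guard discussed above, and socket pairs stand in for the database connection. It shows the registration disappearing when the socket is closed, and the guard then refusing to register the replacement socket.

```python
import select
import socket

ep = select.epoll()  # Linux-only
writing = False      # hypothetical stand-in for a "writer already registered" flag


def ensure_writer(fd):
    """Register fd for writability unless we believe a writer is already pending."""
    global writing
    if not writing:
        ep.register(fd, select.EPOLLOUT)
        writing = True


# First socket: registered, and reported writable straight away.
a, a_peer = socket.socketpair()
ensure_writer(a.fileno())
before_swap = ep.poll(0.1)

# The socket is closed and replaced. The kernel drops the closed socket from
# the epoll set; the new socket may well reuse the same fd number.
a.close()
a_peer.close()
c, c_peer = socket.socketpair()
new_fd = c.fileno()

# The guard thinks a writer is still pending, so nothing is registered and
# the writable replacement socket is never reported.
ensure_writer(new_fd)
after_swap = ep.poll(0.1)

# Only after resetting the flag does the new socket get registered.
writing = False
ensure_writer(new_fd)
after_reregister = ep.poll(0.1)

print("before swap:      ", before_swap)
print("after swap:       ", after_swap)
print("after re-register:", after_reregister)

c.close()
c_peer.close()
ep.close()
```

If the theory holds, the middle `poll` is the hang: the socket is writable, but nothing is listening for it.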
I added some debugging to see when `_ready` is invoked, the actual state of the socket before/after poll, and what we then decide to do in this iteration of `_ready`:
ready(); from _poll
sock ( me) = ('::1', 48080, 0, 0)
ready(): poll ok
fileno, us = 6, inner = 6
sock ( me) = ('::1', 48080, 0, 0)
ready(): POLL_WRITE
ready(): POLL_WRITE -- doing add_writer
ready(); add_reader
sock ( me) = ('::1', 48080, 0, 0)
ready(): poll ok
fileno, us = 6, inner = 6
sock (peer) = ('127.0.0.1', 5432)
sock ( me) = ('127.0.0.1', 48760)
ready(): POLL_WRITE
ready(); add_writer … ('::1', 48080, 0, 0)
sock ( me) = ('127.0.0.1', 48760)
ready(): poll ok
fileno, us = 6, inner = 6
sock (peer) = ('127.0.0.1', 5432)
sock ( me) = ('127.0.0.1', 48760)
ready(): POLL_WRITE
I'm not 100% convinced of this yet. Even if it's correct … boy do I not know how we're to fix it! It seems like the interface w/ PG doesn't give us the info we need to know that the FD has been swapped out. It's all good and fine if you're using `select(2)`, like `psycopg2`'s docs use in their examples, but if you're using `epoll(2)` … I worry you need to know that the socket has been recreated & your epoll triggers need to be recreated. And `asyncio` is going to be using `epoll(2)`.
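For contrast, here is a sketch of why a `select(2)`-style wait loop is not bothered by a swapped socket: it asks for the file descriptor again on every iteration and keeps no registration between calls. The loop is modelled on the shape of the wait function in `psycopg2`'s asynchronous-support docs, but everything here is an assumption made so that it runs without a database: `SwappingConn` is a hypothetical stand-in for a connection whose socket is replaced mid-handshake, and the `POLL_*` names are local stand-ins for the `psycopg2.extensions.POLL_*` constants.

```python
import select
import socket

# Local stand-ins for psycopg2.extensions.POLL_OK / POLL_READ / POLL_WRITE.
POLL_OK, POLL_READ, POLL_WRITE = range(3)


class SwappingConn:
    """Hypothetical connection whose socket is replaced on the second poll()."""

    def __init__(self):
        self._sock, self._peer = socket.socketpair()
        self.polls = 0

    def fileno(self):
        return self._sock.fileno()

    def poll(self):
        self.polls += 1
        if self.polls == 1:
            return POLL_WRITE
        if self.polls == 2:
            # Simulate the fallback: drop the old socket, open a new one.
            self.close()
            self._sock, self._peer = socket.socketpair()
            return POLL_WRITE
        return POLL_OK

    def close(self):
        self._sock.close()
        self._peer.close()


def wait(conn):
    """Drive conn.poll() to completion, waiting with select() in between."""
    waited_on = []
    while True:
        state = conn.poll()
        if state == POLL_OK:
            return waited_on
        fd = conn.fileno()  # looked up fresh each time, so a swap is harmless
        if state == POLL_WRITE:
            select.select([], [fd], [])
        elif state == POLL_READ:
            select.select([fd], [], [])
        else:
            raise RuntimeError(f"poll() returned {state}")
        waited_on.append(fd)


conn = SwappingConn()
fds = wait(conn)
print("waited on fds:", fds, "over", conn.polls, "polls")
conn.close()
```

An epoll-based loop has no equivalent per-call lookup, which is why it would need to be told (or to detect) that the socket was recreated.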
Describe the bug
While attempting to connect to CockroachDB, `aiopg` raises `TimeoutError`.
To Reproduce
`localhost`. I am running v22.2.19 on Linux.
Expected behavior
The example exits successfully.
Logs/tracebacks
Python Version
aiopg Version
OS
Arch Linux
Additional context
Cockroach is a distributed database that uses the PostgreSQL wire protocol for queries. `psql` is capable of connecting to it, and executing queries. (So is `psycopg2`.)
When running the example, there is a long pause after `Creating test connection.`, approximately 60s.
`aiopg` worked a few CockroachDB versions prior to this one, but I am sorry, I did not record the working version's number.
A packet capture shows `aiopg` makes the connection (the 3-way handshake happens) but no data is ever transmitted. This is a bit surprising, since at this point I am not sure what Cockroach could have done to have acted differently from vanilla PG.
Code of Conduct