Open inikolaev opened 5 months ago
I have realised that there are now two timeouts - somehow I missed that and noticed just now when I started looking into that again.
The new timeout now seems to work on the whole connection attempt, while new command_timeout
is set for individual host, but it's still confusing as docs mentions that it is a timeout for an operation and I take it means for any SQL statement as well? How does it relate to statement_timeout
that can be set for a session?
I feel like I would still like to be able to distinguish between actual connect timeout per host and operation timeout.
I think the same or similar issue was already raised before, but it was closed so I decided to create a new one and provide a bit more details and reproducers.
I opened a discussion about this almost a year ago which has some details as well, but I never followed up. I would really love to get some attention to this now as we'll go through the same database upgrade steps again soon and I would like to avoid patching the library locally.
In a nutshell, the issue is that the existing timeout does not work well in certain conditions with multi-host setup: when the first host can be resolved, as in given it's DNS name, one can get IP address, but the host is nevertheless unreachable and connection cannot be established. In this case the whole timeout might be used up waiting for a connection to the first host and there would never be an attempt to connect to the second host.
In the linked ticket I described one case when this actually works by configuring a fairly large timeout of 80 seconds. In this case the first connection attempt times out after 75 seconds and there's still 5 more seconds to attempt to connect to the second host. After some googling I found that these 75 seconds seem to be TCP SYN wait-timer default.
I have made a reproducer that tests both cases and does not require actual database:
I have a similar test that uses
psycopg2
(though it uses SQLAlchemy, but I can rewrite it with purepsycopg2
if needed) that demonstrates how this case is handled there:This case is slightly different, because I have specified both host as unreachable for the same of demonstrating that the overall time it takes to attempt a connect is twice the timeout specified. This means that
psycopg2
applies the same timeout when connecting to each host, not an overall timeout.Whether this is desirable or not is hard to say, I can imagine having an overall connection timeout might be a good thing as well, so maybe we could have two timeouts.
I'm happy to help with the PR to resolve this once it's clear how we want to have this fixed.