Closed jchumnanvech closed 9 years ago
Nice catch. It's bug indeed, but of a different kind. The thing is that set_exception
does not preserve tracebacks in Python 2 (see relevant docs).
That's why I use set_exc_info
where it makes sense. So instead of converting to set_exception
, can you try this:
ping_future.set_exc_info(sys.exc_info())
I also strive to cover all bugs with unittests. Can you please write one for this issue? I guess it would be enough to close conn's file descriptor to simulate server restart.
Thanks.
Explored this for a bit and it seems a bit more complicated.
If Connection.ping dies in Pool._operate if will do a Pool.putconn in all cases but one - conn = fut.result()
which can't happen from a _ping_future_connection anyways as connection is passed and fut
is a fake future with a set result.
If a file descriptor is closed another error happens - AssertionError: Tried to release non-busy connection
as Pool.putconn in Pool._ping_future_connection is called on an Ping._operate error that already does Pool.putconn in its own error handling.
Checking how restarts work I found psycopg#196 but restarting the server still doesn't mark the connection as closed properly (at least on osx using psycopg2 2.6.1). Once that happens connects are never recycled.
Connections to a host also fail in retrying as Connection._io_callback fails, retry loop is not triggered as the exception is added to the future and the query fails - this time the connection is correctly marked as closed and is recycled.
Am I missing something?
OK, lets take it one thing in a time. @jchumnanvech original bug is an issue with incorrect API call. It's trivial and should be fixed. But the bug itself surfaced another issue - unittest do not properly verify connectivity issues handling.
Regarding psycopg#196. It was fixed back then when I complained about it and is still working as of psycopg2 2.6.1 on my Ubunutu 14.04. Probably it's OSX specific issue. @friedcell, I suggest you reopen the ticket on psycopg2 tracker.
Now, @friedcell discovered another bug:
On connectivity errors, putconn
is called twice on the same connection and hits assertion error. Need to open another bug on that and write a proper reproduction.
@friedcell, regarding your last paragraph - I fail to understand what you are trying to say (that's probably me :) ). Can you please open another issue for this and we'll continue to analyze it there? (Some code that catches concrete failure scenarios would really help).
I'll open another issue for reconnection issues - I have tests for that but need to do some more testing on Ubuntu.
I don't yet have tests for double putconn
and the set_exc_info
issue...
Looks like server restarts can be simulated pretty well with the following:
>>> import psycopg2
>>> conn = psycopg2.connect("user=foo password=bar host=127.0.0.1 port=5432")
>>> import socket
>>> s = socket.fromfd(conn.fileno(), socket.AF_INET, socket.SOCK_STREAM)
>>> s.shutdown(socket.SHUT_WR)
>>> cur = conn.cursor(); cur.execute("SELECT 1")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
psycopg2.OperationalError: SSL connection has been closed unexpectedly
>>> conn.closed
2
>>>
The real server restart causes the following:
>>> cur = conn.cursor(); cur.execute("SELECT 1")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
psycopg2.OperationalError: terminating connection due to administrator command
SSL connection has been closed unexpectedly
The next step is to incorporate connection killing into unittests, see where it fails and continue from there.
I don't yet have tests for double putconn and the set_exc_info issue...
I think I can tinker it.
I got different errors on OSX - my biggest problem was the fact that psycopg2 does not report the connections as closed. Solved double putconn with a simple if conn in self.conns.busy:
in _ping_future_connection
@friedcell, I finally see what you are saying - the retrial code runs only when cursor.execute
fails synchronously. Unfortunately when server restarts, it fails asynchronously later on in _io_callback
and is propagated straight to the caller.
Something to work on...
Thanks for discovering this.
With async code, shutting socket down for writes, does not set conn.closed
. But shutting it for reads, does the trick. However DatabaseError
except of OperationalError
is thrown. So need to catch both of them for retrials:
from __future__ import print_function
import psycopg2
import select
import socket
import traceback
def wait(conn):
while 1:
state = conn.poll()
if state == psycopg2.extensions.POLL_OK:
break
elif state == psycopg2.extensions.POLL_WRITE:
select.select([], [conn.fileno()], [])
elif state == psycopg2.extensions.POLL_READ:
select.select([conn.fileno()], [], [])
else:
raise psycopg2.OperationalError("poll() returned %s" % state)
aconn = psycopg2.connect(dsn="user=foo password=bar host=127.0.0.1 port=5432", async=1)
wait(aconn)
s = socket.fromfd(aconn.fileno(), socket.AF_INET, socket.SOCK_STREAM)
s.shutdown(socket.SHUT_RD)
try:
acur = aconn.cursor()
acur.execute("SELECT 1")
wait(aconn)
except Exception as err:
print("Execution failed: %s" % err)
traceback.print_exc()
print("aconn.closed = %s" % aconn.closed)
else:
print(acur.fetchone())
And the result is:
$ python asynclose.py
Execution failed: SSL SYSCALL error: EOF detected
Traceback (most recent call last):
File "asynclose.py", line 35, in <module>
wait(aconn)
File "asynclose.py", line 12, in wait
state = conn.poll()
OperationalError: SSL SYSCALL error: EOF detected
aconn.closed = 2
I fixed all of, hopefully, it in my master branch. Guys, can you try this version: https://github.com/haizaar/momoko? Both ping issue and reconnects should work.
At the end, shutting down sockets, did not work - this somehow augments psycopg2
state in the weird way, such that most of the time it works, but one of every 30 test runs fails with wonky error. Spent a lot of time trying to understand whether it's a problem with my code, but conclusion was that it's not. And in fact after incorporating third-party tcp proxy, it all works smooth.
If you run tests, make sure to run make -C tcproxy
first.
Guys, does anyone has any experience with Travis CI? for some reason my proxy tests fail when run on Travis - can't force psycopg2 into connecting through TCP to my proxy.
Here is the failing build - https://travis-ci.org/haizaar/momoko/builds/77378307 (pulls from my master).
Travis fixed. I've merged the code to master. Feedback is welcome.
Tested in all scenarios that were failing for me before - works perfectly now. Sorry for the delay, fell into a zone where normal things don't happen very often.
Excellent. I'll release 2.2.0 later this week. On Sep 1, 2015 21:38, "Marko Mrdjenovic" notifications@github.com wrote:
Tested in all scenarios that were failing for me before - works perfectly now. Sorry for the delay, fell into a zone where normal things don't happen very often.
— Reply to this email directly or view it on GitHub https://github.com/FSX/momoko/pull/115#issuecomment-136823072.
Right now we are encountering a bug when someone resets the postgres-db service. The connections become stale. When psycopg2 Error is thrown set_exc_info with error throws another exception which causes the connection to not be put back in the connection pool (line 560 is not executed because line 559 blows up). Its in the the tornado concurrent exception traceback logger.
It is trying to pass in the error into
self.formatted_tb = traceback.format_exception(*exc_info) and exc_info is only 1 parameter and it excepts 3 This is in concurrent.py line 113
Changing it to set exception seems to make it work.