gmr / rabbitpy

A pure python, thread-safe, minimalistic and pythonic RabbitMQ client library
http://rabbitpy.readthedocs.org
BSD 3-Clause "New" or "Revised" License
242 stars 58 forks source link

Getting stuck in connection.py infinite loop #20

Closed brimcfadden closed 10 years ago

brimcfadden commented 10 years ago

This problem is somewhat niche and has been difficult for me to gather information about, but it has been causing a headache.

Summary and question about resolution:

A while loop in connection.Connection._connect has been seen to never exit due to behavior of other code through which Rabbitpy is making its connection. The connection fails silently, causing neither of the events which could leave the loop.

Are there plans to implement a connection attempt timeout with a reconnection strategy?

Further details:

I'm going to talk about our use of Paramiko, which you obviously shouldn't need to care about, but it establishes the problem less ambiguously, specifying as much as I know.

Due to technical limitations, we use SSH tunnels to connect some clients' machines & processes with our own servers. We often use Paramiko to establish the SSH tunnels in Python, which demonstrates such a thing in their file forward.py.

For reasons still unknown to me, sometimes Paramiko cannot create a connection properly; it creates a connection which seems successful, but is ephemeral. When this happens, it logs the detection of EOF almost instantly, which is the same condition for ending a successful connection in a normal fashion. This closes the channel inside of the tunnel. When Paramiko receives a message destined for this now-closed channel, it inexplicably ignores it.

Paramiko prematurely closing its channels may very well be a bug in its code, as I can only reproduce it on Windows (and could not on Linux nor OS X). However, I still feel like the Rabbitpy Connection object should realize that there is a problem with the current connection attempt if self._channel0.open is False after a (parameter) timeout.

gmr commented 10 years ago

Is it trivial to see if you use the connection via a context manager if the issue goes away?

with rabbitpy.Connection() as connection:

The ContextManager is cleaning up things that connection = rabbitpy.Connection() does not.

Not saying you're incorrect in your assessment, I am looking at how to clean up that bit of code.

brimcfadden commented 10 years ago

It wouldn't be trivial to try out, unfortunately, since I'm using the "core API" instead of the simple API.

However, I am not sure that the context manager would do anything extra. Connection.__enter__ simply returns self, and Connection.__exit__ simply calls Connection.close, which is something that I do manually. Furthermore, since the infinite loop is within Connection._connect, I wouldn't expect with rabbitpy.Connection() as connection: to ever finish. Recall, there is no exception occurring within the loop. I don't think that this is an issue about connection cleanup, as the object is never fully created in the first place. Is there something that I don't understand with your suggestion?

brimcfadden commented 10 years ago

This particular issue is no longer a problem for me, since we're using OpenSSH instead of Paramiko now. Although it seems like a Paramiko bug, I still suggest implementing some sort of connection timeout so that the code doesn't get stuck in an infinite loop.

sunbit commented 10 years ago

I have a similar issue being looped forever in a while in rabbitpy connection _connect method

while self.opening and not self._events.is_set(events.SOCKET_OPENED):

This happens when executing the code that uses rabbitpy on top of gevent. (gunicorn with gevent worker) I noticed that the patch that makes this loop forever and do nothing is the socket patching. if you apply all gevent patches except the socket one it works ok.

I tried both with the context manager approach and the standalone rabbitpy.Connection() one.

gmr commented 10 years ago

I have a gevent pull-request (#26) pending that should fix what you're seeing @sunbit. I just need to do some regression testing against it to make sure it does not have any adverse impacts.

sunbit commented 10 years ago

Happy to hear that :) Any idea when it will be ready? Maybe i can help, just ask :)

gmr commented 10 years ago

I'll look to get it out today.

gmr commented 10 years ago

This is merged into master, I'm going to do a little bit more and release 0.16.

gmr commented 10 years ago

I think I've addressed the core issue that this ticket was running into, which had to do with exceptions being raised in the wrong thread. I've reworked how the IO thread interacts with channels which addresses the lockout, unable to exit the python process problem.