Closed: fake-name closed this issue 7 years ago
I just managed to replicate the lock-up with 100% CPU again. I think this might be at least marginally repeatable.
Looks like it's waiting on a `Channel.CloseOk` frame; I'll try to reproduce.
Thanks for the example, I am able to replicate using your example and am cleaning that up.
Here's what I'm seeing after cleaning up the cross-thread and object synchronization behavior for remote connection closing:
```
Item: bytearray(b'test?')
DEBUG:rabbitpy.base:Writing frame: Basic.Ack
DEBUG:rabbitpy.base:Waiting on <class 'pamqp.specification.Basic.Deliver'> frame(s)
DEBUG:rabbitpy.channel0:Received frame: 'Connection.Close'
WARNING:rabbitpy.channel0:RabbitMQ closed the connection (320): b'CONNECTION_FORCED - Closed via management plugin'
DEBUG:rabbitpy.base:Channel0 setting state to 'Closed'
DEBUG:rabbitpy.connection:State: 'Open'
DEBUG:rabbitpy.base:Connection setting state to 'Closing'
DEBUG:rabbitpy.base:IO setting state to 'Closing'
DEBUG:rabbitpy.events:Event is already set: Socket Closed
DEBUG:rabbitpy.base:IO setting state to 'Closed'
DEBUG:rabbitpy.base:Connection setting state to 'Closed'
DEBUG:rabbitpy.io:Exiting due to closed socket
DEBUG:rabbitpy.io:Exiting IOLoop.run
DEBUG:rabbitpy.io:Exiting IO.run
Putting message
DEBUG:rabbitpy.base:Interrupting the wait on frame
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "example.py", line 60, in fill_queue
    rabbitpy.Message(self.channel, body_value="test?").publish(exchange="test_resp_enchange.e", routing_key="test_resp_queue")
  File "/Users/gmr/Source/rabbitpy/rabbitpy/message.py", line 284, in publish
    self.channel.write_frames(frames)
  File "/Users/gmr/Source/rabbitpy/rabbitpy/base.py", line 261, in write_frames
    if self._can_write():
  File "/Users/gmr/Source/rabbitpy/rabbitpy/base.py", line 283, in _can_write
    raise exceptions.ConnectionNotOpen()
rabbitpy.exceptions.ConnectionNotOpen: The connection is not open

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "example.py", line 51, in get_rx
    for item in self.in_q:
  File "/Users/gmr/Source/rabbitpy/rabbitpy/amqp_queue.py", line 195, in consume
    message = self.channel._consume_message()
  File "/Users/gmr/Source/rabbitpy/rabbitpy/channel.py", line 311, in _consume_message
    frame_value = self._wait_on_frame([spec.Basic.Deliver])
  File "/Users/gmr/Source/rabbitpy/rabbitpy/base.py", line 467, in _wait_on_frame
    self._check_for_exceptions()
  File "/Users/gmr/Source/rabbitpy/rabbitpy/base.py", line 309, in _check_for_exceptions
    raise exception
rabbitpy.exceptions.AMQPConnectionForced: b'CONNECTION_FORCED - Closed via management plugin'

Traceback (most recent call last):
  File "example.py", line 87, in <module>
    test()
  File "example.py", line 81, in test
    tester.interrupt()
  File "example.py", line 64, in interrupt
    self.in_q.stop_consuming()
  File "/Users/gmr/Source/rabbitpy/rabbitpy/amqp_queue.py", line 324, in stop_consuming
    raise exceptions.NotConsumingError()
rabbitpy.exceptions.NotConsumingError: No active consumer to cancel
```
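That `NotConsumingError` teardown race could be tolerated with a small guard on the caller's side. A minimal sketch, assuming the teardown path just wants best-effort cancellation; `DummyQueue` and the local `NotConsumingError` here are stand-ins for illustration, not rabbitpy's real classes:

```python
class NotConsumingError(Exception):
    """Stand-in for rabbitpy.exceptions.NotConsumingError."""


class DummyQueue:
    """Minimal stand-in for a rabbitpy.Queue during teardown."""

    def __init__(self):
        self.consuming = False

    def stop_consuming(self):
        if not self.consuming:
            raise NotConsumingError("No active consumer to cancel")
        self.consuming = False


def safe_stop_consuming(queue):
    """Tolerate a consumer that already went away (e.g. the connection
    was force-closed) instead of crashing the teardown path."""
    try:
        queue.stop_consuming()
    except NotConsumingError:
        pass  # nothing to cancel; teardown continues


safe_stop_consuming(DummyQueue())  # already stopped: swallowed, no crash
```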
I see that periodically on teardown too, though something in my logic changes seems to have fixed the wedging.
At this point, I've kind of papered over the issue with my "kill the connection with prejudice" solution, and then just wrapped the entire mess in retry logic. It's not great, but it's a workaround.
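For illustration, the retry-logic half of that workaround might look something like the sketch below. This is a generic stdlib-only retry wrapper, not the actual code from my project; `flaky_publish` is a hypothetical stand-in for a real publish call:

```python
import time


def with_retries(operation, attempts=3, delay=0.01, retry_on=(Exception,)):
    """Run `operation`, retrying on the given exception types.
    A crude version of 'wrap the entire mess in retry logic'."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts; let the caller see the failure
            time.sleep(delay)  # back off briefly before trying again


calls = []

def flaky_publish():
    # Fails twice (as if the connection was torn down), then succeeds
    # after the "kill the connection with prejudice" reconnect.
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("connection is not open")
    return "published"

print(with_retries(flaky_publish, attempts=5, retry_on=(ConnectionError,)))
# prints "published" after two failed attempts
```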
Sorry I wasn't able to provide a minimal example. Everything in my project is kind of messily smeared across a number of files. I really should refactor at some point, before I go fully insane.
I'm going to release 1.0 shortly -- given the age of the ticket, I don't expect you to test this right away. Do you want to test this before I close the ticket?
I should be able to merge it into my branch this weekend, but I'm not sure if I can really replicate without unwrapping a lot of my additions.
If you're satisfied it's fixed, go ahead and close. I'll just comment if I see anything.
Ok, thanks
I'm stress-testing my reconnect logic, and I appear to have managed to get the interface wedged somewhere in rabbitpy.
Basically, the steps were as follows: I call `interface.close()` on my connection manager. That goes through my interface objects (the `rabbitpy.Queue` instances, a `rabbitpy.Channel` instance, and the base `rabbitpy.Connection` instance), calling `<instance>.close()` on each, in the reverse order they were constructed. One of the `xxx.close()` calls wedges.

I can get a remote traceback from the wedged thread using pystuck. The thread is doing something, as the traceback is changing, and it's consuming 100% CPU. Here are some of the sniffed tracebacks:
Note that this is with a local copy of `rabbitpy` in my project's directory, because I wanted to be able to poke around and see if I could debug the issues (related to https://github.com/gmr/rabbitpy/issues/101).
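For anyone without pystuck handy, a rough stdlib-only way to sniff live thread stacks looks like the sketch below. This is not pystuck's actual mechanism (pystuck runs an RPC server you attach to remotely); the `wedged-io` thread here is just a hypothetical stand-in for the spinning rabbitpy IO thread:

```python
import sys
import threading
import time
import traceback


def dump_all_threads():
    """Print the current stack of every live thread, similar in spirit
    to the per-thread tracebacks pystuck shows for a wedged process."""
    frames = sys._current_frames()  # maps thread id -> topmost frame
    for thread in threading.enumerate():
        frame = frames.get(thread.ident)
        if frame is None:
            continue
        print("--- %s ---" % thread.name)
        traceback.print_stack(frame)


spinner_running = True

def spin():
    # Stand-in for the busy-looping IO thread.
    while spinner_running:
        time.sleep(0.01)

t = threading.Thread(target=spin, name="wedged-io", daemon=True)
t.start()
time.sleep(0.05)
dump_all_threads()   # shows the main thread and the "wedged-io" thread
spinner_running = False
t.join()
```

Calling `dump_all_threads()` repeatedly is what lets you see whether a stuck thread's traceback is changing (busy loop) or frozen (deadlock).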