alanxz / rabbitmq-c

RabbitMQ C client
MIT License

amqp_channel_close (0.9.0) hangs sometimes #714

Open qiulang opened 3 years ago

qiulang commented 3 years ago

We use rabbitmq-c 0.9.0, connecting to a RabbitMQ 3.7.5 server. We have found that amqp_channel_close sometimes hangs forever.

The reason we call amqp_channel_close is that a specific channel sometimes appears to be dead, i.e. it no longer receives messages. We have not figured out why it dies, but we designed a recovery algorithm for it, as follows:

  1. When we suspect a channel is dead, we send one more "ping" message.
  2. If we do not receive that ping message within a period of time, we mark the channel as dead, so that we can close it and open another channel.
  3. We call amqp_channel_close to close that channel. Unfortunately, amqp_channel_close sometimes hangs forever, which makes our recovery algorithm fail.
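
The three steps above can be sketched as a small state machine. This is only an illustration: the names (`ch_watch_t`, `ch_watch_tick`, `ch_watch_on_message`) and the two time limits are hypothetical and not taken from our real code, and the sketch deliberately contains no rabbitmq-c calls so that it stands on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical health states for one channel. */
typedef enum { CH_HEALTHY, CH_SUSPECT, CH_DEAD } ch_state_t;

typedef struct {
  ch_state_t state;
  time_t last_rx;   /* time the last message arrived on the channel */
  time_t ping_sent; /* time the probe "ping" was published */
} ch_watch_t;

/* Assumed tuning values, chosen only for the example. */
enum { IDLE_LIMIT_S = 30, PING_LIMIT_S = 5 };

void ch_watch_init(ch_watch_t *w, time_t now) {
  assert(w != NULL);
  w->state = CH_HEALTHY;
  w->last_rx = now;
  w->ping_sent = 0;
}

/* Call for every inbound message, including the echoed ping. */
void ch_watch_on_message(ch_watch_t *w, time_t now) {
  w->state = CH_HEALTHY;
  w->last_rx = now;
}

/* Advance the state machine. Returns true when the caller should
 * publish a ping (step 1). Once the state is CH_DEAD (step 2), the
 * caller is expected to close the channel and open a new one (step 3). */
bool ch_watch_tick(ch_watch_t *w, time_t now) {
  switch (w->state) {
    case CH_HEALTHY:
      if (difftime(now, w->last_rx) >= IDLE_LIMIT_S) {
        w->state = CH_SUSPECT;
        w->ping_sent = now;
        return true;
      }
      return false;
    case CH_SUSPECT:
      if (difftime(now, w->ping_sent) >= PING_LIMIT_S) {
        w->state = CH_DEAD;
      }
      return false;
    case CH_DEAD:
      return false;
  }
  return false;
}
```

In this sketch, step 3 happens outside the state machine: once `state` is `CH_DEAD`, the caller invokes amqp_channel_close, and that is the call that blocks. As far as I can tell, librabbitmq 0.9.0 also provides amqp_set_rpc_timeout, which might bound that wait, but we have not verified whether it helps here.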

Any suggestions as to why amqp_channel_close hangs, or how to improve our recovery algorithm?

Below is a backtrace from one of the hangs we experienced:

Thread 32 (Thread 0x7fbdd77fe700 (LWP 453)):
#0  0x00007fbdebc2a913 in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00000000004762a7 in amqp_poll (deadline=..., event=<optimized out>, fd=<optimized out>) at /var/jenkins/workspace/Newcc-Modules-Patch-Build-Ubuntu/source/lib/third_party/librabbitmq-0.9.0/librabbitmq/amqp_socket.c:192
#2  recv_with_timeout (state=0x7fbdbc00b310, timeout=...) at /var/jenkins/workspace/Newcc-Modules-Patch-Build-Ubuntu/source/lib/third_party/librabbitmq-0.9.0/librabbitmq/amqp_socket.c:699
#3  0x0000000000476419 in wait_frame_inner (state=0x7fbdbc00b310, decoded_frame=0x7fbdd77fdb20, timeout_deadline=...) at /var/jenkins/workspace/Newcc-Modules-Patch-Build-Ubuntu/source/lib/third_party/librabbitmq-0.9.0/librabbitmq/amqp_socket.c:813
#4  0x000000000047658a in simple_rpc_inner (state=0x7fbdbc00b310, channel=1, request_id=<optimized out>, expected_reply_ids=0x7fbdd77fdc60, decoded_request_method=<optimized out>, deadline=...) at /var/jenkins/workspace/Newcc-Modules-Patch-Build-Ubuntu/source/lib/third_party/librabbitmq-0.9.0/librabbitmq/amqp_socket.c:1055
#5  0x0000000000477b41 in amqp_simple_rpc (state=0x7fbdbc00b310, channel=1, request_id=1310760, expected_reply_ids=0x7fbdd77fdc60, decoded_request_method=0x7fbdd77fdc40) at /var/jenkins/workspace/Newcc-Modules-Patch-Build-Ubuntu/source/lib/third_party/librabbitmq-0.9.0/librabbitmq/amqp_socket.c:1132
#6  0x0000000000474b68 in amqp_channel_close (state=0x7fbdbc00b310, channel=1, code=<optimized out>) at /var/jenkins/workspace/Newcc-Modules-Patch-Build-Ubuntu/source/lib/third_party/librabbitmq-0.9.0/librabbitmq/amqp_api.c:295
#7  0x0000000000429bd3 in mq_close (mq_handler=0x7fbdbc0008c0) at ../src/mq_interface.c:699
#8  0x0000000000429d77 in mq_destroy (mq_handler=0x7fbdbc0008c0) at ../src/mq_interface.c:724
#9  0x000000000040ef18 in ccp_alispe_thread_proc (arg=0x2376f68) at ../src/ccp_core.c:1094
#10 0x000000000045a58a in thread_main (param=0x23777f8) at ../src/pj/os_core_unix.c:541
#11 0x00007fbdec40ce9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0

KantarBruceAdams commented 2 years ago

I have seen this too. The same hang inside the poll() call.

I created an abomination using a timer signal in a vain attempt to work around it.

There is also what I think is a separate issue: this is known to block after an attempt to bind to an exchange that doesn't exist. See https://groups.google.com/forum/#!topic/rabbitmq-c-users/JET2DGQan3g