alanxz / rabbitmq-c

RabbitMQ C client
MIT License
1.77k stars 672 forks source link

amqp_consume_message() returns AMQP_STATUS_TIMEOUT even when the target machine unreachable(with Network Cable pulled out) #391

Open HemanthBeecherla opened 8 years ago

HemanthBeecherla commented 8 years ago

We are using rabbitmq-c to for event/message processing software. using the call amqp_consume_message(). I am getting the AMQP_STATUS_TIMEOUT even after the target IP becomes unreachable. Could you please let me know if you need more information. host machine: $ sudo cat /proc/version Linux version 3.10.0-327.3.1.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #1 SMP Wed Dec 9 14:09:15 UTC 2015

Rabbitmq-c rpm version:

$ sudo rpm -qa | grep rabbit librabbitmq-0.7.1-1.fc23.x86_64 librabbitmq-devel-0.7.1-1.fc23.x86_64

Plese let me know if you need more information.

alanxz commented 8 years ago

Could you try running this against what's currently in master?

alanxz commented 8 years ago

It'd also be useful to know whether you're using heartbeats with these connections.

HemanthBeecherla commented 8 years ago

Hi Alan, I am not using heartbeats as of now. I working on testing the bits from master. at the same time I am wondering if there is any work around with 0.7.1 release, Could you please let me know if any call to find out the target IP not reachable or link removed.

Thanks, Hemanth

alanxz commented 8 years ago

Use heartbeats.

Even if there was a function to do that sort of thing, its quite possible that the connection goes away between the test for its existence and the next call to try and use the connection.

HemanthBeecherla commented 7 years ago

Hi Alan, How is this heartbeat option works with amqp_consume_message(), I could not see any argument or flag to turn on the heartbeats.

alanxz commented 7 years ago

Its a parameter that is passed to amqp_login.

HemanthBeecherla commented 7 years ago

Hi Alan,

I have tested with hearbeat enabled and found one serious issue,

  1. It is blocking in amqp_consume_message() function indefinitely when target Ethernet connection removed.
  2. But when I reinserted the cable back it came out of the amqp_consume_message() from blocking state and returned an error AMQP SOCKET ERROR.

But this is not what I have expected when I enable heartbeat. Could you please let me know if there is any solution to this. And thank you so much for helping me this far, and your continued help is highly appreciated.

Thanks, Hemanth

alanxz commented 7 years ago

What are you using for the heartbeat timeout, and how long are you waiting in amqp_consume_message? amqp_consume_message should return an amqp_rpc_status_t with reply_type = AMQP_RESPONSE_LIBRARY_EXCEPTION and library_error = AMQP_STATUS_HEARTBEAT_TIMEOUT.

AMQP_SOCKET_ERROR is an expected error if you disconnect then reconnect the physical connection.

GauravMBhandarkar commented 7 years ago

@alanxz

0 0x00007ffff6f7ab7d in poll () from /lib64/libc.so.6

1 0x00007fffc11a07ef in recv_with_timeout () from /opt/VRTSofcore/lib/libmq.so

2 0x00007fffc11a08f6 in wait_frame_inner () from /opt/VRTSofcore/lib/libmq.so

3 0x00007fffc11a3c18 in amqp_consume_message () from /opt/VRTSofcore/lib/libmq.so

......... Rabbitmq-server's is running on node A Network service is restarted on node A The code calling amqp_consume_message running on node B with infinite timeout just hangs forever. I'm not using Heartbeats.
HemanthBeecherla commented 7 years ago

Hi Alan, Below is detailed Explanation of my situation,

]# rpm -qa | grep rabbit
librabbitmq-devel-0.8.0-2.fc26.x86_64
librabbitmq-0.8.0-2.fc26.x86_64

]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)

# Heartbeat timeout is set to 2 sec in amqp_login() as 5th parameter.
amqp_login(conn, "/", 0, 131072, 2,
                                AMQP_SASL_METHOD_EXTERNAL,
                                AMQP_EXTERNAL_USER,
                                AMQP_EXTERNAL_PASSWORD)

# amqp_consume_message() timeout is set to 5 sec
                timeout.tv_sec = AMQP_CONSUME_TIMEOUT_SEC; // 5
                timeout.tv_usec = AMQP_CONSUME_TIMEOUT_USEC; // 0 
                res = amqp_consume_message(conn, &envelope, &timeout, 0);

# After removing the cables:

amqp_consume_message is blocked as below without giving AMQP_STATUS_HEARTBEAT_TIMEOUT
BT:
(gdb) info thread
  Id   Target Id         Frame
  5    Thread 0x7fffef645700 (LWP 19016) "ov_rest_event_t" 0x00007ffff5ae0b7d in poll () from /lib64/libc.so.6
  4    Thread 0x7fffefe46700 (LWP 19015) "EventPop" 0x00007ffff5ae5949 in syscall () from /lib64/libc.so.6
  3    Thread 0x7ffff0647700 (LWP 19014) "EventGet" 0x00007ffff5ae5949 in syscall () from /lib64/libc.so.6
  2    Thread 0x7ffff0e48700 (LWP 19013) "DiscoveryThread" 0x00007ffff5ae5949 in syscall () from /lib64/libc.so.6
* 1    Thread 0x7ffff7fe4840 (LWP 19009) "openhpid" 0x00007ffff5ae28f3 in select () from /lib64/libc.so.6
(gdb) thread 5
[Switching to thread 5 (Thread 0x7fffef645700 (LWP 19016))]
#0  0x00007ffff5ae0b7d in poll () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff5ae0b7d in poll () from /lib64/libc.so.6
#1  0x00007ffff36e640b in amqp_poll (fd=7, event=event@entry=2, deadline=...) at librabbitmq/amqp_socket.c:286
#2  0x00007ffff36e9e75 in amqp_ssl_socket_close (base=0x7fffe40e4390) at librabbitmq/amqp_openssl.c:365
#3  0x00007ffff36e6c3c in wait_frame_inner (state=0x7fffe4064780, decoded_frame=0x7fffef644a20, timeout=<optimized out>)
    at librabbitmq/amqp_socket.c:840
#4  0x00007ffff36e6f75 in amqp_simple_wait_frame_noblock (state=state@entry=0x7fffe4064780,
    decoded_frame=decoded_frame@entry=0x7fffef644a20, timeout=<optimized out>) at librabbitmq/amqp_socket.c:978
#5  0x00007ffff36e1642 in amqp_consume_message (state=state@entry=0x7fffe4064780, envelope=envelope@entry=0x7fffef644ba0,
    timeout=timeout@entry=0x7fffef644af0, flags=flags@entry=0) at librabbitmq/amqp_consumer.c:154
#6  0x00007ffff3d727ee in ov_rest_scmb_listner (handler=handler@entry=0x651250) at ov_rest_event.c:1274
#7  0x00007ffff3d72cb2 in ov_rest_event_thread (ov_pointer=0x651250) at ov_rest_event.c:1559
#8  0x00007ffff65594e5 in g_thread_proxy () from /lib64/libglib-2.0.so.0
#9  0x00007ffff62d5dc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007ffff5aeb1cd in clone () from /lib64/libc.so.6
(gdb) c

I did wait 10 min for this call to return some error but it did not return with an error.
# After reinserting the cables:
I did receive the below error AMQP_STATUS_HEARTBEAT_TIMEOUT. Since this error is coming after the cable is inserted back, it is not helping us in anyway.
ov_rest: CRIT: ov_rest_event.c:1416: AMQP_STATUS_HEARTBEAT_TIMEOUT

Also observed that the amqp_consume_message() is still blocked as below,
(gdb) info thread
  Id   Target Id         Frame
  5    Thread 0x7fffef645700 (LWP 19016) "ov_rest_event_t" 0x00007ffff7ffa7c2 in clock_gettime ()
  4    Thread 0x7fffefe46700 (LWP 19015) "EventPop" 0x00007ffff5ae5949 in syscall () from /lib64/libc.so.6
  3    Thread 0x7ffff0647700 (LWP 19014) "EventGet" 0x00007ffff5ae5949 in syscall () from /lib64/libc.so.6
  2    Thread 0x7ffff0e48700 (LWP 19013) "DiscoveryThread" 0x00007ffff5ae5949 in syscall () from /lib64/libc.so.6
* 1    Thread 0x7ffff7fe4840 (LWP 19009) "openhpid" 0x00007ffff5ae28f3 in select () from /lib64/libc.so.6
(gdb) thread 5
[Switching to thread 5 (Thread 0x7fffef645700 (LWP 19016))]
#0  0x00007ffff7ffa7c2 in clock_gettime ()
(gdb) bt
#0  0x00007ffff7ffa7c2 in clock_gettime ()
#1  0x00007ffff5afee1d in clock_gettime () from /lib64/libc.so.6
#2  0x00007ffff36e9291 in amqp_get_monotonic_timestamp () at librabbitmq/amqp_time.c:100
#3  0x00007ffff36e9302 in amqp_time_from_now (time=time@entry=0x7fffef644970, timeout=<optimized out>)
    at librabbitmq/amqp_time.c:131
#4  0x00007ffff36e6b02 in wait_frame_inner (state=0x7fffe4064780, decoded_frame=0x7fffef644a20, timeout=<optimized out>)
    at librabbitmq/amqp_socket.c:790
#5  0x00007ffff36e6f75 in amqp_simple_wait_frame_noblock (state=state@entry=0x7fffe4064780,
    decoded_frame=decoded_frame@entry=0x7fffef644a20, timeout=<optimized out>) at librabbitmq/amqp_socket.c:978
#6  0x00007ffff36e1642 in amqp_consume_message (state=state@entry=0x7fffe4064780, envelope=envelope@entry=0x7fffef644ba0,
    timeout=timeout@entry=0x7fffef644af0, flags=flags@entry=0) at librabbitmq/amqp_consumer.c:154
#7  0x00007ffff3d727ee in ov_rest_scmb_listner (handler=handler@entry=0x651250) at ov_rest_event.c:1274
#8  0x00007ffff3d72cb2 in ov_rest_event_thread (ov_pointer=0x651250) at ov_rest_event.c:1559
#9  0x00007ffff65594e5 in g_thread_proxy () from /lib64/libglib-2.0.so.0
#10 0x00007ffff62d5dc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007ffff5aeb1cd in clone () from /lib64/libc.so.6
(gdb) c

I have checked the status of the thread again after 10 min and it still in blocked state at same place and did not move any further as below,

(gdb) info thread
  Id   Target Id         Frame
  5    Thread 0x7fffef645700 (LWP 19016) "ov_rest_event_t" 0x00007ffff7ffa7c2 in clock_gettime ()
  4    Thread 0x7fffefe46700 (LWP 19015) "EventPop" 0x00007ffff5ae5949 in syscall () from /lib64/libc.so.6
  3    Thread 0x7ffff0647700 (LWP 19014) "EventGet" 0x00007ffff5ae5949 in syscall () from /lib64/libc.so.6
  2    Thread 0x7ffff0e48700 (LWP 19013) "DiscoveryThread" 0x00007ffff5ae5949 in syscall () from /lib64/libc.so.6
* 1    Thread 0x7ffff7fe4840 (LWP 19009) "openhpid" 0x00007ffff5ae28f3 in select () from /lib64/libc.so.6
(gdb) thread 5
[Switching to thread 5 (Thread 0x7fffef645700 (LWP 19016))]
#0  0x00007ffff7ffa7c2 in clock_gettime ()
(gdb) bt
#0  0x00007ffff7ffa7c2 in clock_gettime ()
#1  0x00007ffff5afee1d in clock_gettime () from /lib64/libc.so.6
#2  0x00007ffff36e9291 in amqp_get_monotonic_timestamp () at librabbitmq/amqp_time.c:100
#3  0x00007ffff36e9302 in amqp_time_from_now (time=time@entry=0x7fffef644970, timeout=<optimized out>)
    at librabbitmq/amqp_time.c:131
#4  0x00007ffff36e6b02 in wait_frame_inner (state=0x7fffe4064780, decoded_frame=0x7fffef644a20, timeout=<optimized out>)
    at librabbitmq/amqp_socket.c:790
#5  0x00007ffff36e6f75 in amqp_simple_wait_frame_noblock (state=state@entry=0x7fffe4064780,
    decoded_frame=decoded_frame@entry=0x7fffef644a20, timeout=<optimized out>) at librabbitmq/amqp_socket.c:978
#6  0x00007ffff36e1642 in amqp_consume_message (state=state@entry=0x7fffe4064780, envelope=envelope@entry=0x7fffef644ba0,
    timeout=timeout@entry=0x7fffef644af0, flags=flags@entry=0) at librabbitmq/amqp_consumer.c:154
#7  0x00007ffff3d727ee in ov_rest_scmb_listner (handler=handler@entry=0x651250) at ov_rest_event.c:1274
#8  0x00007ffff3d72cb2 in ov_rest_event_thread (ov_pointer=0x651250) at ov_rest_event.c:1559
#9  0x00007ffff65594e5 in g_thread_proxy () from /lib64/libglib-2.0.so.0
#10 0x00007ffff62d5dc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007ffff5aeb1cd in clone () from /lib64/libc.so.6
(gdb)

Thanks, Hemanth

alanxz commented 7 years ago

The first stack-trace doesn't match with v0.8.0 of rabbitmq-c. (amqp_ssl_socket_close doesn't call amqp_poll, this was changed in 26a53d29bd32809e07687ef6515d48ef7d07f0b8, which shipped in v0.8.0).

Could you try building rabbitmq-c from the master branch here and see if that fixes your issue?

HemanthBeecherla commented 7 years ago

Hi Alan,

Sorry my mistake I was pointing to incorrect library. v0.8.0 works great for me ;).

does v0.8.0 really depends on openssl1.1.0, because when I try install below rpms librabbitmq-0.8.0-2.fc26.x86_64.rpm librabbitmq-devel-0.8.0-2.fc26.x86_64.rpm these rpms needs/depend on

# rpm -iUvh librabbitmq-0.8.0-2.fc26.x86_64.rpm librabbitmq-devel-0.8.0-2.fc26.x86_64.rpm
warning: librabbitmq-0.8.0-2.fc26.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 64dab85d: NOKEY
error: Failed dependencies:
        libcrypto.so.1.1()(64bit) is needed by librabbitmq-0.8.0-2.fc26.x86_64
        libcrypto.so.1.1(OPENSSL_1_1_0)(64bit) is needed by librabbitmq-0.8.0-2.fc26.x86_64
        libssl.so.1.1()(64bit) is needed by librabbitmq-0.8.0-2.fc26.x86_64
        libssl.so.1.1(OPENSSL_1_1_0)(64bit) is needed by librabbitmq-0.8.0-2.fc26.x86_64

But When I try to compile and install it did not ask for openssl.1.1.0.

So I am wondering is there a need for Openssl 1.1.0 to work with rabbitmq-c v0.8.0 in any special case.

Thank you so much for all your help.

Thanks, Hemanth

alanxz commented 7 years ago

rabbitmq-c does not need OpenSSL v1.1.0. Likely the source has been patched to work with OpenSSL v1.1.0 in the RPM you're trying to install.