alanxz / rabbitmq-c

RabbitMQ C client
MIT License

amqp_poll() hangs even with heartbeat enabled #508

Open reddysrikesh opened 6 years ago

reddysrikesh commented 6 years ago

Our consumer thread hangs indefinitely even with heartbeats enabled. We are using version 0.7.0-2. The full backtrace is below. Is there a workaround for this issue?

(gdb) bt full
#0  0x00002b40430d48ed in poll () from /lib64/libc.so.6
No symbol table info available.
#1  0x00002b403ed3472f in amqp_poll (fd=66, event=2, deadline=...)
    at rabbitmq-c/projects/rabbitmq-c/rabbitmq-c-0.7.0/librabbitmq/amqp_socket.c:286
        pfd = {fd = 66, events = 1, revents = 0}
        res = -1006629872
        timeout_ms = 29999
        __PRETTY_FUNCTION__ = "amqp_poll"
#2  0x00002b403ed35178 in recv_with_timeout (state=0x2b40c4000c10, timeout=...)
    at rabbitmq-c/projects/rabbitmq-c/rabbitmq-c-0.7.0/librabbitmq/amqp_socket.c:713
        res = -4865
        fd = 66
#3  0x00002b403ed354b4 in wait_frame_inner (state=0x2b40c4000c10, decoded_frame=0x2b4076b1f450, timeout=0x0)
    at rabbitmq-c/projects/rabbitmq-c/rabbitmq-c-0.7.0/librabbitmq/amqp_socket.c:836
        res = 0
        deadline = {time_point_ns = 4640379980915264}
        timeout_deadline = {time_point_ns = 18446744073709551615}
        res = 0
#4  0x00002b403ed35820 in amqp_simple_wait_frame_on_channel (state=0x2b40c4000c10, channel=1,
    decoded_frame=0x2b4076b1f450)
    at rabbitmq-c/projects/rabbitmq-c/rabbitmq-c-0.7.0/librabbitmq/amqp_socket.c:942
        frame_ptr = 0x2b4076b1f470
        cur = 0x0
        res = 114
#5  0x00002b403ed3a1af in amqp_read_message (state=0x2b40c4000c10, channel=1, message=0x2b4076b1f828, flags=0)
    at rabbitmq-c/projects/rabbitmq-c/rabbitmq-c-0.7.0/librabbitmq/amqp_consumer.c:217
        frame = {frame_type = 8 '\b', channel = 0, payload = {method = {id = 1991374224, decoded = 0x2b4076b1f480},
            properties = {class_id = 62864, body_size = 47555869275264, decoded = 0x2b40430705e5 <malloc+85>,
              raw = {len = 47555869276128, bytes = 0x2b4076b1f7e0}}, body_fragment = {len = 47555869275536,
              bytes = 0x2b4076b1f480}, protocol_header = {transport_high = 144 '\220', transport_low = 245 '\365',
              protocol_version_major = 177 '\261', protocol_version_minor = 118 'v'}}}
        ret = {reply_type = AMQP_RESPONSE_NONE, reply = {id = 0, decoded = 0x0}, library_error = 0}
        body_read = 47557166634079
        body_read_ptr = 0x32 <Address 0x32 out of bounds>
        res = 11072
#6  0x00002b403ed3a053 in amqp_consume_message (state=0x2b40c4000c10, envelope=0x2b4076b1f7e0,
    timeout=0x2b4076b1f5e0, flags=0)
    at rabbitmq-c/projects/rabbitmq-c/rabbitmq-c-0.7.0/librabbitmq/amqp_consumer.c:186
        res = 0
        frame = {frame_type = 1 '\001', channel = 1, payload = {method = {id = 3932220, decoded = 0x2b40c4061498},
            properties = {class_id = 60, body_size = 47557166634136, decoded = 0x2b40c4000c10, raw = {len = 0,
                bytes = 0x10c4000c10}}, body_fragment = {len = 47553881833532, bytes = 0x2b40c4061498},
            protocol_header = {transport_high = 60 '<', transport_low = 0 '\000', protocol_version_major = 60 '<',
              protocol_version_minor = 0 '\000'}}}
        delivery_method = 0x2b40c4061498
        ret = {reply_type = AMQP_RESPONSE_NONE, reply = {id = 0, decoded = 0x0}, library_error = 0}
#7  0x00000000005a3fb2 in RabbitMQClient::RMQConsume (this=0x2b40bc08adc0, envelope=envelope@entry=0x2b4076b1f7e0,
    timeout=timeout@entry=30000000) at RabbitMQClient.cpp:572
        tmout = {tv_sec = 0, tv_usec = 30000000}
        rval = -1
        ret = <optimized out>
        __FUNCTION__ = "RMQConsume"
alanxz commented 6 years ago

Does the same issue happen with v0.9.0?

reddysrikesh commented 6 years ago

It looks like the minimum CMake version required to compile v0.9.0 is 2.8.12. Unfortunately, our build system is still at 2.8.8. Do you want us to try v0.8.0?

alanxz commented 6 years ago

I'd recommend installing a newer version of CMake and testing against v0.9.0. There have been several improvements in the way sockets are handled since v0.8.0.

lecardozo commented 5 years ago

I'm facing the same issue when calling amqp_read_message() after amqp_basic_get(). It hangs forever when there is no message available in the queue, but works fine when there are messages to be pulled. Initially, I thought this might be the expected behavior for amqp_read_message() (blocking until new messages are available), but even after I publish new messages to the queue, the function never returns. I tested with the code from the master branch and the same thing happened. What could be happening?

Here is the backtrace:

(gdb) backtrace
#0  0x00007ffff72c3730 in __poll_nocancel ()
    at ../sysdeps/unix/syscall-template.S:84
#1  0x00007fffebc03713 in ?? ()
   from /usr/lib/x86_64-linux-gnu/librabbitmq.so.4
#2  0x00007fffebc0382f in ?? ()
   from /usr/lib/x86_64-linux-gnu/librabbitmq.so.4
#3  0x00007fffebc03edd in ?? ()
   from /usr/lib/x86_64-linux-gnu/librabbitmq.so.4
#4  0x00007fffebc04252 in ?? ()
   from /usr/lib/x86_64-linux-gnu/librabbitmq.so.4
#5  0x00007fffebc07003 in amqp_read_message ()

I also used strace to look for hints and found that it hangs on the poll syscall, right after recvfrom fails with EAGAIN (Resource temporarily unavailable):

sendto(3, "\1\0\1\0\0\0\r\0<\0F\0\0\5teste\1\316", 21, MSG_NOSIGNAL, NULL, 0) = 21
recvfrom(3, 0x56a4eb0, 131072, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "\1\0\1\0\0\0\5\0<\0H\0\316", 131072, 0, NULL, NULL) = 13
write(1, "1", 1)                        = 1
write(1, "\n", 1)                       = 1
recvfrom(3, 0x56a4eb0, 131072, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])

Thanks!

SaumilShah66 commented 2 years ago

I am also facing the same issue. Can someone please help me?