paradis closed this issue 9 years ago
I saw this as a possibility when reviewing the implementation but was never able to fully reproduce it, mostly because my understanding of the heartbeat mechanism is somewhat limited.
Thanks for the info. Hopefully I can find the case where it's not working as expected.
re "Some message are regularly exchanged and the server doesn't send heartbeat": You should always have a heartbeat frame sent by the server in the pre-negotiated interval.
Can you set logging to debug mode:
import logging
logging.basicConfig(level=logging.DEBUG)
and then connect to RabbitMQ with your problematic app and post the log lines from DEBUG:rabbitpy.channel0:Received frame: 'Connection.Start' to DEBUG:rabbitpy.channel0:Connection opened?
Here's an example from my laptop:
DEBUG:rabbitpy.channel0:Received frame: 'Connection.Start'
DEBUG:rabbitpy.channel0:Server information: 'Licensed under the MPL. See http://www.rabbitmq.com/'
DEBUG:rabbitpy.channel0:Server product: 'RabbitMQ'
DEBUG:rabbitpy.channel0:Server copyright: 'Copyright (C) 2007-2014 GoPivotal, Inc.'
DEBUG:rabbitpy.channel0:Server supports exchange_exchange_bindings: True
DEBUG:rabbitpy.channel0:Server supports connection.blocked: True
DEBUG:rabbitpy.channel0:Server supports authentication_failure_close: True
DEBUG:rabbitpy.channel0:Server supports basic.nack: True
DEBUG:rabbitpy.channel0:Server supports per_consumer_qos: True
DEBUG:rabbitpy.channel0:Server supports consumer_priorities: True
DEBUG:rabbitpy.channel0:Server supports consumer_cancel_notify: True
DEBUG:rabbitpy.channel0:Server supports publisher_confirms: True
DEBUG:rabbitpy.channel0:Server cluster_name: 'rabbit@gmr-mbp'
DEBUG:rabbitpy.channel0:Server platform: 'Erlang/OTP'
DEBUG:rabbitpy.channel0:Server version: '3.4.1'
DEBUG:rabbitpy.base:Writing frame: Connection.StartOk
DEBUG:rabbitpy.channel0:Received frame: 'Connection.Tune'
DEBUG:rabbitpy.channel0:Started a heartbeat timer that will fire in 1160 sec
DEBUG:rabbitpy.base:Writing frame: Connection.TuneOk
DEBUG:rabbitpy.base:Writing frame: Connection.Open
DEBUG:rabbitpy.channel0:Received frame: 'Connection.OpenOk'
DEBUG:rabbitpy.channel0:Connection opened
Also, if you can share an example that demonstrates the behavior, that would be helpful.
Here is a script and the logs to reproduce the first problem. https://gist.github.com/paradis/615e3d5607aef2f93597
# rabbitmqctl status
Status of node '<>' ...
[{running_applications,
[{rabbitmq_management,"RabbitMQ Management Console","3.1.5"},
{rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.1.5"},
{webmachine,"webmachine","1.10.3-rmq3.1.5-gite9359c7"},
{mochiweb,"MochiMedia Web Server","2.7.0-rmq3.1.5-git680dba8"},
{rabbitmq_management_agent,"RabbitMQ Management Agent","3.1.5"},
{rabbit,"RabbitMQ","3.1.5"},
{os_mon,"CPO CXC 138 46","2.2.7"},
{inets,"INETS CXC 138 49","5.7.1"},
{xmerl,"XML parser","1.2.10"},
{mnesia,"MNESIA CXC 138 12","4.5"},
{amqp_client,"RabbitMQ AMQP Client","3.1.5"},
{sasl,"SASL CXC 138 11","2.1.10"},
{stdlib,"ERTS CXC 138 10","1.17.5"},
{kernel,"ERTS CXC 138 10","2.14.5"}]},
{os,{unix,linux}},
{erlang_version,
"Erlang R14B04 (erts-5.8.5) [source] [64-bit] [rq:1] [async-threads:30] [kernel-poll:true]\n"},
Good news? I was able to reliably replicate this in production over the weekend.
I'm short on time early this week and still need to replicate it with debug logging turned on to figure out exactly what's going on. I'll try to address it by the end of the week.
I've just pushed a change that should address this: the client-side heartbeat checker is now reset on every frame received from the server, not only on heartbeat frames. This should prevent the problem from happening in the future.
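To illustrate the idea behind that change, here is a minimal sketch of a liveness checker that is reset by any inbound frame. It is not rabbitpy's actual code: the `HeartbeatChecker` class, its method names, and the `max_missed` threshold of two intervals are all assumptions made for the example.

```python
import time


class HeartbeatChecker:
    """Hypothetical helper that tracks when the server was last heard from."""

    def __init__(self, interval, clock=time.monotonic, max_missed=2):
        self._interval = interval      # negotiated heartbeat interval, in seconds
        self._clock = clock            # injectable so the logic can be tested
        self._max_missed = max_missed  # assumed tolerance before giving up
        self._last_activity = clock()

    def on_frame_received(self):
        # Called for every inbound frame, not only for Heartbeat frames, so
        # regular message traffic also counts as proof that the server is alive.
        self._last_activity = self._clock()

    def is_stale(self):
        # The connection is treated as dead only after max_missed silent intervals.
        silent_for = self._clock() - self._last_activity
        return silent_for > self._interval * self._max_missed
```

With a checker like this, a server that sends heartbeats only when the connection is otherwise idle no longer trips the timeout while messages are flowing.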
Hi, Thank you for all your work on rabbitpy. I think I have found a bug but I don't understand it fully.
First, it seems that the server (v3.1.5) sends a heartbeat only when no messages are being exchanged, so the current implementation will fail when a scenario like this occurs:
Channel0._last_heartbeat
Moreover, in some cases (I can't reproduce it every time), when I launch a basic consumer, the ConnectionResetException is raised in the I/O thread and not in the consuming thread, which leaves the latter hanging forever because the corresponding I/O thread has crashed.
Here is a stacktrace for the io thread in this case:
Here is the stacktrace for the running consuming thread:
I hope this will be enough for you to understand the problem, but if I can be of any further help, I will be glad to assist.
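The hang described above, where the exception is raised in the I/O thread while the consuming thread keeps waiting, can be avoided in general by forwarding the failure through the same queue the consumer reads from. The sketch below only illustrates that pattern and is not how rabbitpy is implemented: `io_loop`, `next_frame`, the `read_frame` callable, and the locally defined `ConnectionResetException` stand-in are all hypothetical.

```python
import queue
import threading


class ConnectionResetException(Exception):
    """Stand-in for the exception named above, defined so the sketch is self-contained."""


def io_loop(read_frame, inbox):
    # Runs in the I/O thread: forward every frame to the consumer's queue.
    try:
        while True:
            inbox.put(read_frame())
    except Exception as error:
        # Hand the failure to the consuming thread instead of dying silently.
        inbox.put(error)


def next_frame(inbox, timeout=None):
    # Runs in the consuming thread: re-raise anything the I/O thread reported.
    item = inbox.get(timeout=timeout)
    if isinstance(item, Exception):
        raise item
    return item
```

A consumer would start the loop with `threading.Thread(target=io_loop, args=(read_frame, inbox), daemon=True).start()` and then call `next_frame(inbox)` in its own loop, so a reset connection surfaces as an exception in the consumer instead of a silent hang.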