comzyh closed 8 years ago
After comparing with pika, I think this is a bug in aioamqp.
Here is a reproduction: https://gist.github.com/comzyh/0262f159f764a748a163f9f13b26578b
Run
python aioamqp_0.7_heartbeat_bug_reproduce.py sender
Then, in another terminal, run
python aioamqp_0.7_heartbeat_bug_reproduce.py receiver
You will see an error log from the receiver about 30 seconds later.
If you run my test code like this:
python aioamqp_0.7_heartbeat_bug_reproduce.py pika_receiver
you may see
[2016-05-08 03:41:25,413] pika.heartbeat:DEBUG: Received heartbeat frame
[2016-05-08 03:41:26,505] pika.heartbeat:DEBUG: Received 155 heartbeat frames, sent 170
[2016-05-08 03:41:26,505] pika.heartbeat:DEBUG: Sending heartbeat frame
[2016-05-08 03:41:35,321] pika.heartbeat:DEBUG: Received heartbeat frame
[2016-05-08 03:41:36,507] pika.heartbeat:DEBUG: Received 156 heartbeat frames, sent 171
[2016-05-08 03:41:36,508] pika.heartbeat:DEBUG: Sending heartbeat frame
[2016-05-08 03:41:45,230] pika.heartbeat:DEBUG: Received heartbeat frame
[2016-05-08 03:41:46,512] pika.heartbeat:DEBUG: Received 157 heartbeat frames, sent 172
[2016-05-08 03:41:46,513] pika.heartbeat:DEBUG: Sending heartbeat frame
Please pay attention to the timestamps above, such as 03:41:25,413.
At least for pika and rabbitmq-server, heartbeats from the server and heartbeats from the client are independent of each other.
But after reading the source code in /aioamqp/protocol.py,
I think aioamqp sends a heartbeat to the server when, and only when, the client receives a heartbeat from the server (unless the user calls the heartbeat method manually).
According to https://www.rabbitmq.com/heartbeats.html, I think you may have misunderstood the heartbeat protocol.
I believe the client should send heartbeats periodically, just like pika does:
https://github.com/pika/pika/blob/502aa0e6fdb57274aa1583138081eefc6b6e8f62/pika/heartbeat.py#L103 https://github.com/pika/pika/blob/502aa0e6fdb57274aa1583138081eefc6b6e8f62/pika/heartbeat.py#L159
What do you think?
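To illustrate what I mean by sending heartbeats periodically, here is a minimal sketch using plain asyncio. It is not aioamqp's or pika's actual code: send_heartbeats, fake_send_frame and demo are hypothetical names, and fake_send_frame only stands in for writing a real heartbeat frame. The point is that the sender runs on its own timer and does not wait for anything from the server.

```python
import asyncio


async def send_heartbeats(send_frame, interval, stop):
    """Call send_frame() every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        await send_frame()
        try:
            # Sleep for one interval, but wake up early if asked to stop.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass


async def demo(ticks=3, interval=0.01):
    """Run the sender against a fake transport and return what was sent."""
    sent = []
    stop = asyncio.Event()

    async def fake_send_frame():
        # Hypothetical stand-in for writing an AMQP heartbeat frame.
        sent.append("heartbeat")
        if len(sent) == ticks:
            stop.set()

    await send_heartbeats(fake_send_frame, interval, stop)
    return sent


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a real client the interval would come from the heartbeat value negotiated with the server, and the task would be cancelled when the connection closes.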
If anyone is suffering from this issue, here is some code that may help you get a stable connection.
For aioamqp 0.7 only
# -*- coding: utf-8 -*-
import asyncio

import aioamqp

connection = None
protocol = None
__aioamqp_heartbeat_patch_timer = None


async def disconnected(exception):
    """Reset the module-level state and stop the heartbeat task."""
    global connection, protocol
    global __aioamqp_heartbeat_patch_timer
    connection = None
    protocol = None
    # The task is still None if the initial connect failed.
    if __aioamqp_heartbeat_patch_timer is not None:
        __aioamqp_heartbeat_patch_timer.cancel()
        __aioamqp_heartbeat_patch_timer = None
    print(exception)


async def __aioamqp_heartbeat_patch():
    """Send heartbeats on a timer, independent of frames from the server."""
    global protocol
    while True:
        print('sending heartbeat to rabbitmq server.')
        await protocol.heartbeat()
        await asyncio.sleep(protocol.server_heartbeat)


async def get_channel():
    """Connect if needed, start the heartbeat task, and open a channel."""
    global connection, protocol
    global __aioamqp_heartbeat_patch_timer
    if not connection or not protocol:
        try:
            connection, protocol = await aioamqp.connect(
                host='yourhost',
                on_error=disconnected,
            )
            __aioamqp_heartbeat_patch_timer = asyncio.ensure_future(
                __aioamqp_heartbeat_patch())
        except aioamqp.AmqpClosedConnection as e:
            await disconnected(e)
            raise
    channel = await protocol.channel()
    return channel
Yes, the heartbeat handling is busted, I noticed this a while ago but didn't get around to fixing it. This is pretty high on the todo list. I hope to get the time to tackle that in the coming weeks, though I can't make any promises.
Without wanting to sound cliché, any patch would be most welcome.
Cheers
Thanks for your reply. I have already made a patch: #97
Let me explain my suspicion as follows:
PS: Have you run my reproduction code? Can you reproduce the bug in your environment?
I tested patch #97 and it works here. I also added a small enhancement to log a warning if the server does not reply to our own heartbeats.
Hi folks, I'm working on making tests pass with this branch but the feature seems to work with manual testing.
I took some time to decipher what the spec says about heartbeating and I ended up trashing the entire heartbeat code. Now it's completely transparent. The heartbeat() coroutine is still there for compatibility purposes, but I'm still thinking about trashing it completely.
Please give it a try and let me know if you find any issues with it.
Thanks again for your patience.
@RemiCardona I have tested your patch using my reproduction: https://gist.github.com/comzyh/0262f159f764a748a163f9f13b26578b It works!
Sweet! Thanks for testing!
I'll rework a few things and push this PR to master.
I'm using aioamqp 0.7 in a Python 3.5 environment.
About 180 seconds after I called basic_consume,
error messages appear in my terminal.
At the same time, I get error messages in the RabbitMQ log like:
It seems that after 3 missed heartbeats, the RabbitMQ server closes the connection.
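To make the arithmetic behind that guess explicit, here is a small sketch of the rule as I understand it from what I observed. It is not RabbitMQ's actual implementation: missed_heartbeats and should_close are hypothetical names, the threshold of 3 comes only from my observation, and the 60 second interval is an assumption chosen to match the roughly 180 seconds I saw.

```python
def missed_heartbeats(last_seen, now, interval):
    """Number of whole heartbeat intervals elapsed since the last frame."""
    return int((now - last_seen) // interval)


def should_close(last_seen, now, interval, max_missed=3):
    """True once at least `max_missed` intervals passed with no frame."""
    return missed_heartbeats(last_seen, now, interval) >= max_missed


if __name__ == "__main__":
    # Assumed 60 s interval; the last client frame was seen at t = 0.
    for t in (60, 179, 180):
        print(t, missed_heartbeats(0, t, 60), should_close(0, t, 60))
```

Under these assumptions the connection would be dropped at about 180 seconds if the client never sends a heartbeat, which matches the timing above.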
I also noticed that if nothing is published to the queue I consume, the connection is not closed.
But in the rest of my project, I create channels and consume messages the same way, and nothing seems to be wrong.
My question is:
Please forgive my poor English.