ebin123456 / py-amqplib

Automatically exported from code.google.com/p/py-amqplib
GNU Lesser General Public License v2.1

Occasionally missing work #50

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Amqplib 1.0.2 with timeout patch. Python 3.2. RabbitMQ 2.7.1.

I have two workers on the same server processing incoming messages and replying
RPC-style (i.e. preserving correlation_id). Each worker sets up its channel with
chan.basic_qos(prefetch_size=0,prefetch_count=1,a_global=False) and
chan.basic_consume(queue=q, no_ack=False, callback=data_received)

Each work item takes 200-2000 ms depending on content. There is plenty of time
between work items; they arrive 1-20 per second. The two workers are mainly for
redundancy (being able to upgrade and restart without downtime).

I have logging, with timestamps, in my own code both on submitting work to 
RabbitMQ and on processing it.

Occasionally, a work packet is delayed by about 25 seconds between being given
to RabbitMQ and arriving at the consuming worker. This happens randomly, about
once every 5K-10K work items. The requesting client times out after 20 seconds,
so while the work is eventually done, no one is interested in the result.
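
A minimal sketch of how the two sets of timestamps could be compared to pick out the late items. It assumes both logs can be reduced to dictionaries mapping a work-item id (for example the correlation_id) to a time in seconds; the function name `find_late_items` and the dictionary layout are illustrative, not taken from the actual code.

```python
def find_late_items(submitted, processed, limit=20.0):
    """Return (item_id, delay) pairs for items picked up more than `limit`
    seconds after they were submitted, longest delay first.

    submitted: hypothetical dict of item_id -> time the item was given to RabbitMQ
    processed: hypothetical dict of item_id -> time the worker received it
    limit:     the requester's timeout in seconds (20 in the report above)
    """
    late = []
    for item_id, sent in submitted.items():
        started = processed.get(item_id)
        if started is None:
            # No matching entry in the worker log; nothing to compare.
            continue
        delay = started - sent
        if delay > limit:
            late.append((item_id, delay))
    return sorted(late, key=lambda pair: pair[1], reverse=True)
```

Run over a day of logs, the ids it returns are the ones worth lining up against RabbitMQ's own logs or a packet capture.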

RabbitMQ and the workers are separated by one network switch; the servers are
right next to each other. The Rabbit queue setup is very simple, with just a
default/direct exchange.

I am unsure where the problem lies, but my own logging indicates it is either 
RabbitMQ or amqplib. I would be happy to log more, share code, share data or 
otherwise do work to follow up on people's hunches.

If anyone is interested, the site in question is boardword.com

Original issue reported on code.google.com by fgunder...@gmail.com on 5 Mar 2012 at 10:09

GoogleCodeExporter commented 9 years ago
Edit: work items arrive 1-20 times per minute, not per second. The point is that
the workers are mostly idle.

Original comment by fgunder...@gmail.com on 5 Mar 2012 at 10:11

GoogleCodeExporter commented 9 years ago
I think it would be helpful to have more detailed timings/traces at different
points inside and outside the programs to find out where the delay is. For
example, a "tcpdump" time for when the network packet arrives at the client
hardware, and a time from inside Python for when your code got the message from
amqplib, to find out whether the delay is actually in the network stack/client
library (and not in the broker).

Original comment by barry.pe...@gmail.com on 10 Mar 2012 at 7:09
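
The "time from inside Python" suggested above could be captured with a small wrapper around the consume callback. This is only a sketch: the names `timed`, `clock` and `records` are made up for illustration, and it assumes the delivered message exposes a `properties` dict holding the correlation_id, as amqplib messages do.

```python
import logging
import time

log = logging.getLogger("worker.timing")


def timed(callback, clock=time.time, records=None):
    """Wrap a consume callback so arrival and completion times are logged.

    callback: the real handler, e.g. data_received
    clock:    function returning the current time in seconds
    records:  optional list that collects (correlation_id, received, done)
    """
    def wrapper(msg):
        received = clock()
        # Assumption: `msg.properties` is a dict; fall back to {} if absent.
        corr_id = getattr(msg, "properties", {}).get("correlation_id")
        log.info("received correlation_id=%s at %.3f", corr_id, received)
        try:
            return callback(msg)
        finally:
            done = clock()
            log.info("finished correlation_id=%s after %.3f s",
                     corr_id, done - received)
            if records is not None:
                records.append((corr_id, received, done))
    return wrapper
```

It would be passed in place of the plain callback, e.g. chan.basic_consume(queue=q, no_ack=False, callback=timed(data_received)). Comparing the logged arrival time with the tcpdump time for the same delivery shows whether the gap opens before or after the packet reaches the worker host.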

GoogleCodeExporter commented 9 years ago
I have been using the timeout patch with a 2 sec timeout. The 2 sec timeout is
used mostly in dev for cleanly exiting the app by catching ctrl+c (see comment
in the timeout patch thread). However, having a timeout seems to be the main
culprit in my problem. With the timeout increased to 20 secs, the problem almost
goes away. It seems there is some kind of problem when having frequent
reconnects (or whatever happens when the timeout expires).

Original comment by fgunder...@gmail.com on 24 Mar 2012 at 9:18