cloudamqp / amqproxy

An intelligent AMQP proxy, with connection and channel pooling/reusing
https://www.cloudamqp.com
MIT License

Concurrent connections are not limited, breaking RabbitMQ server #11

Closed: planestepper closed this issue 5 years ago

planestepper commented 5 years ago

My use case consists of OpenResty responding with empty_gif and then pushing the necessary data into a RabbitMQ queue. When publishing to RabbitMQ directly over AMQP, RabbitMQ coped with the load, ingesting about 300-400 messages per second. After I installed and configured AMQProxy, that rate skyrocketed to over 600 msgs/s.

At one point RabbitMQ simply stopped ingesting, and the CloudAMQP management interface became unavailable and displayed an error message (see below).

[Screenshot: error message displayed by the CloudAMQP management interface]

I do see the maxConnections setting in the example.ini file, but I can't find that upper limit referenced anywhere in the codebase, nor any documentation on how to set the maximum number of connections to hold open. When I checked the management interface, there were over 6000 connections open, many of them in the blocking or blocked state. The instance is a Big Bunny.
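Whether and how AMQProxy enforces maxConnections is exactly what is unclear here, so the following is only a sketch of what a hard cap on upstream connections could look like in a proxy or client, not AMQProxy's code. BoundedConnectionPool, its connect factory, and the numbers in the demo are all hypothetical. A semaphore bounds how many callers can hold a connection at once, and returned connections are reused instead of being closed, so no more than max_connections connections are ever opened.

```python
import threading
from contextlib import contextmanager


class BoundedConnectionPool:
    """Hypothetical pool holding at most max_connections upstream
    connections. Illustrative only; not AMQProxy's implementation."""

    def __init__(self, connect, max_connections):
        self._connect = connect  # factory that opens one upstream connection
        self._slots = threading.BoundedSemaphore(max_connections)
        self._idle = []          # connections waiting to be reused
        self._lock = threading.Lock()
        self.opened = 0          # total connections ever opened

    @contextmanager
    def connection(self, timeout=None):
        # Wait for a free slot instead of opening connection N+1.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("connection limit reached")
        try:
            with self._lock:
                conn = self._idle.pop() if self._idle else None
            if conn is None:
                conn = self._connect()
                with self._lock:
                    self.opened += 1
            try:
                yield conn
            finally:
                with self._lock:
                    self._idle.append(conn)  # keep it open for the next caller
        finally:
            self._slots.release()


# Demo: 50 threads share a pool capped at 5 connections.
demo_pool = BoundedConnectionPool(connect=object, max_connections=5)


def publish_many():
    for _ in range(20):
        with demo_pool.connection() as conn:
            pass  # a real client would call basic_publish on conn here


workers = [threading.Thread(target=publish_many) for _ in range(50)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(demo_pool.opened)  # never more than 5
```

With a cap like this, excess callers queue up on the client side instead of turning into thousands of broker connections.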

The load test is run with Apache Bench: ab -k -c200 -n100000 -r 'http://<some address>/1234567890.gif'. OpenResty (nginx) was running a single worker with a maximum of 400 worker connections, which I believed would limit concurrency and therefore the number of connections. The publisher uses basic_publish, and the consumers for this queue run on a different server. Messages that are published successfully are also processed successfully.

We have other software using CloudAMQP, and AMQProxy would be very helpful if we can get this prototype right.

Server log sample:

(... the alarm was set and cleared many, many times before this point ...)

**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************

=INFO EVENT==== Fri, 18 Jan 2019 19:27:50 GMT ===
vm_memory_high_watermark clear. Memory used:839187024 allowed:842953236

=WARNING EVENT==== Fri, 18 Jan 2019 19:27:50 GMT ===
memory resource limit alarm cleared on node 'rabbit@<server name>'

=INFO EVENT==== Fri, 18 Jan 2019 19:27:51 GMT ===
vm_memory_high_watermark set. Memory used:955837240 allowed:842953236

=WARNING EVENT==== Fri, 18 Jan 2019 19:27:51 GMT ===
memory resource limit alarm set on node 'rabbit@<server name>'.

**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************

(... there were many instances of the message below ...)

=ERROR EVENT==== Fri, 18 Jan 2019 19:28:13 GMT ===
closing AMQP connection <0.16260.3> (x.x.x.x:46390 -> y.y.y.y:5672):
{handshake_timeout,frame_header}

=WARNING EVENT==== Fri, 18 Jan 2019 19:29:07 GMT ===
file descriptor limit alarm set.

********************************************************************
*** New connections will not be accepted until this alarm clears ***
********************************************************************

=WARNING EVENT==== Fri, 18 Jan 2019 19:29:18 GMT ===
file descriptor limit alarm cleared

=INFO EVENT==== Fri, 18 Jan 2019 19:30:06 GMT ===
started TCP Listener on 0.0.0.0:5672

=INFO EVENT==== Fri, 18 Jan 2019 19:30:07 GMT ===
started SSL Listener on 0.0.0.0:5671

=INFO EVENT==== Fri, 18 Jan 2019 19:30:07 GMT ===
rabbit_stomp: default user 'guest' enabled

=INFO EVENT==== Fri, 18 Jan 2019 19:30:07 GMT ===
started STOMP TCP Listener on 0.0.0.0:61613

=INFO EVENT==== Fri, 18 Jan 2019 19:30:07 GMT ===
started STOMP SSL Listener on 0.0.0.0:61614

=INFO EVENT==== Fri, 18 Jan 2019 19:30:07 GMT ===
rabbit_web_stomp: listening for HTTP connections on 0.0.0.0:15674

=INFO EVENT==== Fri, 18 Jan 2019 19:30:07 GMT ===
Management plugin started. Port: 15672

=INFO EVENT==== Fri, 18 Jan 2019 19:30:07 GMT ===
Statistics database started.

=INFO EVENT==== Fri, 18 Jan 2019 19:30:07 GMT ===
Server startup complete; 15 plugins started.
 * rabbitmq_shovel_management
 * rabbitmq_federation_management
 * rabbitmq_management
 * rabbitmq_web_dispatch
 * webmachine
 * mochiweb
 * rabbitmq_management_agent
 * rabbitmq_shovel
 * rabbitmq_federation
 * rabbitmq_web_stomp
 * rabbitmq_stomp
 * rabbitmq_consistent_hash_exchange
 * amqp_client
 * cowboy
 * sockjs
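The two vm_memory_high_watermark lines in the log show memory oscillating right around the limit. The short script below, an illustrative helper rather than a RabbitMQ tool, parses those two lines (its regex assumes exactly the log format shown above) and expresses "used" as a fraction of "allowed": the alarm clears at about 99.6% of the allowed memory and is set again one second later at about 113.4%.

```python
import re

# The two watermark lines copied from the server log above.
LOG_LINES = [
    "vm_memory_high_watermark clear. Memory used:839187024 allowed:842953236",
    "vm_memory_high_watermark set. Memory used:955837240 allowed:842953236",
]

PATTERN = re.compile(
    r"vm_memory_high_watermark (set|clear)\. Memory used:(\d+) allowed:(\d+)"
)


def watermark_usage(line):
    """Return (state, used / allowed) for one watermark log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a watermark line: {line!r}")
    state = match.group(1)
    used, allowed = int(match.group(2)), int(match.group(3))
    return state, used / allowed


for line in LOG_LINES:
    state, ratio = watermark_usage(line)
    print(f"{state}: {ratio:.1%} of the allowed memory")
# clear: 99.6% of the allowed memory
# set: 113.4% of the allowed memory
```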

One can see the spikes in connections:

[Graph: number of connections over time]

and how they seem unrelated to the number of messages in the queues over that period:

[Graph: number of messages in the queues over the same period]

planestepper commented 5 years ago

The issue was that the clients were spread across different threads, even though they came from the same process, so the proxy reasonably treated each one as a new connection. The new batch design for the same processing eliminates the multiple connections, which also removes both the need for this proxy and the benefit of using it.
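The exact batch design isn't described, so the sketch below shows only one possible reading of it: worker threads hand messages to a single publisher thread that owns the process's only connection and publishes in batches. FakeBroker, per_thread_design, batched_design, and all counts are hypothetical stand-ins used to count connections; no real AMQP client is involved.

```python
import queue
import threading


class FakeBroker:
    """Hypothetical stand-in for RabbitMQ: counts connections and messages."""

    def __init__(self):
        self._lock = threading.Lock()
        self.connections = 0
        self.messages = 0

    def connect(self):
        with self._lock:
            self.connections += 1
        return self

    def publish_batch(self, batch):
        # A real client would issue one basic_publish per message.
        with self._lock:
            self.messages += len(batch)


def per_thread_design(broker, threads=8, per_thread=100):
    """Each thread opens its own connection (the original layout)."""
    def worker():
        conn = broker.connect()
        conn.publish_batch(["msg"] * per_thread)

    ts = [threading.Thread(target=worker) for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()


def batched_design(broker, threads=8, per_thread=100, batch_size=50):
    """Threads hand messages to one publisher that owns the only connection."""
    outbox = queue.Queue()
    stop = object()  # sentinel telling the publisher to flush and exit

    def publisher():
        conn = broker.connect()
        batch = []
        while True:
            item = outbox.get()
            if item is stop:
                break
            batch.append(item)
            if len(batch) >= batch_size:
                conn.publish_batch(batch)
                batch = []
        if batch:
            conn.publish_batch(batch)  # flush the remainder

    pub = threading.Thread(target=publisher)
    pub.start()

    def worker():
        for _ in range(per_thread):
            outbox.put("msg")

    ts = [threading.Thread(target=worker) for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    outbox.put(stop)  # every producer has finished, so this is the last item
    pub.join()


a, b = FakeBroker(), FakeBroker()
per_thread_design(a)
batched_design(b)
print(a.connections, a.messages)  # 8 800
print(b.connections, b.messages)  # 1 800
```

Both layouts deliver the same 800 messages, but the batched one opens a single connection regardless of the thread count, which is why a connection-pooling proxy adds little in that setup.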