Slow clients using pubsub cause memory to grow unbounded

GoogleCodeExporter commented 8 years ago

What version of Redis you are using, in what kind of Operating System?

2.0.4 on Linux (Fedora 14)

What is the problem you are experiencing?

The server process memory usage grows without bounds.  (May cause unintentional 
denial of service.)

What steps will reproduce the problem?

A client that is SUBSCRIBEd to a high volume will cause the server to buffer 
all pending PUBLISHed data without limit, when the client is too slow to keep 
up, or if the client has hung (blocked).  This can cause a single faulty client 
to kill a redis server.  (This is also possible when a blackholing iptable type 
filter is raised between server and client, or when an intermediate stateful 
firewall decides to be buggy.)

Do you have an INFO output? Please past it here.

redis> info
redis_version:2.0.4
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
process_id:31762
uptime_in_seconds:24418
uptime_in_days:0
connected_clients:4
connected_slaves:0
blocked_clients:0
used_memory:162028112
used_memory_human:154.52M
changes_since_last_save:74156
bgsave_in_progress:0
last_save_time:1303061753
bgrewriteaof_in_progress:0
total_connections_received:57
total_commands_processed:132388962
expired_keys:0
hash_max_zipmap_entries:64
hash_max_zipmap_value:512
pubsub_channels:0
pubsub_patterns:1
vm_enabled:0
role:master
db0:keys=292,expires=0

If it is a crash, can you please paste the stack trace that you can find in the 
log file or on standard output? This is really useful for us!

Please provide any additional information below.

Could you add a configuration feature to redis.conf for setting the maximum 
buffer space to use for client subscriptions?  This would be analogous to the 
Spread toolkit MaxSessionMessages (defaults to 1000, but I generally raise it 
to 5,000 or 10,000), or the  activemq.maximumPendingMessageLimit parameter of 
ActiveMQ for dealing with too-slow consumers.  The best strategy for the server 
is to drop the connection to a slow client after it has backlogged this 
configured maximum buffer size.  Then the memory impact on the server is 
limited to a fixed size per client connection.

This should be a relatively small change to the pubsubPublishMessage function 
in src/pubsub.c  Before calling addReply() and addBulk(), the client connection 
should be checked to see if too much data is buffered up.  If so, then drop the 
connection.

More generically, the fix could even go into src/networking.c into the 
_addReplyObjectToList() function, done only in the least-likely branch where 
there is already a list of objects, and we can't append to the last item in the 
list.  So perhaps in the unstable branch, line 127 would be a good spot to 
check the list size (in terms of bytes) before adding another object to the 
tail of the list.  That would have zero CPU cost for clients that read fast 
enough that the static buffer never spills over into the linked list, and would 
be a generic place to implement the bounds-checking.

I'm not very familiar with the code for redis, though.  So, you might have a 
better way to solve this!

Thanks!

Original issue reported on code.google.com by jwillp@gmail.com on 17 Apr 2011 at 6:12

GoogleCodeExporter commented 8 years ago

I think this would be a very useful feature. I'd be willing to work on it, if 
it's considered an acceptable addition.

Original comment by dave.pet...@gmail.com on 8 Sep 2011 at 7:09

GoogleCodeExporter commented 8 years ago

I'd definitely appreciate it!

Original comment by jwillp@gmail.com on 9 Sep 2011 at 12:07

GoogleCodeExporter commented 8 years ago

I've added a new feature that lets you specify a limit on the number of pending 
objects in the client's "reply" linked list (pending outbound messages).  It 
adds a new server level config parameter "maxclientqueue", with a default of 0 
(disabled).

When this setting is enabled and a client is too slow to read replies fast 
enough to keep up with the server, then the server will drop the client's 
connection when it has built up the maximum number of messages on the server 
side.  It's not perfect, because ideally you would want to limit the client 
based on the number of bytes that have been queued up.  But this setting is 
intended to be a safety-measure (like a shearing-pin).  Actually counting up 
the enqueued bytes and maintaining that value seems like overkill when we 
already have an object count in the linked list structure.

When a client exceeds this limit, the server also logs a warning message to 
indicate that it has dropped a client due to this overflow.  That provides a 
positive signal if this setting is used and is set too low, or if clients are 
frequently too slow to keep up.  (There is also a low frequency check (every 
~30 sec or so) for any clients that have overflowed, but haven't yet been 
dropped due to another write, as a cleanup pass.)

My github branch for this patch is here: 
https://github.com/willp/redis/tree/willp_issue525_memory

And the diff is attached as a patch as well.

Original comment by jwillp@gmail.com on 21 Dec 2011 at 4:16

Attachments:

issue525_memory_v1.patch

GoogleCodeExporter commented 8 years ago

Thanks for doing this!

I think your solution is very reasonable, and probably the simplest
out of the alternatives.

Original comment by dave.pet...@gmail.com on 21 Dec 2011 at 4:23

Lachim / redis

Slow clients using pubsub cause memory to grow unbounded #525