Please attach with a debugger to the Redis server and dump the stack
(bt command with gdb).
Regards,
Didier.
Original comment by didier...@gmail.com
on 29 Aug 2011 at 8:41
#0 0x00002ae0233bf490 in _int_malloc () from /lib64/libc.so.6
#1 0x00002ae0233c11b6 in malloc () from /lib64/libc.so.6
#2 0x0000000000410e97 in zmalloc ()
#3 0x0000000000416ab2 in createClient ()
#4 0x0000000000416c45 in ?? ()
#5 0x0000000000416d6f in acceptTcpHandler ()
#6 0x000000000040ab5f in aeProcessEvents ()
#7 0x000000000040ad5e in aeMain ()
#8 0x000000000040f97d in main ()
Original comment by zyfu...@gmail.com
on 29 Aug 2011 at 10:24
[deleted comment]
Thanks.
Unfortunately, the stack shows a probable infinite loop
in the memory allocator. That suggests the memory was
corrupted at some point before this connection accept
operation, but not much can be said from the trace about
the root cause of the corruption.
Regards,
Didier.
Original comment by didier...@gmail.com
on 29 Aug 2011 at 12:30
When you say the memory usage increased to 6G, do you mean 6G as per output of
INFO, or 6G in RSS? If it is 6G RSS, there should be enough memory on the
machine to satisfy allocation requests, unless there are other active processes
that require a lot of memory, or as Didier says, the memory is corrupted.
Once Redis hits 100% CPU and hangs, is the stack trace consistently
reproducible?
Original comment by pcnoordh...@gmail.com
on 30 Aug 2011 at 5:06
I suggest taking multiple stack traces with gdb + bt command so that we can
understand if it is a loop in malloc (not impossible but uncommon) or if it is
something at higher level.
Thanks!
Original comment by anti...@gmail.com
on 14 Sep 2011 at 1:22
Hi there.
I too am seeing periodic 100% CPU usage in my redis-server process.
My recent gdb bt:
>>>
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0xffffffffff60043b in ?? ()
(gdb) bt
#0 0xffffffffff60043b in ?? ()
#1 0x00000000024002a0 in ?? ()
#2 0x00007fca4e518560 in ?? () from /lib/libc.so.6
#3 0x0000000000411d7c in redisLog (level=3, fmt=0x449f08 "Accepting client connection: %s") at redis.c:203
#4 0x000000000041ae96 in acceptTcpHandler (el=<value optimized out>, fd=<value optimized out>, privdata=<value optimized out>, mask=<value optimized out>) at networking.c:436
#5 0x000000000040d5f3 in aeProcessEvents (eventLoop=0x7fca4d847000, flags=<value optimized out>) at ae.c:342
#6 0x000000000040d83e in aeMain (eventLoop=0x7fca4d847000) at ae.c:387
#7 0x0000000000412f37 in main (argc=2, argv=<value optimized out>) at redis.c:1734
>>>
For what it's worth, this is a Redis master that is configured not to flush to
disk at all (this responsibility has been shifted to the slave).
Cheers.
Original comment by findch...@gmail.com
on 18 Apr 2012 at 1:06
One more backtrace that I think I caught while CPU was at 100%:
>>>
0x00007fca4e596d03 in epoll_wait () from /lib/libc.so.6
(gdb) bt
#0 0x00007fca4e596d03 in epoll_wait () from /lib/libc.so.6
#1 0x000000000040d503 in aeApiPoll (eventLoop=0x7fca4d847000, flags=<value optimized out>) at ae_epoll.c:70
#2 aeProcessEvents (eventLoop=0x7fca4d847000, flags=<value optimized out>) at ae.c:330
#3 0x000000000040d83e in aeMain (eventLoop=0x7fca4d847000) at ae.c:387
#4 0x0000000000412f37 in main (argc=2, argv=<value optimized out>) at redis.c:1734
>>>
Hope that helps.
-Chris
Original comment by findch...@gmail.com
on 18 Apr 2012 at 1:10
Here is a long sampling using the Poor Man's Debugger, while CPU is pegged at
100%:
https://gist.github.com/2415544
-Chris
Original comment by findch...@gmail.com
on 18 Apr 2012 at 6:17
Hi, everything looks OK from the trace. What is the output of INFO? Maybe you
are seeing it saving when the slave disconnects? Can you share the Redis logs
from around the moment the CPU reaches 100%? Does it slow down when this
happens, or does it serve connections just fine?
Cheers,
Salvatore
Original comment by anti...@gmail.com
on 19 Apr 2012 at 7:43
Thanks for looking into this.
When the CPU is pegged at 100%, running `redis-cli info` hangs. No response
until the CPU usage returns to normal.
Again, this is the output from the Poor Man's Debugger during this period:
>>>
root@domU-12-31-39-14-09-3D:~# /usr/local/bin/poor.sh
4
2 pthread_cond_wait@@GLIBC_2.3.2,bioProcessBackgroundJobs,start_thread,clone,??
1 ??,??,??,redisLog,acceptTcpHandler,aeProcessEvents,aeMain,main
root@domU-12-31-39-14-09-3D:~# redis-cli info
^C
root@domU-12-31-39-14-09-3D:~# /usr/local/bin/poor.sh
4
2 pthread_cond_wait@@GLIBC_2.3.2,bioProcessBackgroundJobs,start_thread,clone,??
1 ??,??,??,redisLog,acceptTcpHandler,aeProcessEvents,aeMain,main
root@domU-12-31-39-14-09-3D:~# redis-cli info
^C
>>>
Cheers,
Chris
Original comment by findch...@gmail.com
on 19 Apr 2012 at 8:20
One other thing: I have not seen a new log entry since I told redis to stop
flushing to disk with this command: `redis-cli config set save ""`. Should
there be?
The only log entries prior would look like:
>>
[18350] 13 Apr 14:26:15 * 100 changes in 30 seconds. Saving...
[18350] 13 Apr 14:26:15 * Background saving started by pid 21107
[21107] 13 Apr 14:26:37 * DB saved on disk
[18350] 13 Apr 14:26:37 * Background saving terminated with success
>>
-Chris
Original comment by findch...@gmail.com
on 19 Apr 2012 at 8:24
And here's a dump of INFO (when not pegged at 100%):
https://gist.github.com/2424421
Original comment by findch...@gmail.com
on 19 Apr 2012 at 9:46
I'm getting 100% CPU usage once or twice a day.
Every time it happens I get a 'DB saved on disk' message without a
corresponding 'Background saving started by pid ???' line.
* Happened with v2.2.7 and again after I upgraded to v2.4.11.
* It didn't use to happen; it started after I loaded up a big hash of about
540k+ items in one key.
* It recovers by itself after an hour and doesn't crash.
* But Redis is very slow or cannot be connected to while it's happening.
Below are the logs; notice that pid 9668 was never started.
[15126] 28 Apr 08:01:15 * Background saving terminated with success
[15126] 28 Apr 08:16:16 * 10000 changes in 900 seconds. Saving...
[15126] 28 Apr 08:16:16 * Background saving started by pid 28646
[28646] 28 Apr 08:16:17 * DB saved on disk
[15126] 28 Apr 08:16:17 * Background saving terminated with success
[15126] 28 Apr 09:00:20 * 10000 changes in 900 seconds. Saving...
[15126] 28 Apr 09:00:20 * Background saving started by pid 334
[334] 28 Apr 09:00:20 * DB saved on disk
[15126] 28 Apr 09:00:20 * Background saving terminated with success
[15126] 28 Apr 09:15:21 * 10000 changes in 900 seconds. Saving...
[15126] 28 Apr 09:15:21 * Background saving started by pid 3474
[3474] 28 Apr 09:15:21 * DB saved on disk
[15126] 28 Apr 09:15:21 * Background saving terminated with success
[9668] 28 Apr 10:15:22 * DB saved on disk
[15126] 28 Apr 11:15:23 * 10 changes in 3600 seconds. Saving...
[15126] 28 Apr 11:15:23 * Background saving started by pid 14775
[14775] 28 Apr 11:15:23 * DB saved on disk
[15126] 28 Apr 11:15:23 * Background saving terminated with success
[15126] 28 Apr 12:15:24 * 10 changes in 3600 seconds. Saving...
[15126] 28 Apr 12:15:24 * Background saving started by pid 21453
[21453] 28 Apr 12:15:24 * DB saved on disk
Original comment by nik...@gmail.com
on 28 Apr 2012 at 5:18
I've figured out my problem. I had too many connections and Redis ran out of
file descriptors.
In ae_epoll.c:aeApiPoll, epoll_wait is called, but when epoll_wait returns an
error nothing gets printed or handled; the function just returns. So it gets
called again immediately, in an endless loop that uses up 100% of the CPU.
Original comment by nik...@gmail.com
on 1 May 2012 at 1:34
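For readers following along, the loop described above looks roughly like this
in the 2.4-era event loop (a simplified sketch from memory, not the exact
Redis source; the point of interest is the missing error handling):

static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    aeApiState *state = eventLoop->apidata;
    int retval, numevents = 0;

    /* Wait for events; a NULL tvp means block indefinitely. */
    retval = epoll_wait(state->epfd, state->events, AE_SETSIZE,
                        tvp ? (tvp->tv_sec*1000 + tvp->tv_usec/1000) : -1);
    if (retval > 0) {
        /* ... copy the ready descriptors into eventLoop->fired ... */
        numevents = retval;
    }
    /* A retval of -1 is silently ignored: no log line, no special handling.
     * The caller simply loops back into aeApiPoll, which is why a persistent
     * error shows up as 100% CPU with nothing in the log. */
    return numevents;
}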
Hi there.
How did you figure this out, and how did you resolve it?
Cheers.
Original comment by ch...@socialvibe.com
on 1 May 2012 at 8:50
Run
sudo netstat -nap | grep redis
It should only print a few lines with your connections, not hundreds or
thousands.
It was a problem with my program, not Redis, but Redis should really print an
error message instead of going into an endless loop.
Original comment by nik...@gmail.com
on 1 May 2012 at 10:23
@niknah: this problem is no longer present in Redis 2.6: the available file
descriptors are now checked at startup and the maxclients limit is enforced as needed. Cheers.
p.s. please report any further issues using the GitHub issue tracker, we no
longer use this one. Thanks.
Original comment by anti...@gmail.com
on 1 May 2012 at 10:32
@niknah Looks like when my CPU was hitting 100%, the number of connections was
high.
# sudo netstat -nap | grep redis |wc -l
1020
So:
(1) @antirez: When you say it's not a problem in 2.6, does that imply that
connections beyond max_clients are closed automatically to make another
connection available?
(2) @niknah - What exactly did you change in your code to fix this? I most
likely have the same issue, and I'll try to apply your fix.
Cheers!
Original comment by ch...@socialvibe.com
on 1 May 2012 at 11:05
New connections will receive an error once you've reached the maximum number
of connections.
I am using nodejs; for every connection you have to send 'quit' when you're
finished. Like...
var client = redis.createClient();
try {
// ... issue your commands on client ...
} finally {
client.quit();
}
But if you've stored the client in an object you'd have to be careful and close it
whenever the object is removed.
Original comment by nik...@gmail.com
on 2 May 2012 at 1:15
We've hit connection problems here as well. Be aware that the common Linux
configuration limits the number of open file descriptors per process to 1024
(see ulimit -n). On top of that, Redis has a built-in hard limit of 10240,
which we removed on our end.
When Redis runs out of FDs, it keeps trying to grab them and the CPU shoots up
to 100%.
Original comment by ma...@pinterest.com
on 2 May 2012 at 1:30
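As a quick way to see what limit a process actually gets, here is a minimal
standalone C check of the value that `ulimit -n` controls (illustrative only,
not part of Redis):

#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;
    /* RLIMIT_NOFILE is the per-process open file descriptor limit,
     * the same value the shell reports/sets with `ulimit -n`. */
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
        printf("fd limit: soft=%llu hard=%llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
    return 0;
}

(The built-in 10240 limit mentioned above is, if memory serves, the AE_SETSIZE
define of 1024*10 in ae.h in the 2.2/2.4 sources.)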
Thanks so much for the help guys.
This was definitely related to the FD limit. Started redis-server with `ulimit
-n 10240`, and no more CPU @ 100%. Great stuff.
I think I will need to ensure my connections get closed properly. I am using
redis-rb with passenger.
Original comment by ch...@socialvibe.com
on 2 May 2012 at 1:57
This is the new behavior of Redis 2.6:
1) At startup it checks the current maxclients setting (10000 by default),
then it tries to set the file descriptor limit for the process to
maxclients+32 (at most 32 extra file descriptors are used internally).
2) If it is not able to obtain as many file descriptors as maxclients
requires, it finds out what the actual limit for the process is and sets the
maxclients directive to that limit minus 32.
3) So now maxclients is always active, AND matches the actual process limit.
4) If maxclients is reached, a newly connected client receives an error and
the connection is closed.
Cheers,
Salvatore
Original comment by anti...@gmail.com
on 2 May 2012 at 9:44
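A minimal sketch of the startup adjustment described above, in the spirit of
points 1) and 2); the names are made up for illustration and this is not the
actual 2.6 source:

#include <sys/resource.h>

#define RESERVED_FDS 32   /* file descriptors kept for internal use */

/* Try to raise the process fd limit so it can serve 'maxclients' clients;
 * if that fails, lower maxclients to what the process is actually allowed. */
static void adjust_open_files_limit(int *maxclients) {
    struct rlimit rl;
    rlim_t wanted = (rlim_t)(*maxclients) + RESERVED_FDS;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1) return;
    if (rl.rlim_cur >= wanted) return;          /* already enough fds */

    rl.rlim_cur = wanted;                       /* raise the soft limit */
    if (rl.rlim_max < wanted) rl.rlim_max = wanted;
    if (setrlimit(RLIMIT_NOFILE, &rl) == -1) {
        /* Could not raise the limit (e.g. not root): fall back to the
         * limit the process actually has, minus the internal reserve. */
        getrlimit(RLIMIT_NOFILE, &rl);
        *maxclients = (int)rl.rlim_cur - RESERVED_FDS;
    }
}

With something like that in place maxclients always reflects what the process
can really handle, and a client connecting after the limit is reached gets an
error reply and is disconnected instead of the server spinning.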
Thanks for the clarification Salvatore.
So when do connections get closed? Obviously the client can explicitly close
the connection, but is there a timeout after which redis closes old
connections, and if so, where is that timeout configured?
BTW, I think we can change the status of this ticket from New->Resolved.
Cheers.
Original comment by ch...@socialvibe.com
on 2 May 2012 at 7:42
As per the configuration file:
# Close the connection after a client is idle for N seconds (0 to disable)
timeout 0
Original comment by josiah.c...@gmail.com
on 2 May 2012 at 10:35
Original issue reported on code.google.com by
zyfu...@gmail.com
on 29 Aug 2011 at 2:47