Open GoogleCodeExporter opened 8 years ago
Hello, are you sure you don't run commands like KEYS from time to time? Or
other similarly slow commands like big set intersections or alike?
Btw the simplest way to figure what happens is the following. When it blocks
use the script you'll find at:
http://poormansprofiler.org/
And post here the output. Run it a few times while Redis is stopped. This
script is able to tell us where Redis is spending so much time.
Thank you!
Salvatore
Original comment by anti...@gmail.com
on 5 Jan 2011 at 1:38
I confirm the issue.
Redis 2.0.4
Linux RHEL5 2.6.18 SMP, 16 cores
CPU usage ~ 100%
==> redis-benchmark:
====== PING ======
10002 requests completed in 30.17 seconds
50 parallel clients
3 bytes payload
keep alive: 1
......
331.57 requests per second (60343.38 requests per second - when we disable new
services that use Redis intensively)
==> poormansprofiler output:
3 options:
1) 1
at,call,processCommand,processInputBuffer,readQueryFromClient,aeProcessEvents,ae
Main,main
2) 1
"results:all:qZv9GAOK"),keysCommand,call,processCommand,processInputBuffer,readQ
ueryFromClient,aeProcessEvents,aeMain,main
3) 1
dictNext,keysCommand,call,processCommand,processInputBuffer,readQueryFromClient,
aeProcessEvents,aeMain,main
also it produces the following warnings:
warning: .dynamic section for "/lib64/libm.so.6" is not at the expected address
warning: difference appears to be caused by prelink, adjusting expectations
warning: .dynamic section for "/lib64/libpthread.so.0" is not at the expected
address
warning: difference appears to be caused by prelink, adjusting expectations
warning: .dynamic section for "/lib64/libc.so.6" is not at the expected address
(wrong library or version mismatch?)
warning: .dynamic section for "/lib64/ld-linux-x86-64.so.2" is not at the
expected address
warning: difference appears to be caused by prelink, adjusting expectations
warning: no loadable sections found in added symbol-file system-supplied DSO at
0x7fff177fc000
When Redis works fine profile output is as follows:
1 __epoll_wait_nocancel,aeApiPoll,flags=3),aeMain,main
with the only warning
warning: no loadable sections found in added symbol-file system-supplied DSO at
0x7fffa4ded000
Original comment by eugene.t...@gmail.com
on 22 Jan 2011 at 12:15
As expected your output of the poor man profiler shows that you are using KEYS
command, that should never be used in production code by clients, but just for
debugging.
As you can see in the documentation KEYS is O(N) both in space and time, and
will trash Redis performances.
You can also check our documentation here:
http://redis.io/commands/keys
You can read there: "Warning: consider KEYS as a command that should only be
used in production environments with extreme care. It may ruin performance when
it is executed against large databases. This command is intended for debugging
and special operations, such as changing your keyspace layout. Don't use KEYS
in your regular application code. If you're looking for a way to find keys in a
subset of your keyspace, consider using sets.".
Therefore the issue is not valid.
Original comment by anti...@gmail.com
on 22 Jan 2011 at 1:39
p.s. however thank you for reporting. While in this case the problem is clear
in other cases the same macroscopic behavior may be due to something actually
wrong in the server.
Cheers,
Salvatore
Original comment by anti...@gmail.com
on 22 Jan 2011 at 1:40
Hi Salvatore,
Thanks for a prompt reply. We'll get rid of keys command and report here about
results.
Cheers
Eugene
Original comment by eugene.t...@gmail.com
on 22 Jan 2011 at 7:00
I was the original reporter, we had not used KEYS or commands like that, but
for be sure, I have checked it when I've tried debugging the problem. Anyway,
we had only about 10 keys, so I guess KEYS would be quite fast in this
situation. :)
Unfortunately we have replaced the code was based on Redis, so I can't profile
it. I'll try to get some time for trying to repeat it, but I cannot say now
that it will be soon.
Original comment by barthazi...@gmail.com
on 22 Jan 2011 at 7:13
I feel bad to speak contrary to the OP's statements, but it really smells like
this is a bgsave operation overloading the (otherwise already fairly heavily
loaded) box and causing other commands to run slowly.
How many processors in the machine? How much memory does it have? What
operations are being performed? Anything else running on that box?
Original comment by josiah.c...@gmail.com
on 23 Jan 2011 at 8:03
Barthazi, @Josiah: BGSAVE can't overload the server as it's a different
process. It can only matter that you run out of memory because of BGSAVE and
the instance starts swapping, this of course will lead to a much slower server,
but it's a different problem.
Anyway the only reliable way to debug this problems is to use a profiler to
check where the time is spent, together with the other system tools to check if
the system is out of memory, if there is OS buffers / pages I/O and so forth.
Currently *every single time* it was possible to understand what was going on
really, it was either an out of memory condition or the user doing something
strange like KEYS, big set operations like intersections, huge MULTI/EXEC calls
with hundeds of commands inside, and so forth.
I added the poor man profiler script in the utils in the unstable branch, file:
utlis/whatisdoing.sh
This is so far the best we can do to debug this kind of issues.
Cheers,
Salvatore
Original comment by anti...@gmail.com
on 23 Jan 2011 at 9:21
I didn't see that it was an 8 core server, or that Redis was using 10% of the
system memory.
If it's using EBS at Amazon (or some other network storage wherever it is),
operations may be slow during a BGSAVE due to network contention...
Original comment by josiah.c...@gmail.com
on 24 Jan 2011 at 1:54
Works fine without keys command.
Original comment by eugene.t...@gmail.com
on 28 Jan 2011 at 2:35
I would like to also ask for help in finding out why our redis is using 100%
cpu. Apologies if I'm resurrecting an old thread.
The output for the poor man's profiler with 100 samples is
warning: Selected architecture i386:x86-64 is not compatible with reported
target architecture i386
warning: Architecture rejected target-supplied description
warning: Selected architecture i386:x86-64 is not compatible with reported
target architecture i386
warning: Architecture rejected target-supplied description
warning: Selected architecture i386:x86-64 is not compatible with reported
target architecture i386
warning: Architecture rejected target-supplied description
warning: Selected architecture i386:x86-64 is not compatible with reported
target architecture i386
warning: Architecture rejected target-supplied description
warning: Selected architecture i386:x86-64 is not compatible with reported
target architecture i386
warning: Architecture rejected target-supplied description
warning: Selected architecture i386:x86-64 is not compatible with reported
target architecture i386
warning: Architecture rejected target-supplied description
101
22 __accept_nocancel,anetGenericAccept,anetTcpAccept,acceptTcpHandler,aeProcessEvents,aeMain,main
20 open64,_IO_file_fopen,??,redisLog,acceptTcpHandler,aeProcessEvents,aeMain,main
16 __write_nocancel,sendReplyToClient,aeProcessEvents,aeMain,main
12 epoll_wait,aeApiPoll,flags=<value,aeMain,main
8 ??,??
7 gettimeofday,gettimeofday,aeGetTime,aeProcessEvents,aeMain,main
3 __read_nocancel,read,optimized,aeProcessEvents,aeMain,main
3 gettimeofday,gettimeofday,aeGetTime,processTimeEvents,flags=<value,aeMain,main
2 vfprintf,__vsnprintf_chk,vsnprintf,"",,anetGenericAccept,anetTcpAccept,acceptTcpHandler,aeProcessEvents,aeMain,main
1 __vsnprintf_chk,vsnprintf,"",,anetGenericAccept,anetTcpAccept,acceptTcpHandler,aeProcessEvents,aeMain,main
1 __vsnprintf_chk,vsnprintf,"\001\200\255\373\nsg20\ae",,anetGenericAccept,anetTcpAccept,acceptTcpHandler,aeProcessEvents,aeMain,main
1 time,redisLog,acceptTcpHandler,aeProcessEvents,aeMain,main
1 ??,??,redisLog,acceptTcpHandler,aeProcessEvents,aeMain,main
1 epoll_ctl,aeApiAddEvent,fd=750,,_installWriteEvent,addReply,call,processCommand,processInputBuffer,readQueryFromClient,aeProcessEvents,aeMain,main
1 epoll_ctl,aeApiAddEvent,fd=117,,_installWriteEvent,addReply,lookupKeyReadOrReply,getGenericCommand,call,processCommand,processInputBuffer,readQueryFromClient,aeProcessEvents,aeMain,main
1 ??,??,??
We are using redis 2.2.11.
Original comment by lpu...@insynchq.com
on 5 Jul 2011 at 11:11
I'm seeing similar behavior on redis 2.4.7. Everything's going fine, and then
suddenly redis is using 95% (jumps up from 10%) of the CPU and performance
slows to a crawl. The only commands running against the DB are GET, SET,
EXPIRE, and DELETE:
https://github.com/martinrusev/django-redis-sessions/blob/master/redis_sessions/
session.py
At the point of failure there are lots of open connections:
$ netstat -tn|grep 6379|wc -l
1184
even though I stopped sending traffic to the redis server. In addition, the
following command just hangs and doesn't return:
$ redis-cli info
Is there a way to log all commands to see when it dies?
Original comment by tob...@caktusgroup.com
on 24 Feb 2012 at 9:47
This appears to be a duplicate of #648:
http://code.google.com/p/redis/issues/detail?id=648
Original comment by findch...@gmail.com
on 18 Apr 2012 at 1:08
Original issue reported on code.google.com by
barthazi...@gmail.com
on 5 Jan 2011 at 1:31