Lachim / redis

Automatically exported from code.google.com/p/redis

VM continues swapping out when out of pages in VM file. #539

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
2.2.4 / 2.2.5, FreeBSD 8

Recently we started having 100% cpu redis lockups on our FreeBSD production 
box, about 2-3 times a week.

After investigating further, I found this on a dev box:

With VM enabled, Redis seems to continue swapping objects out to somewhere even 
when it is out of swap space/pages.

in the log I get:
[56167] 28 Apr 01:43:26 # WARNING: vm-max-memory limit exceeded by more than 10% but unable to swap more objects out!
No other errors or warnings are present in the log.

in config:
vm-enabled yes
vm-swap-file /home/redis/redis.swap
vm-max-memory 500m
vm-page-size 32
vm-pages 80000000
vm-max-threads 4
hash-max-zipmap-entries 64
hash-max-zipmap-value 512
activerehashing yes
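
(For scale, assuming the swap file size is simply vm-pages * vm-page-size: 
80,000,000 * 32 bytes ≈ 2.56 GB, which matches the ~2.5 GB production swap 
mentioned later in the thread.)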

in info:
vm_conf_max_memory:500000000
vm_conf_page_size:32
vm_conf_pages:80000000
vm_stats_used_pages:8793456
vm_stats_swapped_objects:5982411
vm_stats_swappin_count:4389
vm_stats_swappout_count:5987213
vm_stats_io_newjobs_len:0
vm_stats_io_processing_len:0
vm_stats_io_processed_len:0
vm_stats_io_active_threads:0
vm_stats_blocked_clients:0

vm_stats_swapped_objects and vm_stats_swappout_count keep increasing.
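
(For scale: vm_stats_used_pages * vm-page-size = 8,793,456 * 32 bytes ≈ 281 MB, 
i.e. only about 11% of the 80,000,000 configured pages were in use - see the 
update below; the swap was not actually full at this point.)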

The expected behaviour seems to be that Redis should use extra RAM beyond the 
allowed limit when no free swap pages are available (the production box has 
plenty of free RAM).

Original issue reported on code.google.com by Glebu...@gmail.com on 27 Apr 2011 at 10:01

GoogleCodeExporter commented 8 years ago
Update:
It seems I hadn't really filled the swap completely in the 'info' shown in the ticket.

After filling the swap completely with random 32-bit integers under random 
'k:<int>' keys, I get 100% CPU usage and insert rates jumping from 5000 down to 
20-30 inserts per second (i.e. for 10 seconds it is 30/sec, then for 1-2 
seconds it is 6000/sec).

So, no: Redis does seem to swap correctly; it just eats a lot of CPU and shows 
very inconsistent access times when the swap is full.

Reproduce code:

<?php
// Reproduce case: hammer Redis with getset on random 'k:<int>' keys holding
// random integer values, printing progress every 10000 operations.
error_reporting(E_ALL);
$redis = new Redis();
$redis->connect('/tmp/redis.sock'); // unix socket from the config below
$redis->select(1);
$i = 0;
while (1) {
    if (++$i % 10000 === 0) {
        echo "$i ";
    }
    // getset writes a new random value and reads back the old one,
    // forcing both swap-out and swap-in activity
    $k = $redis->getset('k:'.mt_rand(), mt_rand());
}

with the latest phpredis.

full config:
daemonize yes
pidfile /var/run/redis/redis.pid

bind 127.0.0.1

unixsocket /tmp/redis.sock

loglevel verbose
logfile /home/redis/redisv.log

databases 16

appendonly yes
appendfsync no

dir /home/redis/
vm-enabled yes
vm-swap-file /home/redis/redis.swap
vm-max-memory 50m
vm-page-size 32
vm-pages 800000
vm-max-threads 4

hash-max-zipmap-entries 64
hash-max-zipmap-value 512

activerehashing yes
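
(For scale: here vm-pages * vm-page-size = 800,000 * 32 bytes ≈ 25.6 MB of swap 
on top of vm-max-memory 50m - the "50 megs of memory, 25 megs of swap" setup 
the next comment refers to.)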

Original comment by Glebu...@gmail.com on 27 Apr 2011 at 10:41

GoogleCodeExporter commented 8 years ago
Your VM swap file is too small, you set your max memory to be too small for 
your data set, and the result is erratic performance?

I'm not surprised. If you reduced the amount of memory in your system to 50 
megs, set a swap file to be 25 megs, then tried to run a bunch of stuff, your 
system would perform poorly, probably tell you that you can't run as much 
stuff, and if it was Linux, start killing processes.

Redis tries to do its best with what it is given, and since you've given it 
almost nothing, it's not doing very well with it. In particular, Redis doesn't 
use more swap than you tell it to (by design), and it does its best not to use 
more memory than the vm-max-memory setting (also by design). If it were to 
start using more memory (in an unbounded way, as you are suggesting it should), 
everyone would consider that a bug.

Since you seem to be mucking around trying to get good performance, low memory 
use, etc., you should ask how to configure it for better performance. The 
answer is simple: be honest in your max-memory settings. If you have enough 
memory to hold everything in RAM, disable VM. If you want a hard limit, use 
'maxmemory' and set a 'maxmemory-policy' that matches the behavior you want 
(see the sketch below). If you don't have enough RAM to hold the data you need, 
performance is going to suck, and you should switch to diskstore now, hang on 
until diskstore is ready/better, find something else, etc.
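
A minimal sketch of that maxmemory alternative (the values are illustrative 
only, not a recommendation for this particular workload):

maxmemory 500mb
# volatile-lru evicts least-recently-used keys that have an expire set;
# other 2.2 policies include allkeys-lru, volatile-random, allkeys-random,
# volatile-ttl and noeviction
maxmemory-policy volatile-lru
vm-enabled no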

I have two questions for you:
1. For what reason are you giving it so few resources?
2. Why would you consider poor/erratic performance so unusual when you've not 
given it enough resources to do what you've asked it to do?

Original comment by josiah.c...@gmail.com on 28 Apr 2011 at 4:52

GoogleCodeExporter commented 8 years ago
0. My initial problem was complete lockups, not performance. In fact, the 
performance problem I found seems unrelated. I would appreciate any advice on 
what data to collect if another lockup occurs on the production machine (I am 
unable to enable the debug log there).

1. Simple - I am trying to reproduce my lockup on a not-so-huge debug box.
In production Redis has ~500mb RAM + 2.5G swap, and I expected it to use more 
RAM over time (the box has 32G, but Redis is not the only thing there), and I 
would increase the swap size/memory when needed. My AOF is currently 1.7 GB 
after rewrite, so it's not a good idea to restart Redis (it takes ~10min) just 
to increase the swap/memory size.
The fact that I need to watch swap file usage and restart Redis with a larger 
swap before it is all used, or else see performance drop 100-fold, is not a 
nice thing anyway.

2. Performance after swap space runs out seems to be OK and CONSISTENT with 
large (~6kb) values, but it is very poor and inconsistent with a lot of 32-bit 
integer values.

3. This ticket is really NOT about performance. It is about LOCK-UPS - Redis 
uses 100% CPU with NO queries (not for 10 seconds or anything - last time it 
ran ~40 minutes until I killed it). It happens about 2-3 times a week and is 
unreproducible. redis-cli is unable to connect. Absolutely nothing in the log.

Original comment by Glebu...@gmail.com on 28 Apr 2011 at 7:28

GoogleCodeExporter commented 8 years ago
When Redis is stuck, rather than just killing it, you could
try kill -ABRT so that it generates a core dump. Then use
gdb to extract the backtrace; it might give the Redis
developers a hint.
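
For example (a sketch only - the binary and core file paths are assumptions; 
FreeBSD names core dumps <progname>.core by default):

kill -ABRT $(pgrep redis-server)   # ask the kernel to dump core
gdb -batch -ex 'bt' /usr/local/bin/redis-server redis-server.core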

Alternatively, you could use the pmp (poor man's profiler)
before killing Redis. Be sure to run it with a good number
of samples (at least 100).

http://poormansprofiler.org/
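
Roughly along these lines (a simplified pmp-style loop: it counts individual 
stack frames rather than aggregating whole stacks, and the process name is an 
assumption):

n=0
while [ $n -lt 100 ]; do
    # attach, dump every thread's stack, detach
    gdb -batch -ex 'thread apply all bt' -p $(pgrep redis-server)
    sleep 0.1
    n=$((n+1))
done | sort | uniq -c | sort -rn | head -20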

Regards,
Didier.

Original comment by didier...@gmail.com on 28 Apr 2011 at 8:33

GoogleCodeExporter commented 8 years ago

Sorry, I did not notice you were using FreeBSD. You do not need
pmp if you have a working pstack command. You can just run pstack
a good number of times against the Redis process.
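
For example (assuming the process is named redis-server):

pstack $(pgrep redis-server)   # run repeatedly and compare where threads sit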

Original comment by didier...@gmail.com on 28 Apr 2011 at 8:37

GoogleCodeExporter commented 8 years ago
Thank you.

I will try to reproduce the problem on another box today, and try to get a 
trace/core dump if it locks up again on the production box.

Original comment by Glebu...@gmail.com on 28 Apr 2011 at 8:39

GoogleCodeExporter commented 8 years ago
Spinning disks top out at roughly 100 random IOs per second. If you are 
reading/writing 6k data chunks, that is about 600 KB/second. If you are 
reading/writing 32-bit ints in a random way (which you are), that is about 400 
bytes/second (100 IOs * 4 bytes). If you are using Amazon EC2 with EBS, or any 
network storage at all, this could all be worse.

If you get lucky with swap file layout, access patterns, etc., you may get 
occasional bursts, but I am not surprised that you are seeing poor performance, 
lockups, and frustration when reading/writing 32-bit ints.

Original comment by josiah.c...@gmail.com on 28 Apr 2011 at 6:37

GoogleCodeExporter commented 8 years ago
1. pstack does not work on amd64 FreeBSD, so I can't use it.
2. No luck with the poor man's profiler either.
3. kill -ABRT produces something like this for Redis; I guess I need to 
compile a debug build?
#0  0x000000000042056f in ?? ()
#1  0x00000000004218b5 in ?? ()
#2  0x00000000004219df in ?? ()
#3  0x0000000000421c31 in ?? ()
4. No further lockups have occurred yet.
5. Please close this ticket; I will repost if I get lucky getting a stack 
trace of a locked-up Redis process.

Original comment by Glebu...@gmail.com on 28 Apr 2011 at 6:52