Lachim / redis

Automatically exported from code.google.com/p/redis

Redis getting stuck during bgsave #572

Closed: GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
What version of Redis are you using, and on what kind of operating system?
2.2.5 on Ubuntu 10.04 on an EC2 m1.xlarge with the dump.rdb sitting on an EBS 
volume.

What is the problem you are experiencing?
Every few days, Redis will bgsave (even though we think we have that fully 
turned off).  The bgsave is usually fine (taking about 10 minutes), but in these 
cases I/O skyrockets, response times rise to anywhere from 30 seconds to several 
minutes, and the temp file saves extremely slowly (taking about 10 hours to complete).

We have one slave redis instance replicating off this instance.

What steps will reproduce the problem?
No particular steps.  Just happens after a few days of being hit with 
production data.

Do you have an INFO output? Please paste it here.
redis_version:2.2.5
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
process_id:17607
uptime_in_seconds:34841
uptime_in_days:0
lru_clock:688921
used_cpu_sys:272.15
used_cpu_user:65.97
used_cpu_sys_childrens:13.07
used_cpu_user_childrens:2.95
connected_clients:250
connected_slaves:1
client_longest_output_list:131
client_biggest_input_buf:0
blocked_clients:0
used_memory:11689173736
used_memory_human:10.89G
used_memory_rss:16531742720
mem_fragmentation_ratio:1.41
use_tcmalloc:0
loading:0
aof_enabled:0
changes_since_last_save:31898567
bgsave_in_progress:0
last_save_time:1307088730
bgrewriteaof_in_progress:0
total_connections_received:4238
total_commands_processed:32666627
expired_keys:0
evicted_keys:0
keyspace_hits:32224528
keyspace_misses:1371
hash_max_zipmap_entries:64
hash_max_zipmap_value:512
pubsub_channels:0
pubsub_patterns:0
vm_enabled:0
role:master
allocation_stats:6=1,8=437395,9=133641615,10=6545871,11=526885,12=121771,13=6446
206,14=38930884,15=18681559,16=723724925,17=195553781,18=3356,19=31681067,20=112
85,21=26781,22=96155,23=958351,24=288187849,25=24549711,26=8,27=147417,28=3,29=1
20151,30=7470,31=15258,32=12997259,33=74590,34=5545,35=80799,36=2688,37=27622,39
=9413,40=437430,41=75479,43=11064,44=1,45=71648,47=78272,48=436998,49=20289,51=1
6654,53=100013,55=25619,56=1,57=2841,58=1,59=79318,61=29529,63=31955,64=715,65=8
8474,67=11446,69=3474,71=80463,72=208836,73=5285,75=7454,77=72606,79=22721,81=26
86,83=77310,85=8909,87=13001,88=4248,89=126831,91=8974,93=18968,95=80242,96=2041
45,97=8653,99=13655,101=75594,103=9366,105=16920,107=76540,109=29089,111=7591,11
3=71641,115=9307,117=31186,119=99503,120=191762,121=5547,123=20710,125=71831,127
=8943,128=715,129=6406,131=75511,133=4181,135=3881,137=77700,139=3126,141=4565,1
43=78468,144=184383,145=3825,147=2566,149=76018,151=2768,153=4197,155=76042,157=
2188,159=2085,161=76446,163=16326,165=4642,167=76168,168=174143,169=2434,171=787
0,173=124955,175=3176,177=3066,179=80668,181=2550,183=1898,185=88755,187=5931,18
9=5582,191=81597,192=167341,193=3999,195=2079,197=78252,199=1673,201=2694,203=76
319,205=15755,207=5147,209=77449,211=4139,213=2539,215=76286,216=158123,217=1475
,219=1614,221=100804,223=2468,225=1243,227=76194,229=27151,231=1159,233=76220,23
5=3233,237=22529,239=76267,240=151916,241=12666,243=2225,245=76359,247=3682,249=
1203,251=76766,253=1162,255=1054,>=256=59344246
db0:keys=285005,expires=0

If it is a crash, can you please paste the stack trace that you can find in
the log file or on standard output? This is really useful for us!

Please provide any additional information below.

# Redis configuration file example

daemonize yes
pidfile /var/run/redis_6379.pid
port 6379
timeout 300
loglevel notice
logfile /var/log/redis.log
databases 2
rdbcompression yes
dbfilename dump.rdb
dir /home/redis/
appendonly no
#appendfilename appendonly.aof
#appendfsync no
#no-appendfsync-on-rewrite no
hash-max-zipmap-entries 64
hash-max-zipmap-value 512
activerehashing yes

1) It seems we're hitting a bug in Redis.  Do you know if this kind of bug has 
been addressed in the last couple of releases?  How about in the upcoming 2.4?
2) Is there a way we can turn BGSAVE off altogether?

Original issue reported on code.google.com by ma...@pinterest.com on 3 Jun 2011 at 5:53

GoogleCodeExporter commented 8 years ago
This is not a bug in Redis; it's because EBS can have very high read/write 
latency. The solution is to use local instance storage, do manual bgsaves (not 
automatic), and copy the dump to EBS/S3 afterwards.
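A rough sketch of that workflow, assuming the dump lives in /home/redis (as in 
the config pasted above) and that something like s3cmd is available for the copy 
step (both assumptions, not something from this thread):

#!/bin/sh
# Sketch only: take a manual snapshot, wait for it to finish, then ship the
# dump off the instance. The bucket name and s3cmd usage are assumptions.

redis-cli BGSAVE

# Poll INFO until the background save has finished.
while redis-cli INFO | grep -q 'bgsave_in_progress:1'; do
    sleep 10
done

# Copy the finished dump to S3 (or an EBS-backed path) after the fact.
s3cmd put /home/redis/dump.rdb "s3://my-backup-bucket/dump-$(date +%Y%m%d%H%M).rdb"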

If you remove all "save" lines from your config file and restart, you should be 
okay.
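
For what it's worth, the config pasted above has no explicit "save" lines at 
all, so the built-in defaults may still be in effect. A minimal sketch of 
spelling the intent out (the empty-string form of "save" may not exist on 2.2, 
so verify against your version's redis.conf):

# Make sure no snapshot schedule is active: comment out or delete every
# "save <seconds> <changes>" line, e.g. the stock defaults:
# save 900 1
# save 300 10
# save 60 10000
#
# On versions whose redis.conf documents it, an empty save directive clears
# any previously configured save points (treat this as an assumption for 2.2):
# save ""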

If you suspect that some client is calling BGSAVE, you can rename commands, 
though I'm not finding the docs for it right now.
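
The directive in question is presumably rename-command; a sketch, with the 
replacement name made up and the directive's availability on 2.2.5 left as 
something to verify:

# Sketch: hide or disable BGSAVE for ordinary clients (check that your
# Redis version's redis.conf actually documents rename-command first).
# Renaming a command to the empty string disables it entirely:
rename-command BGSAVE ""
#
# Or rename it to something hard to guess so admins can still trigger it:
# rename-command BGSAVE BGSAVE-4f2a9c1e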

Original comment by josiah.c...@gmail.com on 3 Jun 2011 at 7:17

GoogleCodeExporter commented 8 years ago
The reason it seems like a bug in Redis is that normally the bgsave works fine, 
but every so often it goes into this failure mode.  We'll look into storing 
locally in any case.

Does replication force bgsaves to occur?  Could that be the source of our 
unrequested bgsaves?

Original comment by ma...@pinterest.com on 3 Jun 2011 at 8:23

GoogleCodeExporter commented 8 years ago
This is not a bug in either Redis or EBS; it's a bug in Ubuntu 10.04 and 10.10.
Switching to 11.04 or Debian should fix it. I've hit this one too, and it is 
very thoroughly discussed here:
http://groups.google.com/group/redis-db/browse_thread/thread/f1be7a7ed9afcb53/dbb62349d56ab095?hl=en&lnk=gst

Original comment by dvir...@gmail.com on 3 Jun 2011 at 9:09

GoogleCodeExporter commented 8 years ago
Thank you!  We had just discovered this link as well.  It looks *exactly* like 
what we're seeing (massive square-shaped spikes).  We also think we saw this with 
a MongoDB instance.  We'll be swapping over to Natty tonight and will report our 
findings over the next few weeks.

Original comment by ma...@pinterest.com on 3 Jun 2011 at 9:20

GoogleCodeExporter commented 8 years ago
Closing in the meantime; this can be reopened if it remains an issue.

Original comment by pcnoordh...@gmail.com on 14 Jun 2011 at 6:46