Lachim / redis

Automatically exported from code.google.com/p/redis

BGSAVE, SAVE and BGREWRITEAOF take very very long time with objects in redis vm #567

Closed: GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
What version of Redis are you using, and on what kind of operating system?
redis 2.2.2, Redhat 4.1.2, Linux

What is the problem you are experiencing?
BGSAVE, SAVE and BGREWRITEAOF take a very long time when there are objects 
swapped out to the Redis VM; the database is about 5GB.

What steps will reproduce the problem?
- Run redis with vm-enabled and vm-max-memory 2GB (my memory size is 8GB)
- Fill the database beyond 2GB (may happen with lower memory sizes too).
- Issue BGSAVE, SAVE or BGREWRITEAOF from the CLI.
- Any of these commands takes hours to complete, compared to when there are no 
objects in the VM (a minimal reproduction sketch follows this list).
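
For reference, a minimal reproduction sketch, assuming a stock Redis 2.2.x 
redis.conf; the swap-file path and sizes shown are illustrative, not our 
production values:

# redis.conf excerpt: the old virtual memory feature
vm-enabled yes
vm-swap-file /tmp/redis.swap
# start swapping values out once memory use passes 2GB
vm-max-memory 2gb
# matches vm_conf_page_size in the INFO output below
vm-page-size 256

# fill the database past vm-max-memory, then trigger the slow operations:
redis-cli BGSAVE
redis-cli BGREWRITEAOF
redis-cli INFO | grep -E 'bgsave_in_progress|vm_stats_swapped_objects'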

Do you have an INFO output? Please paste it here.
This is the INFO output with objects swapped to the VM:
...
used_memory_human:5.88G
used_memory_rss:5904543744
mem_fragmentation_ratio:0.94
use_tcmalloc:0
loading:0
aof_enabled:1
changes_since_last_save:2335647
bgsave_in_progress:0
last_save_time:1306430163
bgrewriteaof_in_progress:0
total_connections_received:228518
total_commands_processed:255353446
expired_keys:0
evicted_keys:0
keyspace_hits:170246352
keyspace_misses:22948688
hash_max_zipmap_entries:64
hash_max_zipmap_value:512
pubsub_channels:0
pubsub_patterns:0
vm_enabled:1
role:master
vm_conf_max_memory:8589934592
vm_conf_page_size:256
vm_conf_pages:419430400
vm_stats_used_pages:150241
vm_stats_swapped_objects:108435
vm_stats_swappin_count:174227
vm_stats_swappout_count:282662
vm_stats_io_newjobs_len:0
vm_stats_io_processing_len:0
vm_stats_io_processed_len:0
vm_stats_io_active_threads:0
vm_stats_blocked_clients:0
...

If it is a crash, can you please paste the stack trace that you can find in
the log file or on standard output? This is really useful for us!
- no crash

Please provide any additional information below.
This has been happening on several Redis instances.
These are the log records from a BGSAVE with swapped objects in the VM:
[15080] 25 May 20:52:55 * Slave ask for synchronization
[15080] 25 May 20:52:55 * Starting BGSAVE for SYNC
[15080] 25 May 20:52:55 * Background saving started by pid 26827
[26827] 26 May 07:56:29 * DB saved on disk
[15080] 26 May 07:56:30 * Background saving terminated with success
[15080] 26 May 08:07:29 * Synchronization with slave succeeded

That BGSAVE took roughly 11 hours for 5.88GB of data.

I restarted the instance from the save file and had to resync the slave. Since 
we have been doing deletes on this instance, there was no longer any need for 
swapped objects. As can be seen in the INFO output above, used_memory_human is 
5.88G. When the slave then connected and requested a sync, the BGSAVE took 
about 10 minutes, including transporting 3260028817 bytes to the slave across 
colos.

Master Log rows
[22784] 28 May 04:17:56 * Slave ask for synchronization
[22784] 28 May 04:17:56 * Starting BGSAVE for SYNC
[22784] 28 May 04:17:56 * Background saving started by pid 22987
[22987] 28 May 04:20:03 * DB saved on disk
[22784] 28 May 04:20:03 * Background saving terminated with success
[22784] 28 May 04:27:19 * Synchronization with slave succeeded

Slave Log rows
[6095] 28 May 07:17:56 * MASTER <-> SLAVE sync started: SYNC sent
[6095] 28 May 07:20:03 * MASTER <-> SLAVE sync: receiving 3260028817 bytes from 
master
[6095] 28 May 07:27:20 * MASTER <-> SLAVE sync: Loading DB in memory
[6095] 28 May 07:29:49 * MASTER <-> SLAVE sync: Finished with success

This happened on all of our shards that had swapped objects with vm_enabled.

Please help.

Thanks.

Original issue reported on code.google.com by gauravk...@gmail.com on 28 May 2011 at 11:45

GoogleCodeExporter commented 8 years ago
Any chance I can get help with this? I filed it as Medium priority, but it 
would be good to have the priority increased.

Thanks.

Original comment by gauravk...@gmail.com on 1 Jun 2011 at 11:27

GoogleCodeExporter commented 8 years ago
Using VM is known to be very slow with the operations you describe. This is 
unlikely to change. The standard recommendation has been: get a box with more 
memory.
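
If you can get enough RAM, a rough sketch of the usual follow-up, using the 
vm-* directives from the 2.2 redis.conf (values and fields here are only 
illustrative):

# redis.conf excerpt: keep the whole dataset in RAM
vm-enabled no

# after a restart, verify nothing is swapped and memory still fits:
redis-cli INFO | grep -E 'vm_enabled|used_memory_human|used_memory_rss'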

Original comment by josiah.c...@gmail.com on 3 Jun 2011 at 7:08

GoogleCodeExporter commented 8 years ago
Josiah,

Thanks for your reply.

I would expect the performance degradation to affect only the objects actually 
swapped to the VM. Only about 108K out of 27 million objects are in the VM.

I think Redis is an amazing product when all your data is in memory :) Having 
said that, I would like to know whether these problems exist because the VM 
feature is not fully developed. If it is not finished and production-ready, it 
should live in an experimental branch instead of the main branch.

We had to restart the server to get the objects out of the VM, so there is no 
problem right now. We are now closely monitoring memory usage on the Redis 
instances and actively deleting data.
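
For reference, a rough sketch of the kind of check we run (the port and 
scheduling are illustrative, not our actual setup):

#!/bin/sh
# warn when an instance starts swapping objects out to the Redis VM
SWAPPED=$(redis-cli -p 6379 INFO | grep '^vm_stats_swapped_objects:' | cut -d: -f2 | tr -d '\r')
USED=$(redis-cli -p 6379 INFO | grep '^used_memory_human:' | cut -d: -f2 | tr -d '\r')
if [ "${SWAPPED:-0}" -gt 0 ]; then
    echo "WARNING: ${SWAPPED} objects swapped to VM, used_memory_human=${USED}"
fi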

Original comment by gauravk...@gmail.com on 3 Jun 2011 at 6:54

GoogleCodeExporter commented 8 years ago
Just to clarify: the VM was definitely ready for prime time, but mixing 
in-memory and on-disk data proved to be a bad combination. There are things 
that could have been implemented more optimally, but in the end what suffers 
most from mixing in-memory and on-disk is the predictability of performance.

Cheers,
Pieter

Original comment by pcnoordh...@gmail.com on 14 Jun 2011 at 7:33