benluteijn / redis

Automatically exported from code.google.com/p/redis
BSD 3-Clause "New" or "Revised" License

Flush time on large database unacceptable #123

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Put a large dataset in the database:

import redis
import random

r = redis.Redis()

# Keep writing values until roughly 800 MB has been counted. Each value
# is 'ABCDEFGHIJKLMNOP' repeated num times (16 bytes per repetition);
# the counter assumes ~20 bytes per repetition.
total_bytes = 0
while total_bytes < 809715200:
    num = random.randint(10, 100)
    total_bytes += 20 * num
    r.set("key_" + str(total_bytes), 'ABCDEFGHIJKLMNOP' * num)

2. View Redis memory usage relative to the data in the database - it is
roughly 10x higher than the dataset size.
3. Send a shutdown message.
4. Watch CPU usage and wait for the server to stop listening and flush to
disk. (This takes a long time - unacceptable for a production environment.)
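One detail worth noting in the repro script above: the running byte counter adds 20 bytes per repetition, but the stored value is 'ABCDEFGHIJKLMNOP' repeated, i.e. only 16 bytes per repetition, so the counter overstates the actual payload by 25%. A minimal sketch of that arithmetic (the helper names are mine, not from the script):

```python
CHUNK = 'ABCDEFGHIJKLMNOP'  # the 16-byte unit the repro repeats

def payload_bytes(num):
    # Actual size of the value stored for a given repeat count.
    return len(CHUNK) * num

def counted_bytes(num):
    # What the repro loop adds to its running total (20 bytes per repeat).
    return 20 * num

# For num = 100: 1600 bytes stored, 2000 bytes counted.
print(payload_bytes(100), counted_bytes(100))
```

So the real dataset is somewhat smaller than the ~800 MB target, which only strengthens the point about memory overhead.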

What is the expected output? What do you see instead?

Short server shutdown and memory usage more closely relative to data size.

What version of the product are you using? On what operating system?

1.02 on
Linux tsavanna64 2.6.30-ARCH #1 SMP PREEMPT Fri Jul 31 07:30:28 CEST 2009
x86_64 Intel(R) Xeon(R) CPU E5440 @ 2.83GHz GenuineIntel GNU/Linux

2 quad core processors, 16GB of memory running in full-64bit mode.

Please provide any additional information below.

Removal of compression (via the attached patch) forced memory usage down
dramatically (it actually allowed the test case above to complete without
swapping) and made the server shutdown IO-bound rather than CPU-bound.
(Using tmpfs solved the IO issue.)

Original issue reported on code.google.com by kata...@gmail.com on 16 Dec 2009 at 3:00

GoogleCodeExporter commented 9 years ago
Thanks kata198, this is very interesting - I never noticed before how much
CPU time LZF was taking. I'm running some tests now and will report back
here asap.

p.s. your patch is a bit of a strange way to disable compression ;) It's
enough to disable it from the saving code.

Original comment by anti...@gmail.com on 16 Dec 2009 at 4:12

GoogleCodeExporter commented 9 years ago
Yes - I changed it in lzf_c and lzf_d so that, if no real fix was reached,
we could keep updating Redis without having to update the patch.

I think a config option to enable or disable compression would be a good
short-term fix.

Original comment by kata...@gmail.com on 16 Dec 2009 at 6:03

GoogleCodeExporter commented 9 years ago

Here is a small bash script you can run "time" against to test the respawn
time. The commented-out sleep points at another bug - there seems to be a
timing issue once the compression time is taken out of the equation.

#!/bin/bash

# Ask the server to shut down (which triggers the flush to disk).
python -c "import socket; sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM); sock.connect(('127.0.0.1', 6379)); sock.send('shutdown\n'); sock.close()"

sleep .1

# Wait until the port is no longer listening.
LISTENING=`netstat -l | grep 6379`
while [ -n "${LISTENING}" ]; do
    printf '.'
    sleep 1
    LISTENING=`netstat -l | grep 6379`
done

redis-server /etc/redis.conf &

# Without this line redis will start serving data before loading it
# - r.info() displays 0 used_memory. Sleeping for 2 seconds gives redis
# enough time to load and display the proper used_memory.
#sleep 2

python -c "import redis; r = redis.Redis(); print str(r.info())"

Original comment by kata...@gmail.com on 16 Dec 2009 at 6:17

GoogleCodeExporter commented 9 years ago
Hello again,

the fix is on Git now (by default LZF is disabled, but you can enable it
back via the config file - there is a new config directive).
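For readers landing here later: in current Redis configs this switch appears as the `rdbcompression` directive (I'm assuming the directive introduced by this fix kept that name), so toggling LZF from redis.conf would look like:

```
# redis.conf - compress string objects with LZF when writing the .rdb dump.
# Set to "no" to trade larger dump files for less CPU at save time.
rdbcompression yes
```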

Will look into the second issue asap.

Cheers,
Salvatore

Original comment by anti...@gmail.com on 16 Dec 2009 at 7:46

GoogleCodeExporter commented 9 years ago
About the second issue (the delay in the INFO command memory reporting):
it's just a delay in the reporting of memory usage - the DB is fully
loaded into memory by the time the server starts accepting commands.

Original comment by anti...@gmail.com on 16 Dec 2009 at 10:36

GoogleCodeExporter commented 9 years ago
p.s. the new fix for the LZF issue is to keep it enabled, with a patch
that prevents LZF from using all this CPU. All the details are already
posted in the Redis Google group and in the git commits.

Original comment by anti...@gmail.com on 16 Dec 2009 at 10:38

GoogleCodeExporter commented 9 years ago
Thanks! I've done my testing (with HTAB disabled) and the results are
exactly as I'd expect: used_memory is now actually less than the dataset
size, and the CPU is not hammered on shutdown.

800M of data is flushed to disk (tmpfs - so really memory) and loaded
back in 2.4 seconds.

:: Stopping Redis server             [BUSY]
:: Waiting for redis to stop...      [BUSY] ..
:: Syncing redis db to disk          [DONE]
:: Starting Redis server             [BUSY] 5065 [DONE]

real    0m2.399s
user    0m0.097s
sys     0m0.253s

Original comment by kata...@gmail.com on 17 Dec 2009 at 3:23

GoogleCodeExporter commented 9 years ago
Great, I'm closing the bug.

Thank you very much for your help - this kind of analysis is much better
than receiving a patch. It's cool to know Redis has such smart users.

Cheers,
Salvatore

Original comment by anti...@gmail.com on 17 Dec 2009 at 3:26