A friend pointed me to an interesting resource:
http://www.gnu.org/software/hello/manual/libc/Heap-Consistency-Checking.html
"When MALLOC_CHECK_ is set, a special (less efficient) implementation is used which is designed to be tolerant against simple errors, such as double calls of free with the same argument, or overruns of a single byte (off-by-one bugs). Not all such errors can be protected against, however, and memory leaks can result. If MALLOC_CHECK_ is set to 0, any detected heap corruption is silently ignored; if set to 1, a diagnostic is printed on stderr; if set to 2, abort is called immediately. This can be useful because otherwise a crash may happen much later, and the true cause for the problem is then very hard to track down."
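For illustration only, here is a minimal sketch of where a single-byte overrun of the kind the manual mentions typically comes from: forgetting the string terminator when sizing a buffer. The helper `copy_string` is hypothetical and is not taken from Redis; the version shown is the correct one, with the buggy allocation described in a comment.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not Redis code): return a heap copy of s,
 * or NULL if the allocation fails. The caller frees the result. */
char *copy_string(const char *s) {
    size_t len = strlen(s);

    /* The off-by-one bug would be malloc(len): the terminating '\0'
     * copied below would then be written one byte past the allocation. */
    char *copy = malloc(len + 1);
    if (copy == NULL) return NULL;

    /* Copy the characters plus the terminator (len + 1 bytes). */
    memcpy(copy, s, len + 1);
    return copy;
}
```

With the buggy allocation, the overrun is exactly one byte, which is the class of error the quoted passage says the checking allocator is designed to tolerate.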
When you run redis with:
MALLOC_CHECK_=1 ./redis-server
Saving runs fine (as far as I can tell), and stdout is filled with:
. DB 0: 1652504 keys (0 volatile) in 2097152 slots HT.
. 21 clients connected (0 slaves), 569685179 bytes in use
- 10000 changes in 60 seconds. Saving...
- Background saving started by pid 13860
*** glibc detected *** ./redis-server: free(): invalid pointer: 0x3120c618 ***
*** glibc detected *** ./redis-server: free(): invalid pointer: 0x30c9f290 ***
. DB 0: 1652997 keys (0 volatile) in 2097152 slots HT.
. 21 clients connected (0 slaves), 569780904 bytes in use
*** glibc detected *** ./redis-server: free(): invalid pointer: 0x3120c6a0 ***
*** glibc detected *** ./redis-server: free(): invalid pointer: 0x3120be10 ***
...
- DB saved on disk
- Background saving terminated with success
So it looks like it might be a temporary fix.
Good luck
Michal
Original comment by mich...@me.com
on 28 Apr 2009 at 9:22
Hello!
This seems like a nasty bug! What can help a lot is the following: please install 'valgrind' on Ubuntu and run:
% valgrind ./redis-server
then try to issue a SAVE against the dataset that shows the bug and send me the valgrind log. This will help a lot. Thanks!
Also, what does the dataset contain? Lists, Sets, Strings? Are the strings very long?
Thank you very much.
Regards,
Salvatore
Original comment by anti...@gmail.com
on 28 Apr 2009 at 9:50
I think I found the bug... investigating
Original comment by anti...@gmail.com
on 28 Apr 2009 at 10:14
Hello. I just uploaded Redis 0.093 to Google Code. Could you please check whether this fixes your issue? I'm not sure it's the same problem, but there was a problem with the LZF compression that now appears to be fixed (actually LZF appears to have an off-by-one bug, but Redis now allocates a buffer a bit larger than needed).
If this does not fix the problem, please send me the valgrind output so that I can understand where these bad memory accesses are generated.
Original comment by anti...@gmail.com
on 28 Apr 2009 at 11:26
Amazing, looks like it works fine now. Many thanks!
Original comment by mich...@me.com
on 28 Apr 2009 at 11:59
OK, so my memory usage reached 1GB, and here is what I get on the console now:
. DB 0: 3209727 keys (0 volatile) in 4194304 slots HT.
. 12 clients connected (0 slaves), 1054210519 bytes in use
* Background saving started by pid -1
. Error writing to client: Broken pipe
. Accepted 127.0.0.1:34828
* Background saving error
. Error writing to client: Broken pipe
. Accepted 127.0.0.1:34831
- 1 changes in 3600 seconds. Saving...
- Background saving started by pid -1
. Error writing to client: Broken pipe
. Accepted 127.0.0.1:34835
* Background saving error
. Error writing to client: Broken pipe
. Accepted 127.0.0.1:34838
- 1 changes in 3600 seconds. Saving...
- Background saving started by pid -1
Synchronous SAVE works, however, and dump.rdb is created.
Another, perhaps connected, thing is that clients disconnect when performing e.g. SORT and fetching elements in bunches, e.g. by doing
SORT links:waiting DESC BY link:*:pop limit 1000 100
It works fine from redis-cli, but if I query it several times in a row, by the 2nd or 3rd time I get
/var/lib/gems/1.8/gems/ezmobius-redis-rb-0.0.3/lib/redis.rb:430:in `get_response': #<Errno::EAGAIN: Resource temporarily unavailable> (RedisError)
from /var/lib/gems/1.8/gems/ezmobius-redis-rb-0.0.3/lib/redis.rb:372:in `sort'
The machine has 1.7 GB RAM (it is an EC2 small instance, 32-bit, Ubuntu 9.04)
Hope this helps. We really like the idea of redis so far!
Michal
Original comment by mich...@me.com
on 29 Apr 2009 at 7:27
Just a tiny update: running the same dataset and code on a larger machine (7 GB RAM, 64-bit) solves the problem. For now ;-)
Could the client-side errors I mention be a result of several clients trying to write and read the same lists and sets on the server?
Does the EAGAIN error mean that the client should retry the operation?
Does a background save start an extra process that consumes about the same amount of RAM as the main one?
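On the EAGAIN question, a hedged sketch in C rather than in the Ruby client: on POSIX systems a read() on a non-blocking descriptor fails with EAGAIN (or EWOULDBLOCK) when no data is available yet, and one common response is to retry a bounded number of times. Whether that is what happened inside the Ruby client here is an assumption; the helpers `set_nonblocking` and `read_with_retry`, the retry count, and the 10 ms pause are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: read from fd, retrying up to max_retries times
 * when no data is available yet (EAGAIN / EWOULDBLOCK).
 * Returns the number of bytes read, 0 on end of file, or -1 with errno
 * set (EAGAIN if every attempt found nothing to read). */
ssize_t read_with_retry(int fd, void *buf, size_t len, int max_retries) {
    for (int attempt = 0; ; attempt++) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0) return n;

        /* Any other error is not a "try again later" condition. */
        if (errno != EAGAIN && errno != EWOULDBLOCK) return -1;

        /* Give up after the allowed number of retries; errno is
         * still the value set by the failed read(). */
        if (attempt >= max_retries) return -1;

        /* Short pause before the next attempt (10 ms, arbitrary). */
        struct timespec pause = {0, 10 * 1000 * 1000};
        nanosleep(&pause, NULL);
    }
}
```

A bounded retry avoids spinning forever if the server never answers, for example because it is out of memory.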
Original comment by mich...@me.com
on 29 Apr 2009 at 8:19
Hello, the BGSAVE issue was "solved": basically the box is out of memory and fork() fails, but strangely enough there was no check for this in the code. All the other problems appear to be related to the box being low on resources, which is why switching to a big EC2 instance makes it work again. Btw, note that EC2 is a bit suboptimal for Redis: in different tests it was noticed that commodity "real" hardware works much better. For example, an inexpensive Linux box performs 100000 operations/second, an EC2 small instance 10000, an EC2 large instance 50000.
About memory usage when bgsaving: basically it uses fork(), which implements copy-on-write semantics for memory pages. This means that the memory really consumed is proportional to the number of changes the dataset gets while a bgsave is in progress. If a small number of keys change, bgsave will consume little additional memory; on the other hand, if all the keys change while a large bgsave is running, the additional memory used will be about the same amount of RAM as the main process.
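The snapshot behaviour described above can be sketched in a few lines of C. This is not Redis code: the function name is hypothetical, and a single int stands in for the dataset. The child keeps seeing the value as it was at fork() time even after the parent changes its own copy, which is why only pages modified during the save need extra memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork a child, change a variable in the parent,
 * and return the value the child observed afterwards (or -1 on error). */
int value_seen_by_child_after_parent_write(void) {
    volatile int value = 1;          /* stands in for the dataset */
    int pipefd[2];
    if (pipe(pipefd) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: wait until the parent says it has modified its copy. */
        char go;
        close(pipefd[1]);
        if (read(pipefd[0], &go, 1) != 1) _exit(255);
        /* Report the value visible in the child's address space. */
        _exit(value);
    }

    /* Parent: this write touches only the parent's copy of the page. */
    close(pipefd[0]);
    value = 2;
    if (write(pipefd[1], "x", 1) != 1) {
        /* On failure the child sees end of file and exits with 255. */
    }
    close(pipefd[1]);

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

The pipe is there only to make the ordering deterministic: the child reads its copy strictly after the parent has written 2, and still reports the value from the moment of the fork.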
Just pushed to Git a Redis version that catches the fork() error and writes the error message to the log.
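A minimal sketch of that kind of check, assuming nothing about the actual Redis source: the function names, the log wording, and the placeholder `save_dataset` are all hypothetical. The point is only that fork() returns -1 on failure, so logging "started by pid -1" means the return value was never tested.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder for the real work (writing the dataset to disk). */
static int save_dataset(void) { return 0; }

/* Hypothetical sketch: start a background save in a child process.
 * Returns the child's pid, or -1 if fork() failed (for example with
 * ENOMEM when the box is out of memory). */
pid_t background_save(void) {
    pid_t pid = fork();
    if (pid == -1) {
        /* Report the failure instead of logging "started by pid -1". */
        fprintf(stderr, "* Can't save in background: fork: %s\n",
                strerror(errno));
        return -1;
    }
    if (pid == 0) {
        /* Child: do the save and exit without running parent cleanup. */
        _exit(save_dataset() == 0 ? 0 : 1);
    }
    fprintf(stderr, "- Background saving started by pid %ld\n", (long)pid);
    return pid;
}

/* Wait for the child and report whether it exited cleanly (1) or not (0). */
int background_save_ok(pid_t pid) {
    int status;
    if (waitpid(pid, &status, 0) != pid) return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A caller would check `background_save()` for -1 before treating the save as started, and later call `background_save_ok(pid)` to learn how it ended.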
Thanks for all your stress testing! This is a valuable thing for Redis.
Original comment by anti...@gmail.com
on 29 Apr 2009 at 8:40
Thanks for the great explanation. I am aware that EC2 and Xen add terrible overhead, and I would much rather have dedicated cheap boxes with enough RAM to handle Redis. I have blogged about it too: http://michalfrackowiak.com/blog:redis-performance . But this is a topic for a different story I guess.
We will continue testing Redis. It forces us to change the usual db paradigm, but it is really fast and nice. I hope it will be improved later - we already have a few feature wishes, but I need to think about them more deeply.
Thanks - Michal
Original comment by mich...@me.com
on 29 Apr 2009 at 9:07
Thank you for your help Michal, I'm closing this issue. Please feel free to request features you need, we are not strict at all about features that are useful for building general design patterns.
Original comment by anti...@gmail.com
on 29 Apr 2009 at 3:36
Original issue reported on code.google.com by
mich...@me.com
on 28 Apr 2009 at 8:48