Jdesk / memcached

Automatically exported from code.google.com/p/memcached

segfault under low memory conditions #294

Closed. GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
Under certain low-memory conditions, memcached can segfault. I cannot reproduce 
this manually, but it has occurred in production a couple of times now. 

gdb stack from the core dump:
Core was generated by `/usr/local/memcached/bin/memcached -r -m 15500 -p 11211 -c 10000 -o slab_reassi'.
Program terminated with signal 11, Segmentation fault.
#0  dispatch_conn_new (sfd=2007, init_state=conn_new_cmd, event_flags=18, read_buffer_size=2048, transport=tcp_transport) at thread.c:355
355 thread.c: No such file or directory.
    in thread.c
(gdb) where
#0  dispatch_conn_new (sfd=2007, init_state=conn_new_cmd, event_flags=18, read_buffer_size=2048, transport=tcp_transport) at thread.c:355
#1  0x0000000000408637 in drive_machine (fd=<value optimized out>, which=<value optimized out>, arg=0x1c86c90) at memcached.c:3787
#2  event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x1c86c90) at memcached.c:4067
#3  0x00007f6e3de0f194 in event_base_loop () from /usr/lib/libevent-1.4.so.2
#4  0x000000000040b949 in main (argc=<value optimized out>, argv=<value optimized out>) at memcached.c:5230

memcached version: 1.4.14
started with command: /usr/local/memcached/bin/memcached -r -m 15500 -p 11211 -c 10000 -o slab_reassign slab_automove -d
OS: Linux 2.6.38-13-virtual #52-Ubuntu SMP x86_64 GNU/Linux

Original issue reported on code.google.com by ke...@tellapart.com on 16 Oct 2012 at 9:24

GoogleCodeExporter commented 9 years ago
We don't check whether cq_new returns NULL. I've added a check for that and 
submitted a pull request: https://github.com/memcached/memcached/pull/25

There _may_ be other situations where you'd crash when running really low on 
memory...

Original comment by trond.no...@gmail.com on 17 Oct 2012 at 10:58
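
The failure mode described in this comment, writing through the result of an allocation that can return NULL, can be shown in isolation. The snippet below is only a minimal sketch and not memcached's code: `queue_item`, `item_alloc`, `dispatch_new`, and the `simulate_oom` switch are hypothetical names made up for illustration. The actual change is in the pull request linked above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-connection queue item. */
typedef struct queue_item {
    int sfd;
    int read_buffer_size;
} queue_item;

/* Hypothetical allocator. Returns NULL when memory is exhausted, or when
 * the caller asks it to simulate that condition. */
static queue_item *item_alloc(int simulate_oom) {
    if (simulate_oom) {
        return NULL;
    }
    return calloc(1, sizeof(queue_item));
}

/* Returns 0 on success and -1 if no item could be allocated.
 * On failure the caller still owns sfd and decides what to do with it
 * (for example, close it). */
static int dispatch_new(int sfd, int read_buffer_size, int simulate_oom,
                        queue_item **out) {
    queue_item *item = item_alloc(simulate_oom);
    if (item == NULL) {
        /* Without this check, the assignments below would write through
         * a NULL pointer and the process would segfault. */
        fprintf(stderr, "failed to allocate queue item for fd %d\n", sfd);
        *out = NULL;
        return -1;
    }
    item->sfd = sfd;
    item->read_buffer_size = read_buffer_size;
    *out = item;
    return 0;
}
```

Calling `dispatch_new(2007, 2048, 1, &item)` returns -1 and leaves `item` set to NULL instead of crashing; with `simulate_oom` set to 0 it returns 0 and fills in the item.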

GoogleCodeExporter commented 9 years ago

Original comment by trond.no...@gmail.com on 17 Oct 2012 at 11:02

GoogleCodeExporter commented 9 years ago
This seems to happen to me at least once a day.  The only clue I have comes from 
dmesg (entries are from the last 3 days):

[32301.972297] memcached[1180]: segfault at 0 ip 000000000040d6c3 sp 
00007f1a837909a0 error 4 in memcached[400000+18000]

[542571.200952] memcached[9459]: segfault at 0 ip 000000000040d6c3 sp 
00007faf2883aa40 error 4 in memcached[400000+18000]

[607441.384512] memcached[16886]: segfault at 0 ip 000000000040d6c3 sp 
00007f74d0e40a40 error 4 in memcached[400000+18000]

I'm running version 1.4.15

Original comment by bryan.ch...@gmail.com on 24 Jan 2013 at 4:14
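
The dmesg lines in this comment do carry a little information, though not the call path. The sketch below parses one such line and decodes the error field. It assumes an x86 or x86_64 kernel, where the field is the page-fault error code (bit 0: protection violation rather than missing page, bit 1: write rather than read, bit 2: user mode rather than kernel mode). The struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Fields of a kernel "segfault at ..." log line. */
typedef struct segv_info {
    unsigned long addr;   /* faulting address */
    unsigned long ip;     /* instruction pointer */
    unsigned long sp;     /* stack pointer */
    unsigned long base;   /* start of the mapping containing ip */
    unsigned long size;   /* size of that mapping */
    unsigned int error;   /* page-fault error code (x86 assumed) */
} segv_info;

/* Parse a dmesg segfault line. Returns 0 on success, -1 otherwise. */
static int parse_segv(const char *line, segv_info *out) {
    char name[64];
    const char *p = strstr(line, "segfault at ");
    if (p == NULL) {
        return -1;
    }
    int n = sscanf(p,
                   "segfault at %lx ip %lx sp %lx error %x in %63[^[][%lx+%lx]",
                   &out->addr, &out->ip, &out->sp, &out->error,
                   name, &out->base, &out->size);
    return n == 7 ? 0 : -1;
}

/* Describe the x86 page-fault error code in words. */
static void describe_error(unsigned int err, char *buf, size_t len) {
    snprintf(buf, len, "%s-mode %s of %s page",
             (err & 4) ? "user" : "kernel",
             (err & 2) ? "write" : "read",
             (err & 1) ? "a protected" : "an unmapped");
}
```

Under those assumptions, `segfault at 0 ... error 4` reads as a user-mode read of address 0, which is what a NULL pointer dereference looks like, and `ip - base` gives the offset 0xd6c3 into the memcached mapping. That narrows the kind of fault but does not say which caller triggered it.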

GoogleCodeExporter commented 9 years ago
We need a backtrace for those crashes. There's no way to tell what happened 
from just the segfault.

Can you please get a backtrace?

Original comment by dorma...@rydia.net on 24 Jan 2013 at 6:06

GoogleCodeExporter commented 9 years ago
Dear,

How do you make a backtrace?

We have the same problem on an irregular basis.
[66085.311830] memcached[3441]: segfault at 0 ip 000000000040d7f7 sp 
00007f0b2d505a40 error 4 in memcached[400000+18000]

We run MediaWiki 1.19.3, PHP 5.3.8, Memcached 1.4.15, and libevent 2.0.21 on a 
SLES 11 SP2 VM.

Original comment by tom.po...@jandenul.com on 20 Mar 2013 at 9:16
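
One common route to a backtrace is the one used in the original report: let the process write a core file when it crashes, open that core in gdb together with the memcached binary, and run `where`. A core file is only written if the process's core size limit allows it. memcached's `-r` option, present in the command line of the original report, is documented as maximizing the core file limit. The sketch below shows the general idea in C; it is not memcached's code, and `raise_core_limit` is a hypothetical name.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit.
 * Returns 0 on success, -1 on failure. */
static int raise_core_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        return -1;
    }
    rl.rlim_cur = rl.rlim_max;  /* the soft limit may go up to the hard limit */
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        return -1;
    }
    return 0;
}
```

If the hard limit is itself 0, this call succeeds but still no core file is written; in that case the limit has to be raised by whatever starts the process.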

GoogleCodeExporter commented 9 years ago
Guess I never answered the guy... people should google before asking :)

I've merged trond's thing which at least gives some visibility into potential 
problems and fixes that one bug.

Would be grand to have some tests that cause malloc failures and see how it 
goes.

Original comment by dorma...@rydia.net on 9 Dec 2013 at 2:31
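
A test that causes malloc failures, as suggested in the last comment, usually routes allocations through a wrapper that can be told to fail. The following is a minimal sketch of that idea under stated assumptions, not part of memcached's test suite: `test_malloc`, `allocs_before_failure`, and `make_buffer` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Number of allocations that may still succeed.
 * A negative value disables failure injection. */
static long allocs_before_failure = -1;

/* malloc wrapper that starts returning NULL once the budget is used up
 * and keeps failing until the budget is reset. */
static void *test_malloc(size_t size) {
    if (allocs_before_failure == 0) {
        return NULL;  /* injected failure */
    }
    if (allocs_before_failure > 0) {
        allocs_before_failure--;
    }
    return malloc(size);
}

/* Example code under test: it must report failure rather than crash.
 * Returns 0 on success, -1 if the allocation failed. */
static int make_buffer(size_t size, char **out) {
    char *buf = test_malloc(size);
    if (buf == NULL) {
        *out = NULL;
        return -1;
    }
    buf[0] = '\0';
    *out = buf;
    return 0;
}
```

Setting `allocs_before_failure = 1` lets the first `make_buffer` call succeed and makes the next one return -1. Stepping that budget from 0 upward exercises each allocation site in turn.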