Closed. GoogleCodeExporter closed this issue 9 years ago.
It's been difficult to reproduce this issue. I'll be taking a more careful look
into it tonight and hopefully resolving it with a patch.
Original comment by dorma...@rydia.net
on 31 Oct 2009 at 1:49
I'm going to punt on this from 1.4.3-rc1 - if we can reproduce and find the bug
before 1.4.3-final we'll include it, however.
I'm staring at this really hard and while the proposed patch should fix it (and
is how we do things in some other libevent-based projects), the claimed crash
location sounds impossible to reach through a logic error alone.
event_del() relies on c->event.ev_base being *correct*. So in this case,
somewhere between event_del() and event_base_set(), the '*base' pointer is
getting removed or corrupted.
It's also possible the segfault is really in event_del() but isn't manifesting
until later? Maybe? I'd like to work with the user with full stack traces and a
core dump.
So again, I feel like the patch will fix the problem, but also that it would
hide something that's potentially way more serious. I'd like to put some extra
scrutiny on the bug just in case. Apologies for it taking so long to take a
look, and thanks for filing a great bug report :)
Original comment by dorma...@rydia.net
on 2 Nov 2009 at 3:22
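To make the dependency described above concrete, here is a minimal sketch of the
delete / re-set / re-add sequence. It does not use libevent: every type and
function prefixed with fake_ is a hypothetical stand-in, and the stand-ins
return -1 where the real library would more likely dereference a bad pointer and
crash. The only point is that the base is read out of the event itself, so the
whole sequence is only as good as that one field.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration only; NOT libevent's real definitions. */
struct fake_base { int id; };
struct fake_event {
    struct fake_base *ev_base; /* base the event is registered with */
    int fd;
    short flags;
    int pending;               /* 1 while the event is registered */
};

/* Stand-in for event_del(): needs a usable ev_base to unregister. */
static int fake_event_del(struct fake_event *ev) {
    if (ev->ev_base == NULL) return -1;
    ev->pending = 0;
    return 0;
}

/* Stand-in for event_set(): reinitialises fd and flags (simplified). */
static void fake_event_set(struct fake_event *ev, int fd, short flags) {
    ev->fd = fd;
    ev->flags = flags;
}

/* Stand-in for event_base_set(): binds the event to the given base. */
static int fake_event_base_set(struct fake_base *base, struct fake_event *ev) {
    if (base == NULL) return -1;
    ev->ev_base = base;
    return 0;
}

/* Stand-in for event_add(): registers the event with its base. */
static int fake_event_add(struct fake_event *ev) {
    if (ev->ev_base == NULL) return -1;
    ev->pending = 1;
    return 0;
}

/* Model of the update sequence: the base is taken from the event itself. */
static int update_event_model(struct fake_event *ev, short new_flags) {
    struct fake_base *base = ev->ev_base;  /* trusted to be correct */
    if (fake_event_del(ev) == -1) return -1;
    fake_event_set(ev, ev->fd, new_flags);
    if (fake_event_base_set(base, ev) == -1) return -1;
    return fake_event_add(ev);
}
```

With a populated ev_base the model succeeds; with a NULL ev_base it fails at the
very first step, which is the situation the comment above is worried about.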
Thanks for taking a look at this. The sample code that I provided (testing how
libevent behaves with regard to event.ev_base) seems to indicate that
event.ev_base is never altered by event_base_set().
Let me know which version of memcached you want me to build and if attaching the
subsequent core file to this thread is the preferred means of getting the file
to you. I had to reload the test script in rapid succession several times to
cause it to fault because it has to encounter a state where writing will block.
Here is a quick snapshot of what happened:
<31 new auto-negotiating client connection
31: going from conn_new_cmd to conn_waiting
31: going from conn_waiting to conn_read
31: going from conn_read to conn_parse_cmd
31: Client using the ascii protocol
<31 set test 0 10 50526
31: going from conn_parse_cmd to conn_nread
> FOUND KEY test
>31 STORED
31: going from conn_nread to conn_write
31: going from conn_write to conn_new_cmd
31: going from conn_new_cmd to conn_waiting
31: going from conn_waiting to conn_read
31: going from conn_read to conn_closing
<31 connection closed.
<31 new auto-negotiating client connection
31: going from conn_new_cmd to conn_waiting
31: going from conn_waiting to conn_read
31: going from conn_read to conn_parse_cmd
31: Client using the ascii protocol
<31 get test
> FOUND KEY test
>31 sending key test
>31 END
31: going from conn_parse_cmd to conn_mwrite
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb65b9b90 (LWP 32183)]
0xb7f8f436 in event_add () from /usr/lib/libevent-1.4.so.2
(gdb) backtrace
#0 0xb7f8f436 in event_add () from /usr/lib/libevent-1.4.so.2
#1 0x5ff40000 in ?? ()
#2 0x0000b7f8 in ?? ()
#3 0x080e4b38 in ?? ()
#4 0x080e9260 in ?? ()
#5 0xb7f7843e in pthread_mutex_lock () from /lib/libpthread.so.0
#6 0xc35d5f5e in ?? ()
#7 0xffffffb8 in ?? ()
#8 0x1cc483ff in ?? ()
#9 0x5d5f5e5b in ?? ()
#10 0x85838dc3 in ?? ()
...
#265 0x02fe8344 in ?? ()
#266 0x07c7ba75 in ?? ()
#267 0x00000000 in ?? ()
(gdb) gcore
Saved corefile core.32177
gdb ./memcached-debug core.32177
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
Core was generated by `/home/thom/memcached-1.4.2/memcached-debug -l localhost
-vvv'.
Program terminated with signal 11, Segmentation fault.
[New process 32184]
[New process 32183]
[New process 32182]
[New process 32181]
[New process 32180]
[New process 32177]
#0 0xffffe410 in __kernel_vsyscall ()
(gdb) backtrace
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7f7a8c5 in ?? ()
#2 0x00000000 in ?? ()
(gdb)
Hope this helps.
Original comment by thomc...@gmail.com
on 2 Nov 2009 at 3:12
Build against the latest -rc. Use the memcached-debug binary so the symbols
aren't stripped. That's the only difference... aside from that and asserts being
on. Maybe an assert will be triggered instead, who knows...
Can you contact me off-list for a few rounds of debugging? We'll post results
back into the ticket.
Original comment by dorma...@rydia.net
on 4 Nov 2009 at 5:51
[deleted comment]
I've hit the same problem now on 1.4.5. The patch above seemed to fix the
problem.
The server was crashing after ~15 minutes. We have a script which starts to fill
memcached actively.
Debug output was:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb6baeb90 (LWP 29069)]
0x00365627 in event_add () from /usr/lib/libevent-1.4.so.2
#0 0x00365627 in event_add () from /usr/lib/libevent-1.4.so.2
#1 0x08049bb0 in update_event (c=0x80c31f0, new_flags=20) at memcached.c:3313
#2 0x080520c6 in drive_machine (fd=35, which=2, arg=0x80c31f0) at memcached.c:3394
#3 event_handler (fd=35, which=2, arg=0x80c31f0) at memcached.c:3732
#4 0x00365188 in event_base_loop () from /usr/lib/libevent-1.4.so.2
#5 0x080559f6 in worker_libevent (arg=0x8066368) at thread.c:245
#6 0x00b91832 in start_thread () from /lib/libpthread.so.0
#7 0x00b0fe0e in clone () from /lib/libc.so.6
I applied the patch and it has now been running for 1 hour.
Original comment by mrd...@gmail.com
on 8 Nov 2010 at 4:29
I confirm the issue and the value of the patch. A few days ago I replaced a
memcached 1.4 that was crashing every few hours with a patched 1.4.5, which has
been running since then, for more than 60h.
Original comment by trudea...@gmail.com
on 6 Dec 2010 at 3:57
I wonder how other people can use memcached without that patch.
Original comment by mrd...@gmail.com
on 6 Dec 2010 at 4:01
I have had exactly the same problem for months, and we set up a cron job to
restart memcached after every crash... I went from 1.4.0 to 1.4.5 and it still
crashes (daily crash). Will it be in the next release?
I feel confident about this patch, but I have never patched a program... If
someone has a nice tutorial to help me, I will try on my own.
Thank you.
Original comment by rit...@gmail.com
on 9 Dec 2010 at 12:38
Hi, Rituel.
To apply the patch, you need to copy it into a file. Let's say you called it
memcached-issue99-suggested-patch
Copy everything exactly, from the first line
(*** memcached-1.4.0/memcached.c Thu Jul 9 13:16:24 2009)
to the last
(short which; /** which events were just triggered */).
Put that file into the directory with the unpacked memcached source and run:
patch -p1 <memcached-issue99-suggested-patch
It should show you something like this:
patching file memcached.c
Hunk #1 succeeded at 331 (offset 2 lines).
Hunk #2 succeeded at 3303 (offset 274 lines).
patching file memcached.h
Hunk #1 succeeded at 335 (offset 19 lines).
After that you can build as usual:
./configure <YOUR OPTIONS HERE>
make
make install
Original comment by mrd...@gmail.com
on 9 Dec 2010 at 1:06
I'm attaching thomchin's patch, but against memcached 1.4.5 as a file just for
convenience.
Original comment by mrd...@gmail.com
on 9 Dec 2010 at 1:13
Attachments:
Applying this fix and running it on my OpenBSD[1] server seems to cause
deadlocks. A simple getMulti[2] using about 50 items causes the server to
stall.
Without the patch everything seems to work fine (though it's not running with
any real load).
[1] 4.8-current, memcached 1.4.5 package.
[2] running the memcached php extension. libmemcached 0.44.
Original comment by otto.br...@gmail.com
on 10 Dec 2010 at 12:11
Maybe dormando is right that the patch does not fix the main problem, it just
helps to overcome it on affected configurations.
The patch works for me; memcached has been working fine under quite high load
for several months. The server is on RedHat.
Original comment by mrd...@gmail.com
on 10 Dec 2010 at 2:11
Hi,
Thank you for the help. We patched memcached with
memcached1.4.5-issue99-suggested-patch and it still crashes :-(
Last crash:
Dec 23 00:33:10 srv13 kernel: [3110555.180650] memcached[25152]: segfault at e62fb0fc ip 0804a856 sp b7e9f2d0 error 7 in memcached[8048000+12000]
Original comment by newsdest...@gmail.com
on 23 Dec 2010 at 10:35
Anyone still watching this ticket?
Please try 1.4.6-rc1 (or final if it's out by the time you see this). If you
can still get it to segfault and care about getting this fixed, please respond
to the issue.
Please state the exact versions of your OS, libevent, gcc, kernel, memcached,
etc.
Original comment by dorma...@rydia.net
on 12 Jul 2011 at 11:08
Hi,
Thank you for checking in, and yes, I am still watching this ticket as it still
continues to be an issue for me. I manually apply my patch on every upgrade to
eliminate the "Segmentation fault". The patched memcached runs indefinitely
until it is manually restarted after a version upgrade (months at a time).
Here is the information you requested:
memcached: 1.4.6-rc1
OS: linux 32 bit
libevent: 2.0.12
gcc: 4.4.4-r2
kernel: 2.6.30-r8
I did a default build of memcached from the source downloaded from the
memcached repository. I was able to get a SIGSEGV from a single request using
the same test script included with the original bug filing (using a new size of
524288):
./memcached -l localhost -vvv
slab class 1: chunk size 80 perslab 13107
slab class 2: chunk size 104 perslab 10082
slab class 3: chunk size 136 perslab 7710
slab class 4: chunk size 176 perslab 5957
slab class 5: chunk size 224 perslab 4681
slab class 6: chunk size 280 perslab 3744
slab class 7: chunk size 352 perslab 2978
slab class 8: chunk size 440 perslab 2383
slab class 9: chunk size 552 perslab 1899
slab class 10: chunk size 696 perslab 1506
slab class 11: chunk size 872 perslab 1202
slab class 12: chunk size 1096 perslab 956
slab class 13: chunk size 1376 perslab 762
slab class 14: chunk size 1720 perslab 609
slab class 15: chunk size 2152 perslab 487
slab class 16: chunk size 2696 perslab 388
slab class 17: chunk size 3376 perslab 310
slab class 18: chunk size 4224 perslab 248
slab class 19: chunk size 5280 perslab 198
slab class 20: chunk size 6600 perslab 158
slab class 21: chunk size 8256 perslab 127
slab class 22: chunk size 10320 perslab 101
slab class 23: chunk size 12904 perslab 81
slab class 24: chunk size 16136 perslab 64
slab class 25: chunk size 20176 perslab 51
slab class 26: chunk size 25224 perslab 41
slab class 27: chunk size 31536 perslab 33
slab class 28: chunk size 39424 perslab 26
slab class 29: chunk size 49280 perslab 21
slab class 30: chunk size 61600 perslab 17
slab class 31: chunk size 77000 perslab 13
slab class 32: chunk size 96256 perslab 10
slab class 33: chunk size 120320 perslab 8
slab class 34: chunk size 150400 perslab 6
slab class 35: chunk size 188000 perslab 5
slab class 36: chunk size 235000 perslab 4
slab class 37: chunk size 293752 perslab 3
slab class 38: chunk size 367192 perslab 2
slab class 39: chunk size 458992 perslab 2
slab class 40: chunk size 573744 perslab 1
slab class 41: chunk size 717184 perslab 1
slab class 42: chunk size 1048576 perslab 1
<31 server listening (auto-negotiate)
<32 send buffer was 107520, now 268435456
<32 server listening (udp)
<32 server listening (udp)
<32 server listening (udp)
<32 server listening (udp)
<33 new auto-negotiating client connection
33: going from conn_new_cmd to conn_waiting
33: going from conn_waiting to conn_read
33: going from conn_read to conn_parse_cmd
33: Client using the ascii protocol
<33 set test 0 10 524288
33: going from conn_parse_cmd to conn_nread
> NOT FOUND test
>33 STORED
33: going from conn_nread to conn_write
33: going from conn_write to conn_new_cmd
33: going from conn_new_cmd to conn_waiting
33: going from conn_waiting to conn_read
<34 new auto-negotiating client connection
33: going from conn_read to conn_closing
<33 connection closed.
34: going from conn_new_cmd to conn_waiting
34: going from conn_waiting to conn_read
34: going from conn_read to conn_parse_cmd
34: Client using the ascii protocol
<34 get test
> FOUND KEY test
>34 sending key test
>34 END
34: going from conn_parse_cmd to conn_mwrite
Segmentation fault
---
After doing a clean build using the same patch file I provided you, the daemon
seems to be stable again (100 successful requests using the same test script).
Thanks for looking into this once more, and let me know if you need additional
information.
thom
Original comment by thomc...@gmail.com
on 13 Jul 2011 at 1:10
Is this still on gentoo? Which php client are you using and at what version?
Any chance you could run the memcached-debug program under gdb and kill it a
few times while collecting backtraces? Curious to see if it's always in the
same spot.
Original comment by dorma...@rydia.net
on 13 Jul 2011 at 6:52
Also, can you try the 1.6 beta and see if that fails for you?
Original comment by dorma...@rydia.net
on 13 Jul 2011 at 7:52
Yes, I am still running this on gentoo, PHP 5.3.3 with a memcache 2.2.5 client
(as reported by phpinfo()). Here is a backtrace under gdb:
gdb --args ./memcached-debug -l localhost -vvv
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
(gdb) run
Starting program: /home/thom/memcached-memcached-fe2fb1d/memcached-debug -l
localhost -vvv
[Thread debugging using libthread_db enabled]
slab class 1: chunk size 80 perslab 13107
slab class 2: chunk size 104 perslab 10082
slab class 3: chunk size 136 perslab 7710
slab class 4: chunk size 176 perslab 5957
slab class 5: chunk size 224 perslab 4681
slab class 6: chunk size 280 perslab 3744
slab class 7: chunk size 352 perslab 2978
slab class 8: chunk size 440 perslab 2383
slab class 9: chunk size 552 perslab 1899
slab class 10: chunk size 696 perslab 1506
slab class 11: chunk size 872 perslab 1202
slab class 12: chunk size 1096 perslab 956
slab class 13: chunk size 1376 perslab 762
slab class 14: chunk size 1720 perslab 609
slab class 15: chunk size 2152 perslab 487
slab class 16: chunk size 2696 perslab 388
slab class 17: chunk size 3376 perslab 310
slab class 18: chunk size 4224 perslab 248
slab class 19: chunk size 5280 perslab 198
slab class 20: chunk size 6600 perslab 158
slab class 21: chunk size 8256 perslab 127
slab class 22: chunk size 10320 perslab 101
slab class 23: chunk size 12904 perslab 81
slab class 24: chunk size 16136 perslab 64
slab class 25: chunk size 20176 perslab 51
slab class 26: chunk size 25224 perslab 41
slab class 27: chunk size 31536 perslab 33
slab class 28: chunk size 39424 perslab 26
slab class 29: chunk size 49280 perslab 21
slab class 30: chunk size 61600 perslab 17
slab class 31: chunk size 77000 perslab 13
slab class 32: chunk size 96256 perslab 10
slab class 33: chunk size 120320 perslab 8
slab class 34: chunk size 150400 perslab 6
slab class 35: chunk size 188000 perslab 5
slab class 36: chunk size 235000 perslab 4
slab class 37: chunk size 293752 perslab 3
slab class 38: chunk size 367192 perslab 2
slab class 39: chunk size 458992 perslab 2
slab class 40: chunk size 573744 perslab 1
slab class 41: chunk size 717184 perslab 1
slab class 42: chunk size 1048576 perslab 1
[New Thread 0xb7eb66c0 (LWP 6457)]
[New Thread 0xb7e74b70 (LWP 6460)]
[New Thread 0xb7673b70 (LWP 6461)]
[New Thread 0xb6e72b70 (LWP 6462)]
[New Thread 0xb6671b70 (LWP 6463)]
[New Thread 0xb5e70b70 (LWP 6464)]
<34 server listening (auto-negotiate)
<35 send buffer was 107520, now 268435456
<35 server listening (udp)
<35 server listening (udp)
<35 server listening (udp)
<35 server listening (udp)
<36 new auto-negotiating client connection
36: going from conn_new_cmd to conn_waiting
36: going from conn_waiting to conn_read
36: going from conn_read to conn_parse_cmd
36: Client using the ascii protocol
<36 set test 0 10 524288
36: going from conn_parse_cmd to conn_nread
> NOT FOUND test
>36 STORED
36: going from conn_nread to conn_write
36: going from conn_write to conn_new_cmd
36: going from conn_new_cmd to conn_waiting
36: going from conn_waiting to conn_read
<37 new auto-negotiating client connection
36: going from conn_read to conn_closing
<36 connection closed.
37: going from conn_new_cmd to conn_waiting
37: going from conn_waiting to conn_read
37: going from conn_read to conn_parse_cmd
37: Client using the ascii protocol
<37 get test
> FOUND KEY test
>37 sending key test
>37 END
37: going from conn_parse_cmd to conn_mwrite
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7673b70 (LWP 6461)]
0xb802d1a1 in event_base_set () from /usr/lib/libevent-2.0.so.5
(gdb) backtrace
#0 0xb802d1a1 in event_base_set () from /usr/lib/libevent-2.0.so.5
#1 0x095f67c0 in ?? ()
#2 0x095f67c0 in ?? ()
#3 0x00000014 in ?? ()
#4 0x095f67d0 in ?? ()
#5 0x0804a26b in update_event (c=0xb802d0f7, new_flags=134579104)
at memcached.c:3353
#6 0x08059d25 in event_handler (fd=37, which=2, arg=0x95f67c0)
at memcached.c:3438
#7 0xb8030441 in event_base_loop () from /usr/lib/libevent-2.0.so.5
---
Two additional runs (backtraces only):
#0 0xb7ec41a1 in event_base_set () from /usr/lib/libevent-2.0.so.5
#1 0x0812dad0 in ?? ()
#2 0x0812dad0 in ?? ()
#3 0x00000014 in ?? ()
#4 0x0812dae0 in ?? ()
#5 0x0804a26b in update_event (c=0xb7ec40f7, new_flags=134579104)
at memcached.c:3353
#6 0x08059d25 in event_handler (fd=36, which=2, arg=0x812dad0)
at memcached.c:3438
#7 0xb7ec7441 in event_base_loop () from /usr/lib/libevent-2.0.so.5
---
#0 0xb7fcd1a1 in event_base_set () from /usr/lib/libevent-2.0.so.5
#1 0x0847e0a0 in ?? ()
#2 0x0847e0a0 in ?? ()
#3 0x00000014 in ?? ()
#4 0x0847e0b0 in ?? ()
#5 0x0804a26b in update_event (c=0xb7fcd0f7, new_flags=134579104)
at memcached.c:3353
#6 0x08059d25 in event_handler (fd=36, which=2, arg=0x847e0a0)
at memcached.c:3438
#7 0xb7fd0441 in event_base_loop () from /usr/lib/libevent-2.0.so.5
---
Seems to always be in the same spot, and I believe it is because the "base"
variable does not contain a valid address.
Thanks.
Original comment by thomc...@gmail.com
on 13 Jul 2011 at 8:09
Hi,
As for testing 1.6.0-beta, I am having trouble manually generating "configure"
since it is missing in the source (as it was in the release candidate).
aclocal is reporting: "configure.ac:2: file `m4/version.m4' does not exist",
and it is indeed missing in the source that I downloaded. Is there something
else I should be doing/running to get it ready to build?
Thanks,
thom
Original comment by thomc...@gmail.com
on 13 Jul 2011 at 8:30
http://memcached.googlecode.com/files/memcached-1.6.0_beta1.tar.gz - are you
*sure* that you're using this file? The configure script is certainly not
missing from this tarball.
Original comment by dorma...@rydia.net
on 13 Jul 2011 at 8:38
I was definitely *not* using that file :) Downloaded it from the github source
link. With the provided tar link, I am getting the following warning which
translates to an error:
gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I./include -I./libevent
-fvisibility=hidden -pthread -g -O2 -Wall -Werror -pedantic
-Wmissing-prototypes -Wmissing-declarations -Wredundant-decls
-fno-strict-aliasing -MT mcstat.o -MD -MP -MF .deps/mcstat.Tpo -c -o mcstat.o
`test -f 'programs/mcstat.c' || echo './'`programs/mcstat.c
cc1: warnings being treated as errors
programs/mcstat.c: In function 'print':
programs/mcstat.c:108: error: ignoring return value of 'fwrite', declared with
attribute warn_unused_result
programs/mcstat.c:110: error: ignoring return value of 'fwrite', declared with
attribute warn_unused_result
make[1]: *** [mcstat.o] Error 1
---
This was after the memcached daemon build succeeded, along with generation of
the launching shell script, so I was able to test. The daemon no longer
appears to suffer from a "Segmentation fault", and I noticed that event.ev_base
is now being manually set in memcached.c. Did you consider this a bug in
libevent (not setting this value) and apply this as a temporary workaround until
they fix it?
Thanks.
Original comment by thomc...@gmail.com
on 13 Jul 2011 at 9:23
I have relatively little clue about what's been changed in 1.6 :P Was mostly
curious if it'd been fixed there.
I'm still nervous about the change since I can't tell why it's failing in the
first place, and have never been able to reproduce the failure on any machine
that I have. Thinking we'll stare harder and maybe merge it anyway just after
1.4.6 goes out, so it can sit in the tree for a while longer before 1.4.7.
Original comment by dorma...@rydia.net
on 13 Jul 2011 at 9:36
Staring at this stupid bug, again.
Can you point out where in memcached.c (from 1.6) you believe it to be manually
setting the event.ev_base? The update_event() functions look essentially
identical.
The only place it's being manually set in 1.6 (that I can see) is in the TAP
code where it's adding an event to a different base.
I've replicated your -vvv output precisely, and have a script running which is
attempting to set/get/delete a value of all possible sizes on a 32bit system.
No crash.
Can you try the attached patch, and let me know which (if either) of the
asserts your test hits? One thing we're doing wrong is not checking the return
value of event_base_set, but it doesn't look like that'd make a difference here.
Thanks! I don't want to give up on this, but I want to make sure it's done
right.
Original comment by dorma...@rydia.net
on 8 Aug 2011 at 2:02
Here is another patch that I will likely stage in my for_147 tree for now. This
is similar to yours, except it directly accesses that connection's thread's
current base, for even more sanity.
After trying the assert patch, can you test this one against 1.4.6 and let me
know if it has similarly good results?
Given that you're saying 1.6 works, but the backtraces in 1.4 show it going
through some code which *wasn't* changed, I'm sorta even more terrified about
what could possibly be causing this.
Original comment by dorma...@rydia.net
on 8 Aug 2011 at 2:06
Attachments:
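The attachment itself is not reproduced in this thread, so what follows is only
a rough sketch of the idea as described: take the base from the connection's
owning thread rather than from the event structure, and check the result of the
base-setting call. All fake_ names are hypothetical stand-ins, not memcached's
or libevent's real definitions, and the real patch may well differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration only. */
struct fake_base { int id; };
struct fake_event { struct fake_base *ev_base; short flags; };
struct fake_thread { struct fake_base *base; };  /* per-worker state */
struct fake_conn {
    struct fake_event event;
    struct fake_thread *thread;  /* worker that owns this connection */
    short ev_flags;              /* flags currently registered */
};

/* Stand-in for event_base_set(): 0 on success, -1 on failure. */
static int fake_event_base_set(struct fake_base *base, struct fake_event *ev) {
    if (base == NULL) return -1;
    ev->ev_base = base;
    return 0;
}

/* Re-register with new flags, using the thread's base, not the event's. */
static int update_event_from_thread(struct fake_conn *c, short new_flags) {
    if (c == NULL || c->thread == NULL) return -1;
    if (c->ev_flags == new_flags) return 0;  /* nothing to change */

    struct fake_base *base = c->thread->base;  /* never read event.ev_base */
    c->event.flags = new_flags;
    if (fake_event_base_set(base, &c->event) == -1) return -1;  /* checked */
    c->ev_flags = new_flags;
    return 0;
}
```

In this model the update still works when the event's own ev_base field starts
out NULL, because that field is only ever written, never trusted as an input.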
I believe the attached patch will correct the problem. It addresses the issue
of using a variable that has no value ever set to it (as my C example showed
that the internal libevent event.ev_base member never gets assigned a value).
Your patch now uses the conn.thread.base variable, which gets set to the
appropriate main_base through thread_init().
Your patch also mirrors a similar change in 1.6.0-beta1 in a less roundabout
way. If you look at line 5510 of daemon/memcached.c, you will notice:
c->event.ev_base = tp->base;
It assigns event.ev_base to the thread.base, but I like your change better as
you do not touch an internal variable in the libevent event structure, which
could be subject to change in later libevent releases.
Hope this helps and gives you more confidence in your patch :)
Original comment by t...@genx.net
on 8 Aug 2011 at 11:27
That doesn't really help me no... As I said above, that particular case (line
5510) is entirely new code that the test doesn't even run. That's for a
specific case when a client using the TAP protocol connects and subscribes to a
data stream.
The update_event() logic hasn't changed at all, which is where the backtrace
would always fail.
I'm curious as to what the results of the first patch (with the asserts) are
for anyone who's actually having this failure.
Original comment by dorma...@rydia.net
on 8 Aug 2011 at 3:26
Okay, sorry about that. Because I am able to experience a "Segmentation Fault"
immediately on the official 1.4.6 release, I only did a cursory test of 1.6 (50
iterations) with no failure. Did a quick grep to see if anything changed in
relation to the event.ev_base variable and noticed it being set at line 5510,
without backtracing everything to confirm that portion of the code is even
being hit.
Upon further inspection of 1.6, I am noticing there is a local source build of
libevent that has differing behavior from the standard builds I am installing
via Gentoo. I did some primitive debugging in conn_new() to inspect
c->event.ev_base and base before and after the event_base_set() call. In 1.6,
c->event.ev_base is set to the value of base, but in 1.4.6, the value is left
untouched (remains NULL). This is the most likely reason why I am not
experiencing the problem in 1.6.
Which "assert patch" are you referring to (not seeing it in this thread)?
I did a clean build of the 1.4.6 source and was able to crash it immediately.
I then applied the memcached_thread_base_fix.patch and I have not been able to
get it to crash.
Original comment by thomc...@gmail.com
on 8 Aug 2011 at 4:37
Duhr... apparently it failed to attach yesterday.
Try this? Without the other fix.
Interesting that the builtin libevent for 1.6 fixes it as well :/ That's just
the libevent taken straight from the site (latest 1.4). Is Gentoo breaking the
damn thing?
Original comment by dorma...@rydia.net
on 8 Aug 2011 at 5:04
Attachments:
(gentoo dev here)
I can't reproduce this on stock Gentoo, 32-bit or 64-bit. I'm wondering if your
systems have some local changes or an old install of libevent lurking on them,
maybe in /usr/local.
Can you run this and attach the output please?
# gcc libevent-test.c -o libevent-test.i -E && egrep 'event.h|/usr/local' libevent-test.i
(libevent-test.c is your C test case from the start of the bug)
Original comment by robbat2....@gmail.com
on 8 Aug 2011 at 7:52
the "stats" command will also return the libevent version in use...
Original comment by trond.no...@gmail.com
on 8 Aug 2011 at 7:56
I'm going to close out this bug; it turns out the user had an older version of
event.h in /usr/local, which was getting pulled in ahead of the header belonging
to the actual installed version.
So I was right in figuring this was impossible. Thanks for the assist, robin!
The most bizarre thing is that, even with the wrong structs being used between
the library and memcached, it continued to function with a small patch. It
should've just immediately exploded so far as I'm concerned. The event.h was
potentially four years older than the installed version of libevent.
Original comment by dorma...@rydia.net
on 8 Aug 2011 at 8:05
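For anyone landing here later, a small self-contained sketch of why a stale
header can produce exactly this symptom. The two struct layouts below are
invented for illustration and are not the real libevent layouts from any
version. The "library" stores the base at the offset its own struct definition
uses, while the "application", compiled against the older definition, reads a
different offset and sees NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Layout the application was compiled against (stale header, invented). */
struct event_old_header {
    void *ev_next;
    void *ev_base;
    int   ev_fd;
};

/* Layout the installed library uses (one extra member, invented). */
struct event_new_library {
    void *ev_next;
    void *ev_active_next;
    void *ev_base;
    int   ev_fd;
};

/* "Library" side: writes the base at the offset from the new layout. */
static void library_base_set(void *ev, void *base) {
    memcpy((char *)ev + offsetof(struct event_new_library, ev_base),
           &base, sizeof base);
}

/* "Application" side: reads the base at the offset from the old layout. */
static void *app_read_base(const void *ev) {
    void *base;
    memcpy(&base,
           (const char *)ev + offsetof(struct event_old_header, ev_base),
           sizeof base);
    return base;
}

/* Returns 1 if the application sees the base the library stored, else 0. */
static int base_round_trips(void) {
    /* Storage large enough for either layout, zero-filled. */
    union {
        struct event_old_header  o;
        struct event_new_library n;
    } storage;
    memset(&storage, 0, sizeof storage);

    int marker = 0;  /* any object whose address can stand in for a base */
    library_base_set(&storage, &marker);
    return app_read_base(&storage) == &marker;
}
```

Here base_round_trips() returns 0: the write and the read disagree about where
ev_base lives. In a real build the application would also allocate only the
smaller struct, so the library could write past its end, which is one more
reason the mismatch is dangerous even when things appear to work.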
Original issue reported on code.google.com by
thomc...@gmail.com
on 20 Oct 2009 at 7:33