Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Segmentation fault with "thread" version of perl 5.8.3 #7218

Closed p5pRT closed 20 years ago

p5pRT commented 20 years ago

Migrated from rt.perl.org#28369 (status was 'resolved')

Searchable as RT28369$

p5pRT commented 20 years ago

From nog@MPA-Garching.MPG.DE

Created by nog@mpa-garching.mpg.de

My name is Norbert Gruener. I am owner and maintainer of the Perl XS module AFS.

Recently I got two reports that my module is crashing under "Debian unstable" while it is running under "Debian stable". I could isolate the problem to the following situation. When I am using the "threaded" version

  "perl, v5.8.3 built for i686-linux-thread-multi"

then my module crashes. Whereas if I am using the version without "threading"

  "perl, v5.8.3 built for i686-linux"

everything is working and no segmentation fault shows up.

I don't know if this is of any interest for you or if you are saying "threads are not recommended, so forget it"?

If you are interested in this case I can supply you with details of this problem.

Cheers,

Norbert

Perl Info

```
Flags:
    category=core
    severity=medium

Site configuration information for perl v5.8.3:

Configured by nog at Thu Apr 8 12:06:32 CEST 2004.

Summary of my perl5 (revision 5.0 version 8 subversion 3) configuration:
  Platform:
    osname=linux, osvers=2.4.25, archname=i686-linux-thread-multi
    uname='linux ncf-15 2.4.25 #1 smp thu feb 19 13:14:53 cet 2004 i686 unknown '
    config_args=''
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O3',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='3.3.1', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.8.3:
    /tmp/local/lib/perl5/5.8.3/i686-linux-thread-multi
    /tmp/local/lib/perl5/5.8.3
    /tmp/local/lib/perl5/site_perl/5.8.3/i686-linux-thread-multi
    /tmp/local/lib/perl5/site_perl/5.8.3
    /tmp/local/lib/perl5/site_perl
    .

Environment for perl v5.8.3:
    HOME=/afs/mpa/home/nog
    LANG (unset)
    LANGUAGE (unset)
    LC_CTYPE=
    LD_LIBRARY_PATH=/opt/gnome/lib
    LOGDIR (unset)
    PATH=/afs/mpa/home/nog/bin:/afs/mpa/home/nog/pmtools:/opt/gnome/bin:/usr/common/sbin:/usr/local/sbin:/usr/afsws/etc:/usr/local/bin:/usr/common/bin:/usr/bin:/bin:/usr/X11R6/bin:/usr/afsws/bin:
    PERL_BADLANG (unset)
    SHELL=/usr/local/bin/tcsh
```

p5pRT commented 20 years ago

From @lizmat

At 11:18 +0000 4/8/04, Norbert Gruener (via RT) wrote:

    # New Ticket Created by Norbert Gruener
    # Please include the string: [perl #28369]
    # in the subject line of all future correspondence about this issue.
    # <URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=28369 >
    -----------------------------------------------------------------
    [Please enter your report here]

My name is Norbert Gruener. I am owner and maintainer of the Perl XS module AFS.

Recently I got two reports that my module is crashing under "Debian unstable" while it is running under "Debian stable". I could isolate the problem to the following situation. When I am using the "threaded" version

  "perl, v5.8.3 built for i686-linux-thread-multi"

then my module crashes. Whereas if I am using the version without "threading"

  "perl, v5.8.3 built for i686-linux"

everything is working and no segmentation fault shows up.

I don't know if this is of any interest for you or if you are saying "threads are not recommended, so forget it"?

If you are interested in this case I can supply you with details of this problem.

Either that or build in some code so that the module refuses to operate in a threaded environment?

This is definitely _not_ enough to go on.

Does this only happen on Debian? Are other distributions unaffected or simply not tested?

Liz

p5pRT commented 20 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 20 years ago

From @iabyn

On Thu, Apr 08, 2004 at 11:18:49AM -0000, Norbert Gruener wrote:

My name is Norbert Gruener. I am owner and maintainer of the Perl XS module AFS.

Recently I got two reports that my module is crashing under "Debian unstable" while it is running under "Debian stable". I could isolate the problem to the following situation. When I am using the "threaded" version

  "perl, v5.8.3 built for i686-linux-thread-multi"

then my module crashes. Whereas if I am using the version without "threading"

  "perl, v5.8.3 built for i686-linux"

everything is working and no segmentation fault shows up.

I don't know if this is of any interest for you or if you are saying "threads are not recommended, so forget it"?

If you are interested in this case I can supply you with details of this problem.

Questions:

Is the fault reproducible? Is it reproducible without having an AFS filesystem around? Is the code that is faulting actually multi-threaded code, or is it just single-threaded code that happens to crash on a thread-enabled interpreter?

Dave.

-- Never do today what you can put off till tomorrow.

p5pRT commented 20 years ago

From nog@MPA-Garching.MPG.DE

Hi Liz\,

On Sun\, Apr 11 2004\, Elizabeth Mattijsen via RT wrote​:

At 11​:18 +0000 4/8/04\, Norbert Gruener (via RT) wrote​:

I don't know if this is of any interest for you or if you are saying "threads are not recommended\, so forget it"?

If you are interested in this case I can supply you with details of this problem.

Either that or build in some code so that the module refuses to operate in a threaded environment?

This is definitely _not_ enough to go on.

Yes\, I am aware of that :-)

Does this only happen on Debian. Are other distributions unaffected or simply not tested?

In the meantime I could reproduce the problem at my office. We are running a distribution-independent Linux installation, so it has nothing to do with the Debian distribution. It is tied only to the "threaded" build of the plain, standard Perl 5.8.3 running on Linux, taking all "configure" defaults except for "threading".

There are more details in the answer to Dave.

Norbert -- Ceterum censeo | PGP encrypted mail preferred. Redmond esse delendam. | PGP Key at www.MPA-Garching.MPG.de/~nog/

p5pRT commented 20 years ago

From nog@MPA-Garching.MPG.DE

Hi Dave\,

On Sun\, Apr 11 2004\, Dave Mitchell via RT wrote​:

On Thu\, Apr 08\, 2004 at 11​:18​:49AM -0000\, Norbert Gruener wrote​:

If you are interested in this case I can supply you with details of this problem.

Questions​:

Is the fault reproducible?

Yes, it is.

Is it reproducible without having an AFS filesystem around?

I am not sure. At least I was not able to come up with a tiny test case.

Is the code that is faulting actually multi-threaded code, or is it just single-threaded code that happens to crash on a thread-enabled interpreter?

To be honest, I do not have enough experience with "threading" to judge that.

So, let me give you some more details about where the AFS API crashes. The place where everything happens is very deep in one of the OpenAFS libraries: it is the function "savecontext". I have attached a modified version of this function containing several test prints.

And these are the test outputs

    with "unthreaded perl"            with "threaded perl"

    LWP2-SaveContext-DEBUG-5          LWP2-SaveContext-DEBUG-5
    LWP2-SaveContext-DEBUG-5-1        LWP2-SaveContext-DEBUG-5-1
    LWP2-SaveContext-DEBUG-5-2        LWP2-SaveContext-DEBUG-5-2
    LWP2-SaveContext-DEBUG-5-3        LWP2-SaveContext-DEBUG-5-3
    LWP2-SaveContext-DEBUG-5-4        LWP2-SaveContext-DEBUG-5-4
    LWP2-SaveContext-DEBUG-5-5        LWP2-SaveContext-DEBUG-5-5
    LWP2-SaveContext-DEBUG-5-1        Segmentation fault

As you can see, the call to "longjmp" crashes under the "threaded" perl. And there is definitely no problem in the function "savecontext" itself, since this function is used many times in the OpenAFS system without any problems.

Cheers\,

Norbert -- Ceterum censeo | PGP encrypted mail preferred. Redmond esse delendam. | PGP Key at www.MPA-Garching.MPG.de/~nog/

p5pRT commented 20 years ago

From nog@MPA-Garching.MPG.DE

    static jmp_buf jmp_tmp;
    static char (*EP)();
    static int rc;
    static jmp_buf_type *jmpBuffer;

    afs_int32 savecontext(ep, savearea, sp)
    char (*ep)();
    struct lwp_context *savearea;
    char* sp;
    {
        int code;

        printf("LWP2-SaveContext-DEBUG-1 \n");
        PRE_Block = 1;
        EP = ep;

        printf("LWP2-SaveContext-DEBUG-2 \n");
        code = setjmp(savearea->setjmp_buffer);
        jmpBuffer = (jmp_buf_type *)savearea->setjmp_buffer;
        savearea->topstack = (char*)jmpBuffer[LWP_SP];
        printf("LWP2-SaveContext-DEBUG-3 \n");

        printf("LWP2-SaveContext-DEBUG-4 \n");
        switch ( code )
        {
        case 0:
            if ( !sp )
                (*EP)();
            else
            {
                printf("LWP2-SaveContext-DEBUG-5 \n");
                rc = setjmp(jmp_tmp);
                printf("LWP2-SaveContext-DEBUG-5-1 \n");
                switch ( rc )
                {
                case 0:
                    printf("LWP2-SaveContext-DEBUG-5-2 \n");
                    jmpBuffer = (jmp_buf_type *)jmp_tmp;
                    printf("LWP2-SaveContext-DEBUG-5-3 \n");
                    jmpBuffer[LWP_SP] = (jmp_buf_type)sp;
                    printf("LWP2-SaveContext-DEBUG-5-4 \n");
                    printf("LWP2-SaveContext-DEBUG-5-5 \n");
                    longjmp(jmp_tmp,1);
                    break;
                case 1:
                    (*EP)();
                    assert(0); /* never returns */
                    break;
                default:
                    perror("Error in setjmp1\n");
                    exit(2);
                }
            }
            break;
        case 2: /* restoring frame */
            printf("LWP2-SaveContext-DEBUG-6 \n");
            break;
        default:
            perror("Error in setjmp2 : restoring\n");
            exit(3);
        }
        printf("LWP2-SaveContext-DEBUG-7 \n");
        return 0;
    }
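For readers less familiar with the control flow in the listing: the line "LWP2-SaveContext-DEBUG-5-1" appears twice in the unthreaded output because `setjmp` returns twice, once directly (value 0) and once more when `longjmp(jmp_tmp,1)` jumps back to it (value 1). The following is a minimal sketch of only that round trip, without the stack-pointer patching that `savecontext` performs in between. The names `resume_point` and `count_setjmp_passes` are made up for illustration and are not part of OpenAFS.

```c
#include <setjmp.h>

/* Hypothetical jump buffer, playing the role of jmp_tmp in the listing. */
static jmp_buf resume_point;

/*
 * Count how many times control passes the setjmp point.
 * The counter is volatile because it is modified between setjmp and
 * longjmp and read again afterwards.
 */
int count_setjmp_passes(void)
{
    volatile int passes = 0;

    switch (setjmp(resume_point)) {
    case 0:                         /* direct return from setjmp */
        passes++;
        longjmp(resume_point, 1);   /* jump back; setjmp now yields 1 */
    case 1:                         /* second return, caused by longjmp */
        passes++;
        return passes;
    default:                        /* not expected in this sketch */
        return -1;
    }
}
```

Calling `count_setjmp_passes()` yields 2 in this sketch. In the threaded run reported above, the second pass is never reached: the process dies inside `longjmp` itself.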

p5pRT commented 20 years ago

From @lizmat

At 07​:56 +0200 4/13/04\, Norbert Gruener wrote​:

On Sun\, Apr 11 2004\, Dave Mitchell via RT wrote​:

On Thu\, Apr 08\, 2004 at 11​:18​:49AM -0000\, Norbert Gruener wrote​:

If you are interested in this case I can supply you with details of this problem.

Questions​:

Is the fault reproducible?

Yes, it is.

Is it reproducible without having an AFS filesystem around?

I am not sure. At least I was not able to come up with a tiny test case.

Is the code that is faulting actually multi-threaded code, or is it just single-threaded code that happens to crash on a thread-enabled interpreter?

To be honest, I do not have enough experience with "threading" to judge that.

So, let me give you some more details about where the AFS API crashes. The place where everything happens is very deep in one of the OpenAFS libraries: it is the function "savecontext". I have attached a modified version of this function containing several test prints.

And these are the test outputs

    with "unthreaded perl"            with "threaded perl"

    LWP2-SaveContext-DEBUG-5          LWP2-SaveContext-DEBUG-5
    LWP2-SaveContext-DEBUG-5-1        LWP2-SaveContext-DEBUG-5-1
    LWP2-SaveContext-DEBUG-5-2        LWP2-SaveContext-DEBUG-5-2
    LWP2-SaveContext-DEBUG-5-3        LWP2-SaveContext-DEBUG-5-3
    LWP2-SaveContext-DEBUG-5-4        LWP2-SaveContext-DEBUG-5-4
    LWP2-SaveContext-DEBUG-5-5        LWP2-SaveContext-DEBUG-5-5
    LWP2-SaveContext-DEBUG-5-1        Segmentation fault

As you can see, the call for "longjmp" is crashing in "threaded". And there is definitely not a problem in the function "savecontext" since this function is used many times in the OpenAFS system without any problems.

Have you tried running this under valgrind? ( http://valgrind.kde.org )

Maybe that will provide some clues.

Liz

p5pRT commented 20 years ago

From @lizmat

At 15​:42 +0200 4/13/04\, Norbert Gruener wrote​:

On Tue\, Apr 13 2004\, Elizabeth Mattijsen wrote​:

At 07​:56 +0200 4/13/04\, Norbert Gruener wrote​:

As you can see\, the call for "longjmp" is crashing in "threaded". And there is definitely not a problem in the function "savecontext" since this function is used many times in the OpenAFS system without any problems.

Have you tried running this under valgrind? ( http​://valgrind.kde.org )

Not yet.

Maybe that will provide some clues.

OK\, here it comes ...

I have attached the output.

After the line "LWP2-SaveContext-DEBUG-5-5" normally the test crashes. Now there are some lines of "messages". I don't know if you can interpret them.

Well, maybe:

    ==32696==
    ==32696== Invalid write of size 4
    ==32696==    at 0x41C38654: savecontext (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
    ==32696==    Address 0x41A3602C is on thread 1's stack
    ==32696==
    ==32696== Invalid read of size 4
    ==32696==    at 0x402DAEE9: _IO_puts (in /lib/libc-2.3.2.so)
    ==32696==    Address 0x41A3602C is on thread 1's stack

I understand from your problem description that you only used a threaded Perl, but did not actually start any threads, right? This message implies to me that there are multiple threads running (don't thread numbers start at 0?).

Liz

p5pRT commented 20 years ago

From nick@ing-simmons.net

Norbert Gruener <nog@MPA-Garching.MPG.DE> writes:

Hi Dave\,

On Sun\, Apr 11 2004\, Dave Mitchell via RT wrote​:

On Thu\, Apr 08\, 2004 at 11​:18​:49AM -0000\, Norbert Gruener wrote​:

If you are interested in this case I can supply you with details of this problem.

Questions​:

Is the fault reproducible?

Yes, it is.

Is it reproducible without having an AFS filesystem around?

I am not sure. At least I was not able to come up with a tiny test case.

Is the code that is faulting actually multi-threaded code, or is it just single-threaded code that happens to crash on a thread-enabled interpreter?

To be honest, I do not have enough experience with "threading" to judge that.

So\, let me give you some more details where the AFS API crashes. The place where everything is happening\, is very deep in one of the OpenAFS libraries.

It may well be that those libraries are "thread aware" and if linked with a threading library will spawn threads

It is the function "savecontext". I have attached a modified version of this function containing several test prints.

And these are the test outputs

    with "unthreaded perl"            with "threaded perl"

    LWP2-SaveContext-DEBUG-5          LWP2-SaveContext-DEBUG-5
    LWP2-SaveContext-DEBUG-5-1        LWP2-SaveContext-DEBUG-5-1
    LWP2-SaveContext-DEBUG-5-2        LWP2-SaveContext-DEBUG-5-2
    LWP2-SaveContext-DEBUG-5-3        LWP2-SaveContext-DEBUG-5-3
    LWP2-SaveContext-DEBUG-5-4        LWP2-SaveContext-DEBUG-5-4
    LWP2-SaveContext-DEBUG-5-5        LWP2-SaveContext-DEBUG-5-5
    LWP2-SaveContext-DEBUG-5-1        Segmentation fault

As you can see, the call for "longjmp" is crashing in "threaded". And there is definitely not a problem in the function "savecontext" since this function is used many times in the OpenAFS system without any problems.

Perl uses longjmp too so that should not itself be a problem.

Enabling threads in perl does two things (mainly) as far as XS code is concerned:

  1. Changes #define-s so that perl variables are accessed via the
     my_perl pointer rather than as globals, and dTHX and friends
     stop being no-ops. This tends to mean your XS code should
     be written in a style that uses PERL_NO_GET_CONTEXT and dTHX/pTHX,
     as otherwise it gets slow.
     Snags with this should show up at compile time.

  2. Links with special "threading" versions of system libraries.
     This is the most likely cause of the problem you are seeing.
     On Linux this basically means libpthread.so gets used,
     and so now you are using pthread's version of longjmp().

Cheers\,

Norbert

p5pRT commented 20 years ago

From nog@MPA-Garching.MPG.DE

On Tue\, Apr 13 2004\, Elizabeth Mattijsen wrote​:

At 07​:56 +0200 4/13/04\, Norbert Gruener wrote​:

As you can see\, the call for "longjmp" is crashing in "threaded". And there is definitely not a problem in the function "savecontext" since this function is used many times in the OpenAFS system without any problems.

Have you tried running this under valgrind? ( http​://valgrind.kde.org )

Not yet.

Maybe that will provide some clues.

OK\, here it comes ...

I have attached the output.

After the line "LWP2-SaveContext-DEBUG-5-5" normally the test crashes. Now there are some lines of "messages". I don't know if you can interpret them.

Norbert -- Ceterum censeo | PGP encrypted mail preferred. Redmond esse delendam. | PGP Key at www.MPA-Garching.MPG.de/~nog/

p5pRT commented 20 years ago

From nog@MPA-Garching.MPG.DE

    ~/AFS.short/src>/tmp/local/bin/valgrind /tmp/local/bin/perl ../examples/constructor
    ==32696== Memcheck, a.k.a. Valgrind, a memory error detector for x86-linux.
    ==32696== Copyright (C) 2002-2003, and GNU GPL'd, by Julian Seward.
    ==32696== Using valgrind-2.0.0, a program supervision framework for x86-linux.
    ==32696== Copyright (C) 2000-2003, and GNU GPL'd, by Julian Seward.
    ==32696== Estimated CPU clock rate is 1595 MHz
    ==32696== For more details, rerun with: -v
    ==32696==
    ==32696== Conditional jump or move depends on uninitialised value(s)
    ==32696==    at 0x4000882A: _dl_relocate_object (in /lib/ld-2.3.2.so)
    ==32696==    by 0x40380950: (within /lib/libc-2.3.2.so)
    ==32696==    by 0x4000AEE5: _dl_catch_error (in /lib/ld-2.3.2.so)
    ==32696==    by 0x40380BBB: _dl_open (in /lib/libc-2.3.2.so)
    ==32696==
    ==32696== Conditional jump or move depends on uninitialised value(s)
    ==32696==    at 0x40008875: _dl_relocate_object (in /lib/ld-2.3.2.so)
    ==32696==    by 0x40380950: (within /lib/libc-2.3.2.so)
    ==32696==    by 0x4000AEE5: _dl_catch_error (in /lib/ld-2.3.2.so)
    ==32696==    by 0x40380BBB: _dl_open (in /lib/libc-2.3.2.so)
    DEBUG-11: 1
    DEBUG-12
    RX-DEBUG-1
    RX-DEBUG-2
    RX-DEBUG-3
    RX-DEBUG-4
    RX-DEBUG-5
    RX-DEBUG-6
    RX-DEBUG-7
    RX-LWP-Init-Thread-DEBUG-1
    LWP2-SaveContext-DEBUG-1
    LWP2-SaveContext-DEBUG-2
    LWP2-SaveContext-DEBUG-3
    LWP2-SaveContext-DEBUG-4
    LWP-Dispatcher-DEBUG-3
    LWP-Dispatcher-DEBUG-4
    LWP-Dispatcher-DEBUG-7
    LWP-Dispatcher-DEBUG-8
    LWP-Dispatcher-DEBUG-9
    LWP-Dispatcher-DEBUG-11
    LWP-Dispatcher-DEBUG-12
    LWP2-SaveContext-DEBUG-3
    LWP2-SaveContext-DEBUG-4
    LWP2-SaveContext-DEBUG-6
    LWP2-SaveContext-DEBUG-7
    LWP2-SaveContext-DEBUG-1
    LWP2-SaveContext-DEBUG-2
    LWP2-SaveContext-DEBUG-3
    LWP2-SaveContext-DEBUG-4
    LWP-Dispatcher-DEBUG-3
    LWP-Dispatcher-DEBUG-4
    LWP-Dispatcher-DEBUG-7
    LWP-Dispatcher-DEBUG-8
    LWP-Dispatcher-DEBUG-9
    LWP-Dispatcher-DEBUG-11
    LWP-Dispatcher-DEBUG-12
    LWP2-SaveContext-DEBUG-3
    LWP2-SaveContext-DEBUG-4
    LWP2-SaveContext-DEBUG-6
    LWP2-SaveContext-DEBUG-7
    RX-LWP-Init-Thread-DEBUG-2
    IOMGR-Init-DEBUG-1
    IOMGR-Init-DEBUG-2
    IOMGR-Init-DEBUG-3
    IOMGR-Init-DEBUG-4
    IOMGR-Init-DEBUG-5
    IOMGR-Init-DEBUG-8
    LWP1-Create-Proc-DEBUG-1
    LWP1-Create-Proc-DEBUG-2
    LWP1-Create-Proc-DEBUG-3
    LWP1-Create-Proc-DEBUG-4
    LWP1-Create-Proc-DEBUG-5
    LWP1-Create-Proc-DEBUG-6
    LWP1-Create-Proc-DEBUG-7
    LWP1-Create-Proc-DEBUG-9
    LWP1-Create-Proc-DEBUG-10
    LWP1-Create-Proc-DEBUG-11
    ==32696==
    ==32696== Invalid write of size 1
    ==32696==    at 0x41C38480: Initialize_Stack (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
    ==32696==    by 0x41C37661: LWP_CreateProcess (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
    ==32696==    by 0x41C39121: IOMGR_Initialize (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
    ==32696==    by 0x41C36B4D: rxi_InitializeThreadSupport (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
    ==32696==    Address 0x41A3602C is 0 bytes after a block of size 196608 alloc'd
    ==32696==    at 0x4002AA2D: malloc (vg_replace_malloc.c:153)
    ==32696==    by 0x41C375F3: LWP_CreateProcess (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
    ==32696==    by 0x41C39121: IOMGR_Initialize (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
    ==32696==    by 0x41C36B4D: rxi_InitializeThreadSupport (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
    LWP1-Create-Proc-DEBUG-12
    LWP1-Create-Proc-DEBUG-13
    LWP1-Create-Proc-DEBUG-18
    LWP2-SaveContext-DEBUG-1
    LWP2-SaveContext-DEBUG-2
    LWP2-SaveContext-DEBUG-3
    LWP2-SaveContext-DEBUG-4
    LWP2-SaveContext-DEBUG-5
    LWP2-SaveContext-DEBUG-5-1
    LWP2-SaveContext-DEBUG-5-2
    LWP2-SaveContext-DEBUG-5-3
    LWP2-SaveContext-DEBUG-5-4
    LWP2-SaveContext-DEBUG-5-5
    ==32696==
    ==32696== Invalid write of size 4
    ==32696==    at 0x41C38654: savecontext (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
    ==32696==    Address 0x41A3602C is on thread 1's stack
    ==32696==
    ==32696== Invalid read of size 4
    ==32696==    at 0x402DAEE9: _IO_puts (in /lib/libc-2.3.2.so)
    ==32696==    Address 0x41A3602C is on thread 1's stack
    LWP2-SaveContext-DEBUG-5-1
    LWP1-Create-Proc2-DEBUG-1
    LWP1-Create-Proc2-DEBUG-2
    LWP2-SaveContext-DEBUG-1
    LWP2-SaveContext-DEBUG-2
    LWP2-SaveContext-DEBUG-3
    LWP2-SaveContext-DEBUG-4
    LWP-Dispatcher-DEBUG-3
    LWP-Dispatcher-DEBUG-4
    LWP-Dispatcher-DEBUG-7
    LWP-Dispatcher-DEBUG-8
    LWP-Dispatcher-DEBUG-9
    LWP-Dispatcher-DEBUG-11
    LWP-Dispatcher-DEBUG-12
    LWP2-SaveContext-DEBUG-3
    LWP2-SaveContext-DEBUG-4
    LWP2-SaveContext-DEBUG-6
    LWP2-SaveContext-DEBUG-7
    LWP1-Create-Proc-DEBUG-19
    ==32696==
    ==32696== ERROR SUMMARY: 10 errors from 5 contexts (suppressed: 0 from 0)
    ==32696== malloc/free: in use at exit: 983273 bytes in 13312 blocks.
    ==32696== malloc/free: 25891 allocs, 12579 frees, 13728687 bytes allocated.
    ==32696== For a detailed leak analysis, rerun with: --leak-check=yes
    ==32696== For counts of detected errors, rerun with: -v
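A note on reading the "Invalid write of size 1" entry in the valgrind report: "0 bytes after a block of size 196608" means the byte was written at exactly `base + size`, one position past the last valid byte of the malloc'd stack block. The valgrind run of `vos` posted later in this thread shows the same kind of entry without crashing, so this is not necessarily the cause of the segmentation fault. As a sketch only, and not the OpenAFS code, here is one way to compute an initial stack-top address that stays inside the allocation. The helper name `stack_top_inside` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of OpenAFS): return the highest address
 * inside [base, base + size) that is aligned to `align` and still leaves
 * `align` bytes of room before the end of the block.
 * Assumes `align` is a power of two and size >= 2 * align.
 */
char *stack_top_inside(char *base, size_t size, size_t align)
{
    uintptr_t end = (uintptr_t)base + size;            /* one past the last byte */
    uintptr_t top = (end - align) & ~((uintptr_t)align - 1);
    return (char *)top;
}
```

With a 196608-byte block and 16-byte alignment, the returned pointer plus 16 never exceeds `base + 196608`, so a write through it would not be flagged as lying after the block.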

p5pRT commented 20 years ago

From nick@ing-simmons.net

Norbert Gruener <nog@MPA-Garching.MPG.DE> writes:

2. Links with special "threading" versions of system libraries. This is most likely cause of the problem you are seeing. On linux this basically means libpthread.so gets used, and so now you are using pthread's version of longjmp().

this makes complete sense to me. But what is the conclusion then? Is "libpthread.so" buggy, or one of the "OpenAFS" system libraries, or is it just an incompatible interconnection of "threaded Perl" and "OpenAFS" which is surely doing some threadening?

That is the issue - is OpenAFS doing threading? Can you ask it not to? Does OpenAFS "callback" into perl? If perl code gets invoked by another thread then thread-local stuff will not be pointing at right place.

At the moment I don't have any idea which direction I should go.

Prove it works okay with non-threaded perl?

Norbert

p5pRT commented 20 years ago

From nog@MPA-Garching.MPG.DE

On Wed, Apr 14 2004, Nick Ing-Simmons wrote:

Norbert Gruener <nog@MPA-Garching.MPG.DE> writes:

2. Links with special "threading" versions of system libraries. This is most likely cause of the problem you are seeing. On linux this basically means libpthread.so gets used, and so now you are using pthread's version of longjmp().

this makes complete sense to me. But what is the conclusion then? Is "libpthread.so" buggy, or one of the "OpenAFS" system libraries, or is it just an incompatible interconnection of "threaded Perl" and "OpenAFS" which is surely doing some threadening?

Ooops, "threadening" :-) This is true, but it's not what I meant :-)

That is the issue - is OpenAFS doing threading?

I checked the sparse documentation. Yes, OpenAFS is doing threading, but it uses its own thread library called the "Light Weight Process package" (LWP). It is definitely not using "pthreads".

Can you ask it not to?

I haven't found any pointer how to do that.

Does OpenAFS "callback" into perl?

I don't think so in this case. The XS code makes an OpenAFS "initialization" call. From there it steps down several OpenAFS system library calls and then it crashes in the savecontext function at the "longjmp" statement.

If perl code gets invoked by another thread then thread-local stuff will not be pointing at right place.

Do you think it is possible that the combination of "pthread" in Perl and the "LWP thread" in OpenAFS is causing that segmentation fault?

At the moment I don't have any idea which direction I should go.

Prove it works okay with non-threaded perl?

Well, I am a little bit reluctant to call it "proven" but the AFS XS package is working with the non-threaded perl for nearly ten years now.

Norbert -- Ceterum censeo | PGP encrypted mail preferred. Redmond esse delendam. | PGP Key at www.MPA-Garching.MPG.de/~nog/
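Since the message above describes LWP as a private, non-pthread thread library, a small illustration of what "user-level threads" means may help readers. The sketch below switches onto a separately malloc'd stack and back using the `getcontext`/`makecontext`/`swapcontext` calls from `<ucontext.h>`, assuming a Linux/glibc system. It is explicitly not how LWP is implemented (the attached `savecontext` patches a `jmp_buf` instead), and every name in it (`coroutine_body`, `run_coroutine_demo`, `COROUTINE_STACK_SIZE`) is made up for this example.

```c
#include <stdlib.h>
#include <ucontext.h>

#define COROUTINE_STACK_SIZE (64 * 1024)   /* arbitrary size for the sketch */

static ucontext_t main_ctx;   /* context of the caller */
static ucontext_t co_ctx;     /* context running on the private stack */
static int steps_run;         /* how many slices of work have completed */

/* Runs on the malloc'd stack; yields once in the middle. */
static void coroutine_body(void)
{
    steps_run++;                        /* first slice of work */
    swapcontext(&co_ctx, &main_ctx);    /* yield back to the caller */
    steps_run++;                        /* second slice of work */
    /* returning here resumes uc_link, i.e. main_ctx */
}

/* Returns 12 when both slices ran in order, -1 on setup failure. */
int run_coroutine_demo(void)
{
    char *stack = malloc(COROUTINE_STACK_SIZE);
    if (stack == NULL)
        return -1;

    steps_run = 0;
    if (getcontext(&co_ctx) == -1) {
        free(stack);
        return -1;
    }
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = COROUTINE_STACK_SIZE;
    co_ctx.uc_link = &main_ctx;         /* where to go when the body returns */
    makecontext(&co_ctx, coroutine_body, 0);

    swapcontext(&main_ctx, &co_ctx);    /* run until the first yield */
    int after_first = steps_run;

    swapcontext(&main_ctx, &co_ctx);    /* resume until the body returns */
    int after_second = steps_run;

    free(stack);
    return after_first * 10 + after_second;
}
```

The point of the sketch is that no pthread call is involved: the "thread" is just a saved register set plus a private stack. That is the kind of mechanism that can clash with a C library that makes its own assumptions about which stack a thread runs on.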

p5pRT commented 20 years ago

From nog@MPA-Garching.MPG.DE

Hi Nick\,

On Tue\, Apr 13 2004\, Nick Ing-Simmons wrote​:

Norbert Gruener <nog@MPA-Garching.MPG.DE> writes:

It is the function "savecontext". I have attached a modified version of this function containing several test prints.

And these are the test outputs

    with "unthreaded perl"            with "threaded perl"

    LWP2-SaveContext-DEBUG-5          LWP2-SaveContext-DEBUG-5
    [snipped several lines of debug output]
    LWP2-SaveContext-DEBUG-5-1        Segmentation fault

As you can see\, the call for "longjmp" is crashing in "threaded". And there is definitely not a problem in the function "savecontext" since this function is used many times in the OpenAFS system without any problems.

Perl uses longjmp too so that should not itself be a problem.

Enabling threads in perl does two things (mainly) as far as XS code is concerned​:

1. Changes #define-s so that perl variables are accessed via [snipped several lines of explanation] Snags with this show should show up at compile time.

2. Links with special "threading" versions of system libraries. This is most likely cause of the problem you are seeing. On linux this basically means libpthread.so gets used, and so now you are using pthread's version of longjmp().

this makes complete sense to me. But what is the conclusion then? Is "libpthread.so" buggy, or one of the "OpenAFS" system libraries, or is it just an incompatible interconnection of "threaded Perl" and "OpenAFS" which is surely doing some threadening?

At the moment I don't have any idea which direction I should go.

Norbert -- Ceterum censeo | PGP encrypted mail preferred. Redmond esse delendam. | PGP Key at www.MPA-Garching.MPG.de/~nog/

p5pRT commented 20 years ago

From nog@MPA-Garching.MPG.DE

Hi Liz\,

On Tue\, Apr 13 2004\, Elizabeth Mattijsen wrote​:

At 15​:42 +0200 4/13/04\, Norbert Gruener wrote​:

I have attached the output.

After the line "LWP2-SaveContext-DEBUG-5-5" normally the test crashes. Now there are some lines of "messages". I don't know if you can interpret them.

Well, maybe:

    ==32696==
    ==32696== Invalid write of size 4
    ==32696==    at 0x41C38654: savecontext (in /afs/mpa/home/nog/AFS.short/src/blib/arch/auto/AFS/AFS.so)
    ==32696==    Address 0x41A3602C is on thread 1's stack
    ==32696==
    ==32696== Invalid read of size 4
    ==32696==    at 0x402DAEE9: _IO_puts (in /lib/libc-2.3.2.so)
    ==32696==    Address 0x41A3602C is on thread 1's stack

I understand from your problem description that you only used a threaded Perl, but not actually start any threads, right?

That is correct.

This message implies to me that there are multiple threads running (don't thread numbers start at 0?).

I really don't know.

The only thing that makes me a little bit sceptical is the output of a "valgrind" run against the equivalent OpenAFS command.

I have attached the output of the equivalent OpenAFS command. The only difference from the Perl version is the entry point of the program. As you can see from the test prints, both programs run through the same test prints, and both outputs are nearly identical. Yet the Perl version seg-faults while the OpenAFS command works fine.

So, at the moment I am absolutely clueless :-((

Norbert -- Ceterum censeo | PGP encrypted mail preferred. Redmond esse delendam. | PGP Key at www.MPA-Garching.MPG.de/~nog/

p5pRT commented 20 years ago

From nog@MPA-Garching.MPG.DE

    ~/AFS.short>/tmp/local/bin/valgrind /tmp/openafs/sbin/vos exa home nog
    ==763== Memcheck, a.k.a. Valgrind, a memory error detector for x86-linux.
    ==763== Copyright (C) 2002-2003, and GNU GPL'd, by Julian Seward.
    ==763== Using valgrind-2.0.0, a program supervision framework for x86-linux.
    ==763== Copyright (C) 2000-2003, and GNU GPL'd, by Julian Seward.
    ==763== Estimated CPU clock rate is 1608 MHz
    ==763== For more details, rerun with: -v
    ==763==
    VOS-DEBUG-1
    VOS-DEBUG-2
    VOS-DEBUG-3
    VSU-DEBUG-1
    RX-DEBUG-1
    RX-DEBUG-2
    RX-DEBUG-3
    RX-DEBUG-4
    RX-DEBUG-5
    RX-DEBUG-6
    RX-DEBUG-7
    RX-LWP-Init-Thread-DEBUG-1
    LWP2-SaveContext-DEBUG-1
    LWP2-SaveContext-DEBUG-2
    LWP2-SaveContext-DEBUG-3
    LWP2-SaveContext-DEBUG-4
    LWP-Dispatcher-DEBUG-3
    LWP-Dispatcher-DEBUG-4
    LWP-Dispatcher-DEBUG-7
    LWP-Dispatcher-DEBUG-8
    LWP-Dispatcher-DEBUG-9
    LWP-Dispatcher-DEBUG-11
    LWP-Dispatcher-DEBUG-12
    LWP2-SaveContext-DEBUG-3
    LWP2-SaveContext-DEBUG-4
    LWP2-SaveContext-DEBUG-6
    LWP2-SaveContext-DEBUG-7
    LWP2-SaveContext-DEBUG-1
    LWP2-SaveContext-DEBUG-2
    LWP2-SaveContext-DEBUG-3
    LWP2-SaveContext-DEBUG-4
    LWP-Dispatcher-DEBUG-3
    LWP-Dispatcher-DEBUG-4
    LWP-Dispatcher-DEBUG-7
    LWP-Dispatcher-DEBUG-8
    LWP-Dispatcher-DEBUG-9
    LWP-Dispatcher-DEBUG-11
    LWP-Dispatcher-DEBUG-12
    LWP2-SaveContext-DEBUG-3
    LWP2-SaveContext-DEBUG-4
    LWP2-SaveContext-DEBUG-6
    LWP2-SaveContext-DEBUG-7
    RX-LWP-Init-Thread-DEBUG-2
    IOMGR-Init-DEBUG-1
    IOMGR-Init-DEBUG-2
    IOMGR-Init-DEBUG-3
    IOMGR-Init-DEBUG-4
    IOMGR-Init-DEBUG-5
    IOMGR-Init-DEBUG-8
    LWP1-Create-Proc-DEBUG-1
    LWP1-Create-Proc-DEBUG-2
    LWP1-Create-Proc-DEBUG-3
    LWP1-Create-Proc-DEBUG-4
    LWP1-Create-Proc-DEBUG-5
    LWP1-Create-Proc-DEBUG-6
    LWP1-Create-Proc-DEBUG-7
    LWP1-Create-Proc-DEBUG-9
    LWP1-Create-Proc-DEBUG-10
    LWP1-Create-Proc-DEBUG-11
    ==763== Invalid write of size 1
    ==763==    at 0x808CCF0: (within /tmp/openafs/sbin/vos)
    ==763==    by 0x808BED1: (within /tmp/openafs/sbin/vos)
    ==763==    by 0x808D991: (within /tmp/openafs/sbin/vos)
    ==763==    by 0x808B3BD: (within /tmp/openafs/sbin/vos)
    ==763==    Address 0x411802FC is 0 bytes after a block of size 196608 alloc'd
    ==763==    at 0x4002AA2D: malloc (vg_replace_malloc.c:153)
    ==763==    by 0x808BE63: (within /tmp/openafs/sbin/vos)
    ==763==    by 0x808D991: (within /tmp/openafs/sbin/vos)
    ==763==    by 0x808B3BD: (within /tmp/openafs/sbin/vos)
    LWP1-Create-Proc-DEBUG-12
    LWP1-Create-Proc-DEBUG-13
    LWP1-Create-Proc-DEBUG-18
    LWP2-SaveContext-DEBUG-1
    LWP2-SaveContext-DEBUG-2
    LWP2-SaveContext-DEBUG-3
    LWP2-SaveContext-DEBUG-4
    LWP2-SaveContext-DEBUG-5
    LWP2-SaveContext-DEBUG-5-1
    LWP2-SaveContext-DEBUG-5-2
    LWP2-SaveContext-DEBUG-5-3
    LWP2-SaveContext-DEBUG-5-4
    LWP2-SaveContext-DEBUG-5-5
    ==763==
    ==763== Invalid write of size 4
    ==763==    at 0x808CEC4: (within /tmp/openafs/sbin/vos)
    ==763==    Address 0x411802FC is on thread 1's stack
    ==763==
    ==763== Invalid read of size 4
    ==763==    at 0x402AEEE9: _IO_puts (in /lib/libc-2.3.2.so)
    ==763==    Address 0x411802FC is on thread 1's stack
    LWP2-SaveContext-DEBUG-5-1
    LWP1-Create-Proc2-DEBUG-1
    LWP1-Create-Proc2-DEBUG-2
    LWP2-SaveContext-DEBUG-1
    LWP2-SaveContext-DEBUG-2
    LWP2-SaveContext-DEBUG-3
    LWP2-SaveContext-DEBUG-4
    LWP-Dispatcher-DEBUG-3
    LWP-Dispatcher-DEBUG-4
    LWP-Dispatcher-DEBUG-7
    LWP-Dispatcher-DEBUG-8
    LWP-Dispatcher-DEBUG-9
    LWP-Dispatcher-DEBUG-11
    LWP-Dispatcher-DEBUG-12
    LWP2-SaveContext-DEBUG-3
    LWP2-SaveContext-DEBUG-4
    LWP2-SaveContext-DEBUG-6
    LWP2-SaveContext-DEBUG-7
    LWP1-Create-Proc-DEBUG-19
    ==763==
    ==763== ERROR SUMMARY: 6 errors from 3 contexts (suppressed: 0 from 0)
    ==763== malloc/free: in use at exit: 250063 bytes in 676 blocks.
    ==763== malloc/free: 676 allocs, 0 frees, 250063 bytes allocated.
    ==763== For a detailed leak analysis, rerun with: --leak-check=yes
    ==763== For counts of detected errors, rerun with: -v

p5pRT commented 20 years ago

From nog@MPA-Garching.MPG.DE

Hi Nick\, hi Dave\, hi Liz\,

On Wed\, Apr 14 2004\, Norbert Gruener wrote​:

On Wed\, Apr 14 2004\, Nick Ing-Simmons wrote​:

2. Links with special "threading" versions of system libraries. This is most likely cause of the problem you are seeing. On linux this basically means libpthread.so gets used, and so now you are using pthread's version of longjmp().

this statement put me on the right track. Thank you Nick !!! :-)

The problem is the "pthread" version of longjmp in combination with OpenAFS and its "own threading". Once I had understood that completely, I could prove that OpenAFS runs into the same problem if I force it to use the "pthread" version of longjmp. That specific OpenAFS server then also crashes at the same statement.

So, whoever is allowed to modify the status of that request,

  please close request # 28369

This is not a Perl problem.

I want to thank all of you for your patience and your assistance. You have been a great help in tracing my problem. I am really proud of the Perl community and especially of you guys.

Thank you,

Norbert -- Ceterum censeo | PGP encrypted mail preferred. Redmond esse delendam. | PGP Key at www.MPA-Garching.MPG.de/~nog/

p5pRT commented 20 years ago

@tux - Status changed from 'open' to 'resolved'