Reference-LAPACK / lapack

LAPACK development repository

ppc64le - Segfault during test: `./xeigtstz < nep.in > znep.out 2>&1` #85

Closed jlost closed 3 years ago

jlost commented 8 years ago

Hi, I built LAPACK with xlf on a ppc64le (POWER8) machine. I used the options in INSTALL/make.inc.XLF, although for BLASLIB I switched to ../../librefblas.a.

During make, I encountered the following error:

** zunt01   === End of Compilation 1 ===
"zunt01.f", 1500-036 (I) The NOSTRICT option (default at OPT(3)) has the potential to alter the semantics of a program.  Please refer to documentation on the STRICT/NOSTRICT option for more information.
1501-510  Compilation successful for file zunt01.f.
xlf -O3 -qfixed -qnosave -c zunt03.f -o zunt03.o
** zunt03   === End of Compilation 1 ===
"zunt03.f", 1500-036 (I) The NOSTRICT option (default at OPT(3)) has the potential to alter the semantics of a program.  Please refer to documentation on the STRICT/NOSTRICT option for more information.
1501-510  Compilation successful for file zunt03.f.
\
          xlf -qnosave -o xeigtstz \
          zchkee.o zbdt01.o zbdt02.o zbdt03.o zbdt05.o zchkbb.o zchkbd.o zchkbk.o zchkbl.o zchkec.o zchkgg.o zchkgk.o zchkgl.o zchkhb.o zchkhs.o zchkst.o zckcsd.o zckglm.o zckgqr.o zckgsv.o zcklse.o zcsdts.o zdrges.o zdrgev.o zdrges3.o zdrgev3.o zdrgsx.o zdrgvx.o zdrvbd.o zdrves.o zdrvev.o zdrvsg.o zdrvst.o zdrvsx.o zdrvvx.o zerrbd.o zerrec.o zerred.o zerrgg.o zerrhs.o zerrst.o zget02.o zget10.o zget22.o zget23.o zget24.o zget35.o zget36.o zget37.o zget38.o zget51.o zget52.o zget54.o zglmts.o zgqrts.o zgrqts.o zgsvts3.o zhbt21.o zhet21.o zhet22.o zhpt21.o zhst01.o zlarfy.o zlarhs.o zlatm4.o zlctes.o zlctsx.o zlsets.o zsbmv.o zsgt01.o zslect.o zstt21.o zstt22.o zunt01.o zunt03.o dlafts.o dlahd2.o dlasum.o dlatb9.o dstech.o dstect.o dsvdch.o dsvdct.o dsxt1.o alahdg.o alasum.o alasvm.o alareq.o ilaenv.o xerbla.o xlaenv.o chkxer.o ../../libtmglib.a \
          ../../liblapack.a ../../librefblas.a && mv xeigtstz ../xeigtstz
make[2]: Leaving directory `/home/u0017592/projects/lapack/TESTING/EIG'
NEP: Testing Nonsymmetric Eigenvalue Problem routines
./xeigtstz < nep.in > znep.out 2>&1
/bin/sh: line 1: 29872 Segmentation fault      ./xeigtstz < nep.in > znep.out 2>&1
make[1]: *** [znep.out] Error 139
make[1]: Leaving directory `/home/u0017592/projects/lapack/TESTING'
make: *** [lapack_testing] Error 2

Any idea what the problem might be? Is reference LAPACK tested on the ppc64le architecture?

jlost commented 8 years ago

FWIW, the problem also occurred with gfortran:

cd EIG ; make complex16
make[2]: Entering directory `/home/u0017592/projects/lapack/TESTING/EIG'
gfortran -O2 -frecursive -c zchkee.f -o zchkee.o
gfortran -O2 -frecursive -c zbdt01.f -o zbdt01.o
gfortran -O2 -frecursive -c zbdt02.f -o zbdt02.o
gfortran -O2 -frecursive -c zbdt03.f -o zbdt03.o
gfortran -O2 -frecursive -c zbdt05.f -o zbdt05.o
gfortran -O2 -frecursive -c zchkbb.f -o zchkbb.o
gfortran -O2 -frecursive -c zchkbd.f -o zchkbd.o
gfortran -O2 -frecursive -c zchkbk.f -o zchkbk.o
gfortran -O2 -frecursive -c zchkbl.f -o zchkbl.o
gfortran -O2 -frecursive -c zchkec.f -o zchkec.o
gfortran -O2 -frecursive -c zchkgg.f -o zchkgg.o
gfortran -O2 -frecursive -c zchkgk.f -o zchkgk.o
gfortran -O2 -frecursive -c zchkgl.f -o zchkgl.o
gfortran -O2 -frecursive -c zchkhb.f -o zchkhb.o
gfortran -O2 -frecursive -c zchkhs.f -o zchkhs.o
gfortran -O2 -frecursive -c zchkst.f -o zchkst.o
gfortran -O2 -frecursive -c zckcsd.f -o zckcsd.o
gfortran -O2 -frecursive -c zckglm.f -o zckglm.o
gfortran -O2 -frecursive -c zckgqr.f -o zckgqr.o
gfortran -O2 -frecursive -c zckgsv.f -o zckgsv.o
gfortran -O2 -frecursive -c zcklse.f -o zcklse.o
gfortran -O2 -frecursive -c zcsdts.f -o zcsdts.o
gfortran -O2 -frecursive -c zdrges.f -o zdrges.o
gfortran -O2 -frecursive -c zdrgev.f -o zdrgev.o
gfortran -O2 -frecursive -c zdrges3.f -o zdrges3.o
gfortran -O2 -frecursive -c zdrgev3.f -o zdrgev3.o
gfortran -O2 -frecursive -c zdrgsx.f -o zdrgsx.o
gfortran -O2 -frecursive -c zdrgvx.f -o zdrgvx.o
gfortran -O2 -frecursive -c zdrvbd.f -o zdrvbd.o
gfortran -O2 -frecursive -c zdrves.f -o zdrves.o
gfortran -O2 -frecursive -c zdrvev.f -o zdrvev.o
gfortran -O2 -frecursive -c zdrvsg.f -o zdrvsg.o
gfortran -O2 -frecursive -c zdrvst.f -o zdrvst.o
gfortran -O2 -frecursive -c zdrvsx.f -o zdrvsx.o
gfortran -O2 -frecursive -c zdrvvx.f -o zdrvvx.o
gfortran -O2 -frecursive -c zerrbd.f -o zerrbd.o
gfortran -O2 -frecursive -c zerrec.f -o zerrec.o
gfortran -O2 -frecursive -c zerred.f -o zerred.o
gfortran -O2 -frecursive -c zerrgg.f -o zerrgg.o
gfortran -O2 -frecursive -c zerrhs.f -o zerrhs.o
gfortran -O2 -frecursive -c zerrst.f -o zerrst.o
gfortran -O2 -frecursive -c zget02.f -o zget02.o
gfortran -O2 -frecursive -c zget10.f -o zget10.o
gfortran -O2 -frecursive -c zget22.f -o zget22.o
gfortran -O2 -frecursive -c zget23.f -o zget23.o
gfortran -O2 -frecursive -c zget24.f -o zget24.o
gfortran -O2 -frecursive -c zget35.f -o zget35.o
gfortran -O2 -frecursive -c zget36.f -o zget36.o
gfortran -O2 -frecursive -c zget37.f -o zget37.o
gfortran -O2 -frecursive -c zget38.f -o zget38.o
gfortran -O2 -frecursive -c zget51.f -o zget51.o
gfortran -O2 -frecursive -c zget52.f -o zget52.o
gfortran -O2 -frecursive -c zget54.f -o zget54.o
gfortran -O2 -frecursive -c zglmts.f -o zglmts.o
gfortran -O2 -frecursive -c zgqrts.f -o zgqrts.o
gfortran -O2 -frecursive -c zgrqts.f -o zgrqts.o
gfortran -O2 -frecursive -c zgsvts3.f -o zgsvts3.o
gfortran -O2 -frecursive -c zhbt21.f -o zhbt21.o
gfortran -O2 -frecursive -c zhet21.f -o zhet21.o
gfortran -O2 -frecursive -c zhet22.f -o zhet22.o
gfortran -O2 -frecursive -c zhpt21.f -o zhpt21.o
gfortran -O2 -frecursive -c zhst01.f -o zhst01.o
gfortran -O2 -frecursive -c zlarfy.f -o zlarfy.o
gfortran -O2 -frecursive -c zlarhs.f -o zlarhs.o
gfortran -O2 -frecursive -c zlatm4.f -o zlatm4.o
gfortran -O2 -frecursive -c zlctes.f -o zlctes.o
gfortran -O2 -frecursive -c zlctsx.f -o zlctsx.o
gfortran -O2 -frecursive -c zlsets.f -o zlsets.o
gfortran -O2 -frecursive -c zsbmv.f -o zsbmv.o
gfortran -O2 -frecursive -c zsgt01.f -o zsgt01.o
gfortran -O2 -frecursive -c zslect.f -o zslect.o
gfortran -O2 -frecursive -c zstt21.f -o zstt21.o
gfortran -O2 -frecursive -c zstt22.f -o zstt22.o
gfortran -O2 -frecursive -c zunt01.f -o zunt01.o
gfortran -O2 -frecursive -c zunt03.f -o zunt03.o
\
          gfortran  -o xeigtstz \
          zchkee.o zbdt01.o zbdt02.o zbdt03.o zbdt05.o zchkbb.o zchkbd.o zchkbk.o zchkbl.o zchkec.o zchkgg.o zchkgk.o zchkgl.o zchkhb.o zchkhs.o zchkst.o zckcsd.o zckglm.o zckgqr.o zckgsv.o zcklse.o zcsdts.o zdrges.o zdrgev.o zdrges3.o zdrgev3.o zdrgsx.o zdrgvx.o zdrvbd.o zdrves.o zdrvev.o zdrvsg.o zdrvst.o zdrvsx.o zdrvvx.o zerrbd.o zerrec.o zerred.o zerrgg.o zerrhs.o zerrst.o zget02.o zget10.o zget22.o zget23.o zget24.o zget35.o zget36.o zget37.o zget38.o zget51.o zget52.o zget54.o zglmts.o zgqrts.o zgrqts.o zgsvts3.o zhbt21.o zhet21.o zhet22.o zhpt21.o zhst01.o zlarfy.o zlarhs.o zlatm4.o zlctes.o zlctsx.o zlsets.o zsbmv.o zsgt01.o zslect.o zstt21.o zstt22.o zunt01.o zunt03.o dlafts.o dlahd2.o dlasum.o dlatb9.o dstech.o dstect.o dsvdch.o dsvdct.o dsxt1.o alahdg.o alasum.o alasvm.o alareq.o ilaenv.o xerbla.o xlaenv.o chkxer.o ../../libtmglib.a \
          ../../liblapack.a ../../librefblas.a && mv xeigtstz ../xeigtstz
make[2]: Leaving directory `/home/u0017592/projects/lapack/TESTING/EIG'
NEP: Testing Nonsymmetric Eigenvalue Problem routines
./xeigtstz < nep.in > znep.out 2>&1
/bin/sh: line 1:  6859 Segmentation fault      ./xeigtstz < nep.in > znep.out 2>&1
make[1]: *** [znep.out] Error 139
make[1]: Leaving directory `/home/u0017592/projects/lapack/TESTING'
make: *** [lapack_testing] Error 2

jlost commented 8 years ago

Also occurs with -O0 (no opt). Here are the complete make logs: make.noopt.txt

edelsohn commented 7 years ago

$200 bounty open to fix this! https://www.bountysource.com/issues/39160760-ppc64le-segfault-during-test-xeigtstz-nep-in-znep-out-2-1

victorliu commented 7 years ago

On a typical Amazon AWS instance I can compile LAPACK with -O0 and -frecursive, and the compilation and tests run all the way through without a problem. However, when I go back into the TESTING directory and run xeigtstz under valgrind or gdb, it segfaults within zchkee.f at line 1134, upon trying to initialize a gigantic local array.

Recompiling without -frecursive, I no longer get the segfault.

My theory: -frecursive forces local arrays to be allocated on the stack (avoiding static variables is required for thread safety; I was the one recommending this flag originally). The large local arrays within the test routines then overflow the stack (just the A array in zchkee is almost 1 MB). You may need to increase the stack size to make the tests run correctly if you compile with -frecursive.
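
A rough way to sanity-check this theory is to compare the size of a large local array with the stack limit that `ulimit -s` reports. The sketch below is only illustrative: the helper names `array_bytes` and `fits_in_stack` are made up here, and the 132 x 132 COMPLEX*16 dimensions are an assumed example, not the actual declaration in zchkee.f.

```shell
# Hypothetical helpers (not part of LAPACK): estimate whether a local array
# fits within a given stack limit.

# array_bytes ROWS COLS ELEM_SIZE -> size in bytes
array_bytes() {
    echo $(( $1 * $2 * $3 ))
}

# fits_in_stack BYTES LIMIT_KB -> exit status 0 if it fits, 1 otherwise.
# LIMIT_KB is what `ulimit -s` prints: a number of KiB, or "unlimited".
fits_in_stack() {
    if [ "$2" = "unlimited" ]; then
        return 0
    fi
    [ "$1" -lt $(( $2 * 1024 )) ]
}

# Assumed example: one 132 x 132 array of 16-byte (COMPLEX*16) elements.
bytes=$(array_bytes 132 132 16)
if fits_in_stack "$bytes" "$(ulimit -s)"; then
    echo "$bytes bytes fit under the current stack limit"
else
    echo "$bytes bytes exceed the current stack limit"
fi
```

A test driver typically declares several such arrays plus workspace, so it is the sum of all stack-allocated locals that matters, not any single array.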

Edit: I should also add that valgrind finds thousands of illegal memory accesses in the main test drivers as well. I don't know enough about Fortran WRITE and READ statements to know whether these are spurious errors or not.

jlost commented 7 years ago

With gfortran, the build succeeds without -frecursive and fails with it. I'm most concerned with xlf, though. I also tried disabling -qnosave with xlf, but no luck. Setting the stack size to unlimited (ulimit -s unlimited) with -qnosave disabled resulted in:

** slaord   === End of Compilation 1 ===
1501-510  Compilation successful for file slaord.f.
xlf   aladhd.o alaerh.o alaesm.o alahd.o alareq.o alasum.o alasvm.o chkxer.o icopy.o ilaenv.o xlaenv.o xerbla.o slaord.o schkaa.o schkeq.o schkgb.o schkge.o schkgt.o schklq.o schkpb.o schkpo.o schkps.o schkpp.o schkpt.o schkq3.o schkql.o schkqr.o schkrq.o schksp.o schksy.o schksy_rook.o schksy_aa.o schktb.o schktp.o schktr.o schktz.o sdrvgt.o sdrvls.o sdrvpb.o sdrvpp.o sdrvpt.o sdrvsp.o  sdrvsy_rook.o sdrvsy_aa.o serrgt.o serrlq.o serrls.o serrps.o serrql.o serrqp.o serrqr.o serrrq.o serrtr.o serrtz.o sgbt01.o sgbt02.o sgbt05.o sgelqs.o sgeqls.o sgeqrs.o sgerqs.o sget01.o sget02.o sget03.o sget04.o sget06.o sget07.o sgtt01.o sgtt02.o sgtt05.o slaptm.o slarhs.o slatb4.o slatb5.o slattb.o slattp.o slattr.o slavsp.o slavsy.o slavsy_rook.o slqt01.o slqt02.o slqt03.o spbt01.o spbt02.o spbt05.o spot01.o spot02.o spot03.o spot05.o spst01.o sppt01.o sppt02.o sppt03.o sppt05.o sptt01.o sptt02.o sptt05.o sqlt01.o sqlt02.o sqlt03.o sqpt01.o sqrt01.o sqrt01p.o sqrt02.o sqrt03.o sqrt11.o sqrt12.o sqrt13.o sqrt14.o sqrt15.o sqrt16.o sqrt17.o srqt01.o srqt02.o srqt03.o srzt01.o srzt02.o sspt01.o ssyt01.o ssyt01_rook.o ssyt01_aa.o stbt02.o stbt03.o stbt05.o stbt06.o stpt01.o stpt02.o stpt03.o stpt05.o stpt06.o strt01.o strt02.o strt03.o strt05.o strt06.o sgennd.o sqrt04.o sqrt05.o schkqrt.o serrqrt.o schkqrtp.o serrqrtp.o schklqt.o schklqtp.o schktsqr.o serrlqt.o serrlqtp.o serrtsqr.o stsqr01.o slqt04.o slqt05.o  serrvx.o sdrvge.o sdrvsy.o serrge.o sdrvgb.o sdrvpo.o serrsy.o serrpo.o \
        ../../libtmglib.a ../../liblapack.a   ../../librefblas.a -o xlintsts
mv xlintsts ../xlintsts
make[2]: Leaving directory `/home/u0017592/projects/lapack/TESTING/LIN'
Testing REAL LAPACK linear equation routines
./xlintsts < stest.in > stest.out 2>&1
/bin/sh: line 1: 10399 Killed                  ./xlintsts < stest.in > stest.out 2>&1
make[1]: *** [stest.out] Error 137
make[1]: Leaving directory `/home/u0017592/projects/lapack/TESTING'
make: *** [lapack_testing] Error 2

Edited to make clear I was running with -qnosave disabled.
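
For readers comparing the two make failures in this thread (Error 139 and Error 137): by shell convention, an exit status above 128 usually means the process was terminated by signal number status minus 128, so 139 corresponds to signal 11 and 137 to signal 9. A minimal sketch of that decoding follows; `describe_status` is a hypothetical helper name, not something from the LAPACK build system.

```shell
# Hypothetical helper: interpret an exit status the way the shell encodes it.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$(( status - 128 ))
        # `kill -l N` prints the name of signal number N
        echo "killed by signal $sig ($(kill -l "$sig"))"
    else
        echo "exited with status $status"
    fi
}

describe_status 139   # status make reported for ./xeigtstz
describe_status 137   # status make reported for ./xlintsts
```

The decoding only identifies the signal. It does not say what sent it, so the "Killed" run still needs separate investigation.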

victorliu commented 7 years ago

I just re-compiled using gfortran with "-frecursive -fcheck=all -O0 -ggdb". Running "make" causes the tests to fail on each test of the Aasen symmetric indefinite routines. This happens for all four precisions. Running each test manually within gdb results in:

xlintsts < stest.in:

At line 201 of file ssyt01_aa.f
Fortran runtime error: Index '2' of dimension 1 of array 'c' above upper bound of 1

xlintstc < ctest.in:

At line 206 of file chet01_aa.f
Fortran runtime error: Index '2' of dimension 1 of array 'c' above upper bound of 1

./xlintstd < dtest.in

At line 202 of file dsyt01_aa.f
Fortran runtime error: Index '2' of dimension 1 of array 'c' above upper bound of 1

xlintstz < ztest.in:

At line 206 of file zhet01_aa.f
Fortran runtime error: Index '2' of dimension 1 of array 'c' above upper bound of 1

Digging into the stack trace, it appears that in the bottom-level function (in each of the files mentioned above), the matrix C is accessed at index (2,1) even though N=LDC=1 in the routine. This is not necessarily an error, since the called BLAS function performs an early exit (N=0).

Full stack trace for single precision test:

#0  ssyt01_aa (uplo=..., n=1, a=..., lda=1, afac=..., ldafac=1, ipiv=..., 
    c=..., ldc=1, rwork=..., resid=0, _uplo=1) at ssyt01_aa.f:201
#1  0x000000000044e6c5 in schksy_aa (dotype=..., nn=7, nval=..., nnb=3, 
    nbval=..., nns=3, nsval=..., thresh=30, tsterr=.TRUE., nmax=132, a=..., 
    afac=..., ainv=..., b=..., x=..., xact=..., work=..., rwork=..., 
    iwork=..., nout=6) at schksy_aa.f:477
#2  0x00000000004248af in schkaa () at schkaa.f:710
#3  0x0000000000426927 in main (argc=1, argv=0x7fffffffe893) at schkaa.f:1015
#4  0x00007ffff7200f45 in __libc_start_main (main=0x4268f3 <main>, argc=1, 
    argv=0x7fffffffe688, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffffffe678) at libc-start.c:287
#5  0x00000000004010f9 in _start ()

More oddities involving the Aasen tests show up when running valgrind on "xlintsts < stest.in". Is it possible for you to run the test routine under your debugger (whatever it would be corresponding to xlf)?

 SSA drivers passed the tests of the error exits
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x465889: sget04_ (sget04.f:148)
==19821==    by 0x453675: sdrvsy_aa_ (sdrvsy_aa.f:482)
==19821==    by 0x41FB0B: MAIN__ (schkaa.f:719)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x465A5B: sget04_ (sget04.f:169)
==19821==    by 0x453675: sdrvsy_aa_ (sdrvsy_aa.f:482)
==19821==    by 0x41FB0B: MAIN__ (schkaa.f:719)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 

...

==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x55753A: sgemqr_ (sgemqr.f:204)
==19821==    by 0x49A8ED: serrtsqr_ (serrtsqr.f:143)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x55753A: sgemqr_ (sgemqr.f:204)
==19821==    by 0x49A9A3: serrtsqr_ (serrtsqr.f:146)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x55753A: sgemqr_ (sgemqr.f:204)
==19821==    by 0x49AA59: serrtsqr_ (serrtsqr.f:149)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x55753A: sgemqr_ (sgemqr.f:204)
==19821==    by 0x49AB0F: serrtsqr_ (serrtsqr.f:152)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x55753A: sgemqr_ (sgemqr.f:204)
==19821==    by 0x49ABC5: serrtsqr_ (serrtsqr.f:155)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x557566: sgemqr_ (sgemqr.f:205)
==19821==    by 0x49ABC5: serrtsqr_ (serrtsqr.f:155)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x55753A: sgemqr_ (sgemqr.f:204)
==19821==    by 0x49AC7B: serrtsqr_ (serrtsqr.f:158)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x557566: sgemqr_ (sgemqr.f:205)
==19821==    by 0x49AC7B: serrtsqr_ (serrtsqr.f:158)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x55753A: sgemqr_ (sgemqr.f:204)
==19821==    by 0x49AD31: serrtsqr_ (serrtsqr.f:161)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x55753A: sgemqr_ (sgemqr.f:204)
==19821==    by 0x49ADE7: serrtsqr_ (serrtsqr.f:164)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x55753A: sgemqr_ (sgemqr.f:204)
==19821==    by 0x49AE9D: serrtsqr_ (serrtsqr.f:167)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x55753A: sgemqr_ (sgemqr.f:204)
==19821==    by 0x49AF53: serrtsqr_ (serrtsqr.f:170)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x55753A: sgemqr_ (sgemqr.f:204)
==19821==    by 0x49B009: serrtsqr_ (serrtsqr.f:173)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x558718: sgemlq_ (sgemlq.f:223)
==19821==    by 0x49B8DC: serrtsqr_ (serrtsqr.f:223)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x558724: sgemlq_ (sgemlq.f:223)
==19821==    by 0x49B8DC: serrtsqr_ (serrtsqr.f:223)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x558718: sgemlq_ (sgemlq.f:223)
==19821==    by 0x49B992: serrtsqr_ (serrtsqr.f:226)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x558724: sgemlq_ (sgemlq.f:223)
==19821==    by 0x49B992: serrtsqr_ (serrtsqr.f:226)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x558718: sgemlq_ (sgemlq.f:223)
==19821==    by 0x49BA48: serrtsqr_ (serrtsqr.f:229)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x558724: sgemlq_ (sgemlq.f:223)
==19821==    by 0x49BA48: serrtsqr_ (serrtsqr.f:229)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x558718: sgemlq_ (sgemlq.f:223)
==19821==    by 0x49BAFE: serrtsqr_ (serrtsqr.f:232)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x558724: sgemlq_ (sgemlq.f:223)
==19821==    by 0x49BAFE: serrtsqr_ (serrtsqr.f:232)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
==19821== Conditional jump or move depends on uninitialised value(s)
==19821==    at 0x558761: sgemlq_ (sgemlq.f:227)
==19821==    by 0x49BAFE: serrtsqr_ (serrtsqr.f:232)
==19821==    by 0x497AFB: schktsqr_ (schktsqr.f:162)
==19821==    by 0x4218DE: MAIN__ (schkaa.f:973)
==19821==    by 0x421A24: main (schkaa.f:1015)
==19821== 
 STS routines passed the tests of the error exits

 All tests for STS routines passed the threshold (  10800 tests run)

 End of tests
 Total time used =       243.69 seconds

==19821== 
==19821== HEAP SUMMARY:
==19821==     in use at exit: 0 bytes in 0 blocks
==19821==   total heap usage: 29,037 allocs, 29,037 frees, 60,537,543 bytes allocated
==19821== 
==19821== All heap blocks were freed -- no leaks are possible
==19821== 
==19821== For counts of detected and suppressed errors, rerun with: -v
==19821== Use --track-origins=yes to see where uninitialised values come from
==19821== ERROR SUMMARY: 30 errors from 24 contexts (suppressed: 0 from 0)

iyamazaki commented 7 years ago

Thank you,

I fixed some of these errors last week, and I will make sure the remaining ones are fixed in the next pull request (hopefully tomorrow).

Best, Ichi

On Sun, Nov 20, 2016 at 3:23 PM, Victor Liu notifications@github.com wrote:

I just re-compiled using gfortran with "-frecursive -fcheck=all -O0 -ggdb". Running "make" causes the tests to fail on each test of the Aasen symmetric indefinite routines. This happens for all four precisions. Running each test manually within gdb results in:

xlintsts < stest.in:

At line 201 of file ssyt01_aa.f Fortran runtime error: Index '2' of dimension 1 of array 'c' above upper bound of 1

xlintstc < ctest.in:

At line 206 of file chet01_aa.f Fortran runtime error: Index '2' of dimension 1 of array 'c' above upper bound of 1

./xlintstd < dtest.in

At line 202 of file dsyt01_aa.f Fortran runtime error: Index '2' of dimension 1 of array 'c' above upper bound of 1

xlintstz < ztest.in:

At line 206 of file zhet01_aa.f Fortran runtime error: Index '2' of dimension 1 of array 'c' above upper bound of 1

Digging into the stacktrace, it appears the bottom level function (the files mentioned above), the matrix C is accessed at index (2,1) whereas N=LDC=1 in the routine. This is not necessarily an error since the called BLAS function performs an early exit (N=0).

Full stack trace for single precision test:

0 ssyt01_aa (uplo=..., n=1, a=..., lda=1, afac=..., ldafac=1, ipiv=...,

c=..., ldc=1, rwork=..., resid=0, _uplo=1) at ssyt01_aa.f:201

1 0x000000000044e6c5 in schksy_aa (dotype=..., nn=7, nval=..., nnb=3,

nbval=..., nns=3, nsval=..., thresh=30, tsterr=.TRUE., nmax=132, a=...,
afac=..., ainv=..., b=..., x=..., xact=..., work=..., rwork=...,
iwork=..., nout=6) at schksy_aa.f:477

2 0x00000000004248af in schkaa () at schkaa.f:710

3 0x0000000000426927 in main (argc=1, argv=0x7fffffffe893) at schkaa.f:1015

4 0x00007ffff7200f45 in __libc_start_main (main=0x4268f3
, argc=1,

argv=0x7fffffffe688, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>, stack_end=0x7fffffffe678) at libc-start.c:287

5 0x00000000004010f9 in _start ()

More oddities involving the Aasen tests show up when running valgrind on "xlintsts < stest.in". Is it possible for you to run the test routine under your debugger (whatever it would be corresponding to xlf)?

SSA drivers passed the tests of the error exits ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x465889: sget04_ (sget04.f:148) ==19821== by 0x453675: sdrvsyaa (sdrvsy_aa.f:482) ==19821== by 0x41FB0B: MAIN (schkaa.f:719) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x465A5B: sget04_ (sget04.f:169) ==19821== by 0x453675: sdrvsyaa (sdrvsy_aa.f:482) ==19821== by 0x41FB0B: MAIN (schkaa.f:719) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821==

...

==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x55753A: sgemqr (sgemqr.f:204) ==19821== by 0x49A8ED: serrtsqr (serrtsqr.f:143) ==19821== by 0x497AFB: schktsqr_ (schktsqr.f:162) ==19821== by 0x4218DE: MAIN (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x55753A: sgemqr (sgemqr.f:204) ==19821== by 0x49A9A3: serrtsqr (serrtsqr.f:146) ==19821== by 0x497AFB: schktsqr_ (schktsqr.f:162) ==19821== by 0x4218DE: MAIN_ (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x55753A: sgemqr (sgemqr.f:204) ==19821== by 0x49AA59: serrtsqr (serrtsqr.f:149) ==19821== by 0x497AFB: schktsqr (schktsqr.f:162) ==19821== by 0x4218DE: MAIN (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x55753A: sgemqr (sgemqr.f:204) ==19821== by 0x49AB0F: serrtsqr (serrtsqr.f:152) ==19821== by 0x497AFB: schktsqr_ (schktsqr.f:162) ==19821== by 0x4218DE: MAIN (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x55753A: sgemqr (sgemqr.f:204) ==19821== by 0x49ABC5: serrtsqr (serrtsqr.f:155) ==19821== by 0x497AFB: schktsqr_ (schktsqr.f:162) ==19821== by 0x4218DE: MAIN_ (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x557566: sgemqr (sgemqr.f:205) ==19821== by 0x49ABC5: serrtsqr (serrtsqr.f:155) ==19821== by 0x497AFB: schktsqr (schktsqr.f:162) ==19821== by 0x4218DE: MAIN (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x55753A: sgemqr (sgemqr.f:204) ==19821== by 0x49AC7B: 
serrtsqr (serrtsqr.f:158) ==19821== by 0x497AFB: schktsqr_ (schktsqr.f:162) ==19821== by 0x4218DE: MAIN (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x557566: sgemqr (sgemqr.f:205) ==19821== by 0x49AC7B: serrtsqr (serrtsqr.f:158) ==19821== by 0x497AFB: schktsqr_ (schktsqr.f:162) ==19821== by 0x4218DE: MAIN_ (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x55753A: sgemqr (sgemqr.f:204) ==19821== by 0x49AD31: serrtsqr (serrtsqr.f:161) ==19821== by 0x497AFB: schktsqr (schktsqr.f:162) ==19821== by 0x4218DE: MAIN (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x55753A: sgemqr (sgemqr.f:204) ==19821== by 0x49ADE7: serrtsqr (serrtsqr.f:164) ==19821== by 0x497AFB: schktsqr_ (schktsqr.f:162) ==19821== by 0x4218DE: MAIN (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x55753A: sgemqr (sgemqr.f:204) ==19821== by 0x49AE9D: serrtsqr (serrtsqr.f:167) ==19821== by 0x497AFB: schktsqr_ (schktsqr.f:162) ==19821== by 0x4218DE: MAIN_ (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x55753A: sgemqr (sgemqr.f:204) ==19821== by 0x49AF53: serrtsqr (serrtsqr.f:170) ==19821== by 0x497AFB: schktsqr (schktsqr.f:162) ==19821== by 0x4218DE: MAIN (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x55753A: sgemqr (sgemqr.f:204) ==19821== by 0x49B009: serrtsqr (serrtsqr.f:173) ==19821== by 0x497AFB: schktsqr_ (schktsqr.f:162) ==19821== by 0x4218DE: MAIN (schkaa.f:973) ==19821== by 
0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x558718: sgemlq (sgemlq.f:223) ==19821== by 0x49B8DC: serrtsqr (serrtsqr.f:223) ==19821== by 0x497AFB: schktsqr_ (schktsqr.f:162) ==19821== by 0x4218DE: MAIN_ (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x558724: sgemlq (sgemlq.f:223) ==19821== by 0x49B8DC: serrtsqr (serrtsqr.f:223) ==19821== by 0x497AFB: schktsqr (schktsqr.f:162) ==19821== by 0x4218DE: MAIN (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x558718: sgemlq (sgemlq.f:223) ==19821== by 0x49B992: serrtsqr (serrtsqr.f:226) ==19821== by 0x497AFB: schktsqr_ (schktsqr.f:162) ==19821== by 0x4218DE: MAIN (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x558724: sgemlq (sgemlq.f:223) ==19821== by 0x49B992: serrtsqr (serrtsqr.f:226) ==19821== by 0x497AFB: schktsqr_ (schktsqr.f:162) ==19821== by 0x4218DE: MAIN_ (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x558718: sgemlq (sgemlq.f:223) ==19821== by 0x49BA48: serrtsqr (serrtsqr.f:229) ==19821== by 0x497AFB: schktsqr (schktsqr.f:162) ==19821== by 0x4218DE: MAIN (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x558724: sgemlq (sgemlq.f:223) ==19821== by 0x49BA48: serrtsqr (serrtsqr.f:229) ==19821== by 0x497AFB: schktsqr_ (schktsqr.f:162) ==19821== by 0x4218DE: MAIN (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x558718: sgemlq 
(sgemlq.f:223) ==19821== by 0x49BAFE: serrtsqr (serrtsqr.f:232) ==19821== by 0x497AFB: schktsqr_ (schktsqr.f:162) ==19821== by 0x4218DE: MAIN_ (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x558724: sgemlq (sgemlq.f:223) ==19821== by 0x49BAFE: serrtsqr (serrtsqr.f:232) ==19821== by 0x497AFB: schktsqr (schktsqr.f:162) ==19821== by 0x4218DE: MAIN (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== ==19821== Conditional jump or move depends on uninitialised value(s) ==19821== at 0x558761: sgemlq (sgemlq.f:227) ==19821== by 0x49BAFE: serrtsqr (serrtsqr.f:232) ==19821== by 0x497AFB: schktsqr_ (schktsqr.f:162) ==19821== by 0x4218DE: MAIN__ (schkaa.f:973) ==19821== by 0x421A24: main (schkaa.f:1015) ==19821== STS routines passed the tests of the error exits

All tests for STS routines passed the threshold ( 10800 tests run)

End of tests Total time used = 243.69 seconds

==19821==
==19821== HEAP SUMMARY:
==19821==     in use at exit: 0 bytes in 0 blocks
==19821==   total heap usage: 29,037 allocs, 29,037 frees, 60,537,543 bytes allocated
==19821==
==19821== All heap blocks were freed -- no leaks are possible
==19821==
==19821== For counts of detected and suppressed errors, rerun with: -v
==19821== Use --track-origins=yes to see where uninitialised values come from
==19821== ERROR SUMMARY: 30 errors from 24 contexts (suppressed: 0 from 0)


jlost commented 7 years ago

I ran my failing test (./xeigtstz < nep.in) through gdb and saw that it was failing at the very beginning of zchkee.f (line 1035). I did a bit of googling and saw that this is commonly related to array allocations, as @victorliu said, so I revisited setting ulimit -s unlimited. This time it worked, because this build still had -qnosave enabled. I ran make again and it completed. However, there are still some failed tests. (I'm still using -O0.)

Here is the relevant part of the test results:

-->  Testing COMPLEX16          Linear Equation routines [ ztest.out ]
  *** On entry to ZLASCL parameter number      4 had an illegal value ***
  *** On entry to ZLASCL parameter number      4 had an illegal value ***
  *** On entry to ZLASCL parameter number      4 had an illegal value ***
  *** On entry to ZLASCL parameter number      4 had an illegal value ***
[...]
  *** On entry to ZLASCL parameter number      4 had an illegal value ***
 -->  Tests passed: 407441
-->  Illegal Error: 2700

-->  Testing COMPLEX16          Mixed Precision linear equation routines [ zctest.out ]
-->  Tests passed: 812

                        -->   LAPACK TESTING SUMMARY  <--
                Processing LAPACK Testing output found in the TESTING directory
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    882293          0       (0.000%)        0       (0.000%)
DOUBLE PRECISION        1279339         0       (0.000%)        0       (0.000%)
COMPLEX                 329049          0       (0.000%)        0       (0.000%)
COMPLEX16               737302          0       (0.000%)        2700    (0.366%)

--> ALL PRECISIONS      3227983         0       (0.000%)        2700    (0.084%)
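
The percentages in the summary table above can be re-derived from the raw counts. The following is a minimal sketch, not part of the LAPACK test scripts; the `results` dictionary and the `error_rate` helper are hypothetical names, and the numbers are copied from the table (only the "other error" column is non-zero):

```python
# Counts copied from the summary table: (tests run, other errors).
results = {
    "REAL": (882293, 0),
    "DOUBLE PRECISION": (1279339, 0),
    "COMPLEX": (329049, 0),
    "COMPLEX16": (737302, 2700),
}


def error_rate(tests_run, errors):
    """Percentage of tests that reported an error."""
    return 100.0 * errors / tests_run


total_run = sum(run for run, _ in results.values())
total_err = sum(err for _, err in results.values())

for name, (run, err) in results.items():
    print(f"{name:<18}{run:>10}{err:>8}  ({error_rate(run, err):.3f}%)")
print(f"{'ALL PRECISIONS':<18}{total_run:>10}{total_err:>8}  "
      f"({error_rate(total_run, total_err):.3f}%)")
```

All 2700 errors come from the COMPLEX16 run, which is why the overall rate is much smaller than the COMPLEX16 rate.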

tl;dr: With -qnosave and ulimit -s unlimited, everything compiles successfully, but with test errors. Without -qnosave but with ulimit -s unlimited, ./xlintsts < stest.in fails. Without ulimit -s unlimited, it always fails.

victorliu commented 7 years ago

Please run gdb with a breakpoint on zlascl.f:209 (or whichever line it is that sets the error return) and provide a backtrace. e.g.:

ubuntu:~/lapack/TESTING$ gdb ./xlintstz
(gdb) b zlascl.f:209
(gdb) r < ztest.in
(gdb) bt

jlost commented 7 years ago

Here are the results of that. I also dumped the frame info.

[u0017592@sys-84329 TESTING]$ gdb ./xlintstz
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-64.ael7b
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "ppc64le-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/u0017592/projects/lapack/TESTING/xlintstz...done.
(gdb) b zlascl.f:209
Breakpoint 1 at 0x1018c888: file zlascl.f, line 209.
(gdb) r < ztest.in
Starting program: /home/u0017592/projects/lapack/TESTING/./xlintstz < ztest.in
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/power8/libthread_db.so.1".
 Tests of the COMPLEX*16 LAPACK routines
 LAPACK VERSION 3.6.1

 The following parameter values will be used:
    M   :       0     1     2     3     5    10    50
    N   :       0     1     2     3     5    10    50
    NRHS:       1     2    15
    NB  :       1     3     3     3    20
    NX  :       1     0     5     9     1
    RANK:      30    50    90

 Routines pass computational tests if test ratio is less than   30.00

 Relative machine underflow is taken to be     .222507-307
 Relative machine overflow  is taken to be     .179769+309
 Relative machine precision is taken to be     .111022D-15

 ZGE routines passed the tests of the error exits

 All tests for ZGE routines passed the threshold (   3653 tests run)

 ZGE drivers passed the tests of the error exits

 All tests for ZGE drivers  passed the threshold (   5748 tests run)

 ZGB routines passed the tests of the error exits

 All tests for ZGB routines passed the threshold (  28938 tests run)

 ZGB drivers passed the tests of the error exits

 All tests for ZGB drivers  passed the threshold (  36567 tests run)

 ZGT routines passed the tests of the error exits

 All tests for ZGT routines passed the threshold (   2694 tests run)

 ZGT drivers passed the tests of the error exits

 All tests for ZGT drivers  passed the threshold (   2033 tests run)

 ZPO routines passed the tests of the error exits

 All tests for ZPO routines passed the threshold (   1628 tests run)

 ZPO drivers passed the tests of the error exits

 All tests for ZPO drivers  passed the threshold (   1910 tests run)

 ZPS routines passed the tests of the error exits

 All tests for ZPS routines passed the threshold (    150 tests run)

 ZPP routines passed the tests of the error exits

 All tests for ZPP routines passed the threshold (   1332 tests run)

 ZPP drivers passed the tests of the error exits

 All tests for ZPP drivers  passed the threshold (   1910 tests run)

 ZPB routines passed the tests of the error exits

 All tests for ZPB routines passed the threshold (   3458 tests run)

 ZPB drivers passed the tests of the error exits

 All tests for ZPB drivers  passed the threshold (   4750 tests run)

 ZPT routines passed the tests of the error exits

 All tests for ZPT routines passed the threshold (   1778 tests run)

 ZPT drivers passed the tests of the error exits

 All tests for ZPT drivers  passed the threshold (    788 tests run)

 ZHE routines passed the tests of the error exits

 All tests for ZHE routines passed the threshold (   1846 tests run)

 ZHE drivers passed the tests of the error exits

 All tests for ZHE drivers  passed the threshold (   1072 tests run)

 ZHR routines passed the tests of the error exits

 All tests for ZHR routines passed the threshold (   1618 tests run)

 ZHR drivers passed the tests of the error exits

 All tests for ZHR drivers  passed the threshold (    222 tests run)

 ZHK routines passed the tests of the error exits

 All tests for ZHK routines passed the threshold (   1618 tests run)

 ZHK drivers passed the tests of the error exits

 All tests for ZHK drivers  passed the threshold (    222 tests run)

 ZHA routines passed the tests of the error exits

 All tests for ZHA routines passed the threshold (    996 tests run)

 ZHA drivers passed the tests of the error exits

 All tests for ZHA drivers  passed the threshold (    222 tests run)

 ZHP routines passed the tests of the error exits

 All tests for ZHP routines passed the threshold (   1404 tests run)

 ZHP drivers passed the tests of the error exits

 All tests for ZHP drivers  passed the threshold (   1072 tests run)

 ZSY routines passed the tests of the error exits

 All tests for ZSY routines passed the threshold (   2122 tests run)

 ZSY drivers passed the tests of the error exits

 All tests for ZSY drivers  passed the threshold (   1240 tests run)

 ZSR routines passed the tests of the error exits

 All tests for ZSR routines passed the threshold (   1822 tests run)

 ZSR drivers passed the tests of the error exits

 All tests for ZSR drivers  passed the threshold (    258 tests run)

 ZSK routines passed the tests of the error exits

 All tests for ZSK routines passed the threshold (   1822 tests run)

 ZSK drivers passed the tests of the error exits

 All tests for ZSK drivers  passed the threshold (    258 tests run)

 ZSP routines passed the tests of the error exits

 All tests for ZSP routines passed the threshold (   1620 tests run)

 ZSP drivers passed the tests of the error exits

 All tests for ZSP drivers  passed the threshold (   1240 tests run)

 ZTR routines passed the tests of the error exits

 All tests for ZTR routines passed the threshold (   7672 tests run)

 ZTP routines passed the tests of the error exits

 All tests for ZTP routines passed the threshold (   7392 tests run)

 ZTB routines passed the tests of the error exits

 All tests for ZTB routines passed the threshold (  19888 tests run)

 ZQR routines passed the tests of the error exits

 All tests for ZQR routines passed the threshold (  42840 tests run)

 ZRQ routines passed the tests of the error exits

 All tests for ZRQ routines passed the threshold (  28784 tests run)

 ZLQ routines passed the tests of the error exits

 All tests for ZLQ routines passed the threshold (  28784 tests run)

 ZQL routines passed the tests of the error exits

 All tests for ZQL routines passed the threshold (  28784 tests run)

 All tests for ZQ3 routines passed the threshold (   4410 tests run)

 ZTZ routines passed the tests of the error exits

 All tests for ZTZ routines passed the threshold (    252 tests run)

 ZLS routines passed the tests of the error exits

Breakpoint 1, zlascl (type='G', kl=0, ku=0, cfrom=nan(0x8000000000000), cto=1, m=1, n=1, a=..., lda=1, info=0) at zlascl.f:209
209              INFO = -4
Missing separate debuginfos, use: debuginfo-install glibc-2.17-78.ael7b.ppc64le libgcc-4.8.3-9.ael7b.ppc64le
(gdb) bt
#0  zlascl (type='G', kl=0, ku=0, cfrom=nan(0x8000000000000), cto=1, m=1, n=1, a=..., lda=1, info=0) at zlascl.f:209
#1  0x00000000100a89f0 in zqrt14 (trans='C', m=1, n=1, nrhs=1, a=..., lda=1, x=..., ldx=1, work=..., lwork=7) at zqrt14.f:200
#2  0x000000001005205c in zdrvls (dotype=..., nm=7, mval=..., nn=7, nval=..., nns=3, nsval=..., nnb=5, nbval=..., nxval=..., thresh=30, tsterr=.TRUE., a=...,
    copya=..., b=..., copyb=..., c=..., s=..., copys=..., work=..., rwork=..., iwork=..., nout=6) at zdrvls.f:522
#3  0x0000000010020db4 in zchkaa () at zchkaa.f:1021
(gdb) info args
type = 'G'
kl = 0
ku = 0
cfrom = nan(0x8000000000000)
cto = 1
m = 1
n = 1
a = ()
lda = 1
info = 0
(gdb) info f 1
Stack frame at 0x3fffffd08e90:
 pc = 0x100a89f0 in zqrt14 (zqrt14.f:200); saved pc 0x1005205c
 called by frame at 0x3fffffd09380, caller of frame at 0x3fffffd08c90
 source language fortran.
 Arglist at 0x3fffffd08c90, args: trans='C', m=1, n=1, nrhs=1, a=..., lda=1, x=..., ldx=1, work=..., lwork=7
 Locals at 0x3fffffd08c90, Previous frame's sp is 0x3fffffd08e90
 Saved registers:
  r29 at 0x3fffffd08e68, r30 at 0x3fffffd08e70, f30 at 0x3fffffd08e80, f31 at 0x3fffffd08e88, pc at 0x3fffffd08ea0, lr at 0x3fffffd08ea0
(gdb) info f 2
Stack frame at 0x3fffffd09380:
 pc = 0x1005205c in zdrvls (zdrvls.f:522); saved pc 0x10020db4
 called by frame at 0x3ffffffff0d0, caller of frame at 0x3fffffd08e90
 source language fortran.
 Arglist at 0x3fffffd08e90, args: dotype=..., nm=7, mval=..., nn=7, nval=..., nns=3, nsval=..., nnb=5, nbval=..., nxval=..., thresh=30, tsterr=.TRUE., a=...,
    copya=..., b=..., copyb=..., c=..., s=..., copys=..., work=..., rwork=..., iwork=..., nout=6
 Locals at 0x3fffffd08e90, Previous frame's sp is 0x3fffffd09380
 Saved registers:
  r23 at 0x3fffffd09330, r24 at 0x3fffffd09338, r25 at 0x3fffffd09340, r26 at 0x3fffffd09348, r27 at 0x3fffffd09350, r28 at 0x3fffffd09358, r29 at 0x3fffffd09360,
  r31 at 0x3fffffd09370, f31 at 0x3fffffd09378, pc at 0x3fffffd09390, lr at 0x3fffffd09390
(gdb) info f 3
Stack frame at 0x3ffffffff0d0:
 pc = 0x10020db4 in zchkaa (zchkaa.f:1021); saved pc 0x100000c54580
 caller of frame at 0x3fffffd09380
 source language fortran.
 Arglist at 0x3fffffd09380, args:
 Locals at 0x3fffffd09380, Previous frame's sp is 0x3ffffffff0d0
 Saved registers:
  r14 at 0x3ffffffff040, r15 at 0x3ffffffff048, r16 at 0x3ffffffff050, r17 at 0x3ffffffff058, r18 at 0x3ffffffff060, r19 at 0x3ffffffff068, r20 at 0x3ffffffff070,
  r21 at 0x3ffffffff078, r22 at 0x3ffffffff080, r23 at 0x3ffffffff088, r24 at 0x3ffffffff090, r25 at 0x3ffffffff098, r26 at 0x3ffffffff0a0, r27 at 0x3ffffffff0a8,
  r28 at 0x3ffffffff0b0, r30 at 0x3ffffffff0c0, r31 at 0x3ffffffff0c8, pc at 0x3ffffffff0e0, lr at 0x3ffffffff0e0
(gdb) continue
Continuing.
 *** On entry to ZLASCL parameter number      4 had an illegal value ***
 *** XERBLA was called with SRNAME = ZLASCL instead of DGETSL ***

Breakpoint 1, zlascl (type='G', kl=0, ku=0, cfrom=nan(0x8000000000000), cto=1, m=1, n=1, a=..., lda=1, info=0) at zlascl.f:209
209              INFO = -4

langou commented 7 years ago

I am having a hard time reproducing this problem, so it is hard for me to make much progress. I can read the information you are posting, though.

So here is what is happening. (As far as I can see. Thanks James for the gdb, and Victor for suggestions.)

We are in LAPACK/TESTING, so we have a fake XERBLA. (Special for TESTING.) This XERBLA is here so that we can make (on purpose) wrong calls to LAPACK subroutines. (Because we want to check that the error messages are correct.) In the error check, we set what the error message should be and this fake XERBLA checks that the expected error message is indeed the produced error message. This is how we do error checks. All this to say that there is a fake XERBLA.

Anyhow, we have a true numerical error to figure out. It seems that, during the numerical test, ZGETSL calls ZLASCL with its 4th parameter, CFROM, equal to NaN. (This is NOT OK. There is a check in ZLASCL for exactly this purpose, and this is why the error is reported.)
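
To make the check concrete, here is a minimal sketch of that kind of argument validation. It is an illustration in Python, not the Fortran source of ZLASCL; the function name `check_scale_factor` is hypothetical, and only the NaN case discussed in this thread is modeled. The return value -4 follows the convention visible in the gdb session (`INFO = -4` for the 4th argument):

```python
import math


def check_scale_factor(cfrom):
    """Return 0 if cfrom is usable, or -4 if it is NaN.

    A negative result names the position of the offending argument;
    CFROM is the 4th argument in the call shown in the backtrace.
    """
    # NaN compares unequal to everything, itself included, so an
    # explicit isnan test is needed rather than an equality comparison.
    if math.isnan(cfrom):
        return -4
    return 0


print(check_scale_factor(1.0))           # 0
print(check_scale_factor(float("nan")))  # -4
```

The check only detects the NaN; it says nothing about where the NaN was produced, which is the open question here.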

The fact that you see XERBLA was called with SRNAME = ZLASCL instead of DGETSL is OK. This is because we are working with the fake XERBLA. XERBLA was set to expect DGETSLS at some point earlier, when we were doing error checks. We never reset this DGETSLS (because XERBLA is not supposed to be called: we are doing numerical checks, as opposed to error checks). So XERBLA is still expecting to be called (if it is called) with DGETSLS. (Note: we print only 6 characters, so you see DGETSL, but this stands for DGETSLS. There was also a mistake, so we should really read ZGETSLS here. Oh my.) So XERBLA says that it is not happy, but we do not really care here. What we care about is why XERBLA is called in the first place. XERBLA is called in the first place because, during the numerical test, ZGETSL calls ZLASCL with 4th parameter CFROM = NAN. Why? Well, I do not know yet.
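
The stale-expectation effect described above can be sketched with a toy model. This is a simplified Python illustration, not the actual TESTING/XERBLA code; the class `FakeXerbla`, its attributes, and the use of 0 to mean "no error expected" are all assumptions made for the example:

```python
class FakeXerbla:
    """Toy model of a test-only error handler (hypothetical, simplified)."""

    def __init__(self):
        self.expected_name = ""  # routine the harness expects to fail next
        self.expected_info = 0   # expected argument position (0 = none expected)
        self.ok = True
        self.log = []

    def expect(self, name, info):
        """Called by an error-exit test before a deliberately bad call."""
        self.expected_name = name
        self.expected_info = info

    def __call__(self, name, info):
        """Called by a routine that rejected its argument number `info`."""
        if info != self.expected_info:
            self.log.append(
                f"On entry to {name} parameter number {info} had an illegal value")
            self.ok = False
        if name != self.expected_name:
            # Only six characters of the expected name are shown.
            self.log.append(
                f"XERBLA was called with SRNAME = {name} "
                f"instead of {self.expected_name[:6]}")
            self.ok = False


xerbla = FakeXerbla()

# Error-exit phase: the harness announces a bad call, then makes it.
xerbla.expect("DGETSLS", 1)
xerbla("DGETSLS", 1)  # matches the expectation, so nothing is logged

# Numerical phase: no error is expected, but the name was never reset.
xerbla.expected_info = 0
xerbla("ZLASCL", 4)   # unexpected call caused by the NaN argument
for line in xerbla.log:
    print(line)
```

The second logged line is only a side effect of the leftover name; the first line is the one that points at the real problem.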

langou commented 7 years ago

Another related commit at 14f49ebfde6908a959f7bcefbdcb2a95ab68c1f3

langou commented 7 years ago

@jlost : can you please do a git pull and try again and report? I do not have access to xlf. All is well on my mac with gfortran. Cheers, Julien.

jlost commented 7 years ago

Sure, I'll retest in a couple of days when I'm back from vacation.

jlost commented 7 years ago

Same results as above (2700 COMPLEX16 errors) when I compile with xlf and a freshly compiled reference BLAS.

With gfortran, I get 1 error:

-->  Testing COMPLEX16          Nonsymmetric Generalized Eigenvalue Problem driver [ zgd.out ]
 ZDRGEV3: ZGGEV31 returned INFO=     7.
  ZGV drivers:      1 out of   1092 tests failed to pass the threshold

langou commented 7 years ago

Hi all. So I finally got my hands on an IBM machine. (IBM Power8E.) (Thanks @edelsohn , IBM and OSU.) So far, I am trying LAPACK with gfortran (gcc 6.2.1, and standard make.inc) and reference BLAS. You do have to use ulimit -s unlimited; otherwise ./xeigtstz segfaults, as reported in the issue. So with gfortran on IBM Power 8E, I have no numerical errors. All tests are successful. I will try (am trying) to install XLF and report some more. Cheers, Julien.
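
Since the outcome depends on the stack limit, it can help to confirm the limit in effect before launching a test binary. The following is a minimal sketch, assuming a POSIX system where Python's standard `resource` module is available; the helper names are hypothetical and not part of the LAPACK build:

```python
import resource


def stack_limit_is_unlimited():
    """True if the soft stack limit matches `ulimit -s unlimited`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return soft == resource.RLIM_INFINITY


def describe_stack_limit():
    """Return the soft stack limit as a human-readable string."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{soft // 1024} KiB"


if __name__ == "__main__":
    print("soft stack limit:", describe_stack_limit())
    if not stack_limit_is_unlimited():
        print("run `ulimit -s unlimited` in the shell before starting the tests")
```

A child process inherits the limit of the shell that starts it, so the check has to run in the same shell session as the test.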

shawnl commented 5 years ago

I can't figure out how to build the test executables..... I run cmake but then it doesn't generate a Makefile with any targets...

weslleyspereira commented 3 years ago

Hello! There are some other open issues related to the same problem, e.g., #276 and #335. I am closing this issue and moving the discussion to #335. Thanks to everybody who helped with this issue!