cyclops-community / ctf

Cyclops Tensor Framework: parallel arithmetic on multidimensional arrays

Testing 3D DFT with n = 6: make: *** [test] Segmentation fault: 11 #24

Closed jeffhammond closed 8 years ago

jeffhammond commented 8 years ago

Is this failure expected?

agraback-mobl1:ctf jrhammon$ cat how-did-i-configure 
./configure 'CXX=/opt/mpich/dev/clang/default/bin/mpicxx' '--blas=-framework Accelerate'
agraback-mobl1:ctf jrhammon$ make test
/Applications/Xcode.app/Contents/Developer/usr/bin/make test_suite -C test
make[1]: Nothing to be done for `test_suite'.
/Users/jrhammon/Work/CHEMISTRY/AQUARIUS/ctf/bin/test_suite
Testing Cyclops Tensor Framework using 1 processors
Testing non-symmetric: NS = NS*NS weigh with n = 6:
{ C["ijkl"] = A["ijkl"]*B["ijkl"] } passed
Testing symmetric: SY = SY*SY weigh with n = 6:
{ C["ijkl"] = A["ijkl"]*B["ijkl"] } passed
Testing (anti-)skew-symmetric: AS = AS*AS weigh with n = 6:
{ C["ijkl"] = A["ijkl"]*B["ijkl"] } passed
Testing symmetric-hollow: SH = SH*SH weigh with n = 6:
{ C["ijkl"] = A["ijkl"]*B["ijkl"] } passed
Testing CCSDT T3->T2 with n= 6, m = 7:
{ AS_C["abij"] += 0.5*AS_A["mnje"]*AS_B["abeimn"] } passed
Testing non-symmetric: NS = NS*NS matmul with n = 36:
{ C["ik"] += A["ij"]*B["jk"] with A (36*36 sym 0 sp 1.000000), B (36*36 sym 0 sp 1.000000), C (36*36 sym 0 sp 1.000000) } passed 
Testing symmetric: SY = SY*SY matmul with n = 36:
{ C["ik"] += A["ij"]*B["jk"] with A (36*36 sym 1 sp 1.000000), B (36*36 sym 1 sp 1.000000), C (36*36 sym 1 sp 1.000000) } passed 
Testing (anti-)skew-symmetric: AS = AS*AS matmul with n = 36:
{ C["ik"] += A["ij"]*B["jk"] with A (36*36 sym 2 sp 1.000000), B (36*36 sym 2 sp 1.000000), C (36*36 sym 2 sp 1.000000) } passed 
Testing symmetric-hollow: SH = SH*SH matmul with n = 36:
{ C["ik"] += A["ij"]*B["jk"] with A (36*36 sym 3 sp 1.000000), B (36*36 sym 3 sp 1.000000), C (36*36 sym 3 sp 1.000000) } passed 
Testing non-symmetric: NS = NS*NS 4D gemm with n = 6:
{ (A["ijmn"]*B["mnpq"])*C["pqkl"] = A["ijmn"]*(B["mnpq"]*C["pqkl"]) } passed
Testing symmetric: SY = SY*SY 4D gemm with n = 6:
{ (A["ijmn"]*B["mnpq"])*C["pqkl"] = A["ijmn"]*(B["mnpq"]*C["pqkl"]) } passed
Testing (anti-)skew-symmetric: AS = AS*AS 4D gemm with n = 6:
{ (A["ijmn"]*B["mnpq"])*C["pqkl"] = A["ijmn"]*(B["mnpq"]*C["pqkl"]) } passed
Testing symmetric-hollow: SH = SH*SH 4D gemm with n = 6:
{ (A["ijmn"]*B["mnpq"])*C["pqkl"] = A["ijmn"]*(B["mnpq"]*C["pqkl"]) } passed
Testing scalar operations
{ scalar tests } passed
Testing a 2D trace operation with n = 6:
{ tr(ABCD) = tr(DABC) = tr(CDAB) = tr(BCDA) } passed
Testing a diag sym operation with n = 6:
{ (A["(ab)(ij)"]=mA["ii"]-mB["aa"]=mA["jj"]-mB["bb"] } passed 
Testing a diag ctr operation with n = 6 m = 36:
{ sum(ai)A["aiai"]=sum(ai)mA["ai"] } passed 
Testing fast symmetric multiplication operation with n = 36:
{ C["(ij)"] = A["(ik)"]*B["(kj)"] } passed
Testing 4D fast symmetric contraction operation with n = 6:
{ C["(ij)ab"] = A["(ik)al"]*B["(kj)lb"] } passed
Testing multi-tensor symmetric contraction with m = 36 n = 6:
{ A["ik"]*A["jk"] = C_NS["ij"] = C_AS["ij"] } passed.
Testing gemm on subworld algorithm with n,m,k = 36 div = 3:
{ GEMM on subworlds } passed
Testing non-symmetric Strassen's algorithm with n = 72:
{ Strassen's algorithm via slicing } passed
Testing diagonal write with n = 6:
{ diagonal write test } passed
Testing readall test with n = 6 m = 36:
{ sum(ai)A["aiai"]=sum(ai)mA["ai"] } passed 
Testing repack with n = 6:
{ NS -> SY -> NS repack } passed 
Testing SY times NS with n = 6:
{ C["(ij)"]=A["(ij)"]*B["ijkl"] } passed 
Testing non-symmetric sliced GEMM algorithm with (16 32 8):
{ GEMM with parallel slicing } passed
Testing 1D DFT with n = 36:
{ DFT["ik"] = DFT["ij"]*IDFT["jk"] } passed
Testing 3D DFT with n = 6:
make: *** [test] Segmentation fault: 11

solomonik commented 8 years ago

Of course not. Please let me know more about the configuration, or try building target 'dft_3D' and running it with valgrind (test_suite runs it with -n 6). dft_3D uses std::complex, so my first guess would be there is an issue that arises with this in your configuration, possibly involving CTF assumptions regarding this type.

jeffhammond commented 8 years ago

I use Mac OS 10.11.6.

MPI

agraback-mobl1:ctf jrhammon$ /opt/mpich/dev/clang/default/bin/mpichversion 
MPICH Version:      3.2
MPICH Release date: unreleased development copy
MPICH Device:       ch3:nemesis
MPICH configure:    CC=clang CXX=clang++ FC=false F77=false --enable-cxx --disable-fortran --with-pm=hydra --prefix=/opt/mpich/dev/clang/default --enable-cxx --enable-wrapper-rpath --disable-static --enable-shared
MPICH CC:   clang    -O2
MPICH CXX:  clang++   -O2
MPICH F77:  false  
MPICH FC:   false  

Compiler

agraback-mobl1:ctf jrhammon$ /opt/mpich/dev/clang/default/bin/mpicxx -v
mpicxx for MPICH version 3.2
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
clang: warning: argument unused during compilation: '-I /opt/mpich/dev/clang/default/include'

Compile

agraback-mobl1:ctf jrhammon$ make dft_3D
/Applications/Xcode.app/Contents/Developer/usr/bin/make dft_3D -C examples
/opt/mpich/dev/clang/default/bin/mpicxx -std=c++11 -O2 -DOMP_OFF -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -D_DARWIN_C_SOURCE -DFTN_UNDERSCORE=1   dft_3D.cxx -o /Users/jrhammon/Work/CHEMISTRY/AQUARIUS/ctf/bin/dft_3D -I../include/ -L/Users/jrhammon/Work/CHEMISTRY/AQUARIUS/ctf/lib -lctf -framework Accelerate 
In file included from dft_3D.cxx:11:
In file included from ../include/ctf.hpp:17:
In file included from ../include/../src/interface/tensor.h:5:
../include/../src/interface/set.h:505:22: warning: format specifies type 'long' but the argument has type 'int64_t'
      (aka 'long long') [-Wformat]
    fprintf(fp,"%ld",((int64_t*)a)[0]);
                ~~~  ^~~~~~~~~~~~~~~~
                %lld
1 warning generated.

Run

agraback-mobl1:ctf jrhammon$ ./bin/dft_3D
Segmentation fault: 11

Debug

agraback-mobl1:ctf jrhammon$ lldb ./bin/dft_3D
(lldb) target create "./bin/dft_3D"
Current executable set to './bin/dft_3D' (x86_64).
(lldb) run
Process 70962 launched: './bin/dft_3D' (x86_64)
Process 70962 stopped
* thread #1: tid = 0x2a628f, 0x0000000100004b65 dft_3D`CTF::Semiring<std::__1::complex<long double>, false>::mul(char const*, char const*, char*) const + 37, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x0000000100004b65 dft_3D`CTF::Semiring<std::__1::complex<long double>, false>::mul(char const*, char const*, char*) const + 37
dft_3D`CTF::Semiring<std::__1::complex<long double>, false>::mul:
->  0x100004b65 <+37>: movaps (%rdx), %xmm0
    0x100004b68 <+40>: movaps 0x10(%rdx), %xmm1
    0x100004b6c <+44>: movaps %xmm1, 0x30(%rsp)
    0x100004b71 <+49>: movaps %xmm0, 0x20(%rsp)
(lldb) bt
* thread #1: tid = 0x2a628f, 0x0000000100004b65 dft_3D`CTF::Semiring<std::__1::complex<long double>, false>::mul(char const*, char const*, char*) const + 37, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x0000000100004b65 dft_3D`CTF::Semiring<std::__1::complex<long double>, false>::mul(char const*, char const*, char*) const + 37
    frame #1: 0x000000010006dc8d dft_3D`CTF_int::readwrite(int, long long, char const*, char const*, int, int const*, int const*, int const*, int const*, int const*, int*, char*, char*, char, CTF_int::algstrct const*) + 1357
    frame #2: 0x000000010006f129 dft_3D`CTF_int::wr_pairs_layout(int, int, long long, char const*, char const*, char, int, int const*, int const*, int const*, int const*, int const*, int const*, int*, int const*, char*, char*, CTF_int::CommData, CTF_int::algstrct const*, bool, long long, long long*, char*&, long long&) + 3593
    frame #3: 0x00000001000af479 dft_3D`CTF_int::tensor::write(long long, char const*, char const*, char*, char) + 985
    frame #4: 0x0000000100001db5 dft_3D`CTF::Tensor<std::__1::complex<long double> >::write(long long, long long const*, std::__1::complex<long double> const*) + 229
    frame #5: 0x0000000100000fc6 dft_3D`test_dft_3D(int, CTF::World&) + 1238
    frame #6: 0x0000000100001b4f dft_3D`main + 111
    frame #7: 0x00007fff85b725ad libdyld.dylib`start + 1
    frame #8: 0x00007fff85b725ad libdyld.dylib`start + 1
(lldb) ^D

solomonik commented 8 years ago

Thanks. Yes, I remember now that I encountered this before with clang, and I just reproduced it on my laptop. When compiled with -O0, clang++ runs dft_3D just fine (also without errors in valgrind), but with optimizations turned on it crashes. I assume this is a problem with clang optimizations involving 'long double'. As far as I am aware, CTF with clang should still work fine on any other code not involving long double. So this test_suite crash can be disregarded if you just want to run Aquarius. I am closing this for now, but if there are suggestions on what CTF can do differently to avoid this, or on why the problem occurs with clang plus optimizations (at least -O1), I'd be happy to address it.
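
A note on the backtrace above: the faulting instruction is `movaps (%rdx)`, an aligned SSE load, and the stop reason is a general-protection fault. One plausible reading, which nothing in this thread confirms, is that the optimized code assumes the `char const*` argument is aligned for `std::complex<long double>` (typically 16 bytes on x86-64) while the packed buffer it points into is not. The following is a minimal sketch of an alignment-agnostic multiply under that assumption. The names `load_unaligned`, `store_unaligned` and `mul_bytes` are hypothetical and are not CTF functions.

```cpp
#include <complex>
#include <cstring>

// Hypothetical helpers, not part of CTF: read and write a value of type T
// at an arbitrary byte address without assuming any alignment.
template <typename T>
T load_unaligned(const char* p) {
  T v;
  std::memcpy(&v, p, sizeof(T));  // byte copy, valid at any address
  return v;
}

template <typename T>
void store_unaligned(char* p, const T& v) {
  std::memcpy(p, &v, sizeof(T));
}

// Multiply two values held in raw byte buffers, in the spirit of a
// type-erased mul(char const*, char const*, char*) interface.
template <typename T>
void mul_bytes(const char* a, const char* b, char* c) {
  store_unaligned<T>(c, load_unaligned<T>(a) * load_unaligned<T>(b));
}
```

Copying through `std::memcpy` lets the compiler choose loads that do not require alignment, whereas dereferencing a cast pointer lets it assume the alignment of `T`. Whether this is what actually goes wrong in CTF would need to be checked against the source.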

jeffhammond commented 8 years ago

Can we disable the test if Clang is the compiler (#ifdef __clang__) so that one can rely upon make test for validation?

solomonik commented 8 years ago

I think there is value in having a test fail when code that should work is built with a compiler that breaks it, but I understand this is needed to get automated testing 'working'. I pushed a commit to master that prints a warning instead of running the test when __clang__ is defined.
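
The commit itself is not shown in this thread. As a rough sketch of the approach described, assuming a hypothetical test entry point `run_dft_3D_test` and wrapper `dft_3D_or_skip` (neither name is taken from CTF), the guard could look like this:

```cpp
#include <cstdio>

// Hypothetical stand-in for the real 3D DFT test; returns 1 on success.
int run_dft_3D_test(int n) {
  (void)n;  // the real test would build and check an n x n x n transform
  return 1;
}

// Returns 1 if the test passed or was skipped, 0 if it failed.
int dft_3D_or_skip(int n) {
#ifdef __clang__
  // Compiled with clang: report the skip instead of running the test.
  std::printf("WARNING: skipping 3D DFT test with n = %d (clang build)\n", n);
  return 1;
#else
  return run_dft_3D_test(n);
#endif
}
```

Printing a warning rather than silently dropping the test keeps the gap visible in the `make test` output while still letting the suite finish.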

devinamatthews commented 8 years ago

Does the test fail with non-apple clang or with clang/libstdc++?



devinamatthews commented 8 years ago

It might be better to just drop long double support. You don't gain much precision and you have to use the FPU (and deal with spotty support...). If you really need extended precision, you could do library-based float128 or double-double.
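
For readers unfamiliar with double-double, here is a minimal sketch of the idea: a value is stored as an unevaluated sum of two doubles, and error-free transformations recover the rounding error of each addition. The type `dd` and the functions `two_sum` and `dd_add` are illustrative names, not part of CTF or of any particular library, and the code assumes strict IEEE double arithmetic (no `-ffast-math`).

```cpp
// Hypothetical double-double type: the value is hi + lo, with lo holding
// the bits that do not fit into hi.
struct dd {
  double hi;
  double lo;
};

// Knuth's two-sum: the returned hi + lo equals a + b exactly
// (barring overflow), where hi is the rounded sum.
inline dd two_sum(double a, double b) {
  double s = a + b;
  double bb = s - a;
  double err = (a - (s - bb)) + (b - bb);
  return {s, err};
}

// Add two double-double numbers (simple variant, not fully IEEE-accurate).
inline dd dd_add(dd x, dd y) {
  dd s = two_sum(x.hi, y.hi);
  double lo = s.lo + x.lo + y.lo;
  // Renormalize so that hi again carries the leading bits.
  double hi = s.hi + lo;
  return {hi, lo - (hi - s.hi)};
}
```

For example, adding `1e-20` to `1.0` in plain double returns `1.0`, while `dd_add` keeps the small term in `lo`.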

jeffhammond commented 8 years ago

+1

long double is a terrible type. On some platforms it is just 64b, as allowed by the C standard.