TRIQS / triqs_0.x

DEPRECATED -- This is the repository of the older versions of TRIQS

memory release in ctqmc_hyb / 1.0 #122

Closed aeantipov closed 11 years ago

aeantipov commented 11 years ago

I'm seeing odd behaviour when running the tests on the 1.0 branch (as of 8d67c83). The setup is Darwin dhcp216.mpipks-dresden.mpg.de 12.2.0 Darwin Kernel Version 12.2.0, with clang 4.2. Compilation goes smoothly, but the tests generate errors, namely:

The following tests FAILED:
     41 - sliding_view_nopy (OTHER_FAULT)
     88 - sliding_view_wpy (SEGFAULT)
    109 - det_manipc1 (OTHER_FAULT)
    110 - block (Failed)
    112 - gf_retw (SEGFAULT)
    113 - gfv2 (Failed)
    122 - h5_example (Failed)
    123 - hdf5_io (Failed)
    124 - gf_init (Failed)
    125 - gf_base_op (Failed)
    126 - dos (Failed)
    127 - pade (Failed)
    128 - wien2k_convert (Failed)
    129 - sumklda_basic (Failed)
    131 - single_site_bethe (Failed)
    132 - cdmft_4_sites (Failed)
    133 - hubbard (Failed)

Looking at pytriqs/solvers/ctqmc_hyb/test/single_site_bethe_output.err, I see the following:

Starting on 1 Nodes at : 2013-03-05 17:20:02.366011
Average sign: 1
Average sign: 1
[dhcp216:70011] *** Process received signal ***
[dhcp216:70011] Signal: Segmentation fault: 11 (11)
[dhcp216:70011] Signal code:  (0)
[dhcp216:70011] Failing at address: 0x0
[dhcp216:70011] [ 0] 2   libsystem_c.dylib                   0x00007fff910a88ea _sigtramp + 26
[dhcp216:70011] [ 1] 3   ???                                 0x0000000000000003 0x0 + 3
[dhcp216:70011] [ 2] 4   libtriqs_solver_ctqmc.dylib         0x000000010f74b206 _ZN5triqs9det_manip9det_manipIN13Configuration11Delta_ProxyEED2Ev + 22
[dhcp216:70011] [ 3] 5   libtriqs_solver_ctqmc.dylib         0x000000010f7421e1 _ZN13ConfigurationD2Ev + 65
[dhcp216:70011] [ 4] 6   ctqmc_solver.so                     0x000000010f71c33b _ZN5triqs3app16impurity_solvers9ctqmc_hybD2Ev + 139
[dhcp216:70011] [ 5] 7   ctqmc_solver.so                     0x000000010f70cd69 _ZL67__pyx_pf_7pytriqs_7solvers_9ctqmc_hyb_12ctqmc_solver_6Solver_2SolveP7_objectS0_ + 109017
[dhcp216:70011] [ 6] 8   Python                              0x000000010cd8b4b9 PyObject_Call + 97
[dhcp216:70011] [ 7] 9   Python                              0x000000010ce01fca PyEval_EvalFrameEx + 8459
[dhcp216:70011] [ 8] 10  Python                              0x000000010cdffe84 PyEval_EvalCodeEx + 1857
[dhcp216:70011] [ 9] 11  Python                              0x000000010ce060dc fast_function + 280
[dhcp216:70011] [10] 12  Python                              0x000000010ce01e6d PyEval_EvalFrameEx + 8110
[dhcp216:70011] [11] 13  Python                              0x000000010cdffe84 PyEval_EvalCodeEx + 1857
[dhcp216:70011] [12] 14  Python                              0x000000010cdff73d PyEval_EvalCode + 54
[dhcp216:70011] [13] 15  Python                              0x000000010ce1dde7 run_mod + 53
[dhcp216:70011] [14] 16  Python                              0x000000010ce1deaa PyRun_FileExFlags + 165
[dhcp216:70011] [15] 17  Python                              0x000000010ce1da1c PyRun_SimpleFileExFlags + 777
[dhcp216:70011] [16] 18  Python                              0x000000010ce2e4bd Py_Main + 2909
[dhcp216:70011] [17] 19  libdyld.dylib                       0x00007fff8db787e1 start + 0
[dhcp216:70011] *** End of error message ***
/Users/antipov/code/triqs_triqs/build_1.0_wboost/run_pytriqs_for_test.sh: line 4: 70011 Segmentation fault: 11  /Users/antipov/.virtualenvs/generic/bin/python $@

This means the test ran to completion and hit a memory release problem only at the very end. The same holds for cdmft_4_sites (132).
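The mangled frames [2] to [4] in the trace above can be decoded to confirm that the crash happens inside destructors. Below is a minimal sketch using `abi::__cxa_demangle` from `<cxxabi.h>` (the Itanium C++ ABI demangler shipped with GCC and Clang). The helper name `demangle` is hypothetical and is not part of TRIQS.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle
#include <cstdlib>   // std::free
#include <memory>
#include <string>

// Hypothetical helper (not a TRIQS function): returns the demangled form of
// `mangled`, or the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle returns a malloc'ed buffer, so release it with std::free.
    std::unique_ptr<char, void (*)(void*)> text(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        std::free);
    return (status == 0 && text) ? std::string(text.get()) : mangled;
}
```

Applied to the trace, `_ZN13ConfigurationD2Ev` decodes to `Configuration::~Configuration()` and `_ZN5triqs9det_manip9det_manipIN13Configuration11Delta_ProxyEED2Ev` to `triqs::det_manip::det_manip<Configuration::Delta_Proxy>::~det_manip()`, which is consistent with a failure during object teardown after the solver has finished.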

aeantipov commented 11 years ago

For single_site_bethe, gdb says:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
0x00007fff8ec1aebe in std::__1::basic_string, std::__1::allocator >::~basic_string ()

and for cdmft_4_sites:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
0x000000010374b874 in triqs::arrays::assignment::impl, triqs::arrays::array, (char)69, void>::invoke ()

parcollet commented 11 years ago

On Mac? I will retest it on OS X later today. The gf code has been changing a lot these days (remember, 1.0 is a dev branch, it is not yet 1.0), but it is OK on Linux (including the valgrind test). It looks to me like a stdc++ link problem. Which version of the compiler? Which stdc++?
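One way to answer the compiler and standard library question from inside the build itself is to report the predefined macros. This is a minimal sketch, assuming only the usual vendor macros (`_LIBCPP_VERSION` for libc++, `__GLIBCXX__` for libstdc++, `__clang_version__` and `__VERSION__` for the compiler); the function names are hypothetical.

```cpp
#include <string>  // including any standard header defines the library's version macro

// Hypothetical helper: names the C++ standard library this file was built against.
std::string stdlib_name() {
#if defined(_LIBCPP_VERSION)
    return "libc++ " + std::to_string(_LIBCPP_VERSION);
#elif defined(__GLIBCXX__)
    return "libstdc++ " + std::to_string(__GLIBCXX__);
#else
    return "unknown";
#endif
}

// Hypothetical helper: names the compiler this file was built with.
std::string compiler_name() {
#if defined(__clang__)
    return std::string("clang ") + __clang_version__;
#elif defined(__GNUC__)
    return std::string("gcc ") + __VERSION__;
#else
    return "unknown";
#endif
}
```

Printing both values from a test executable on each platform would show whether the Linux and OS X builds actually use the same standard library, which is not guaranteed even when both are compiled with clang.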

parcollet commented 11 years ago

Is it Mountain Lion?

aeantipov commented 11 years ago

It is. clang:

Apple LLVM version 4.2 (clang-425.0.24) (based on LLVM 3.2svn)
Target: x86_64-apple-darwin12.2.0
Thread model: posix

libs:

pytriqs/solvers/ctqmc_hyb/ctqmc_solver.so:
    /usr/local/Frameworks/Python.framework/Versions/2.7/Python (compatibility version 2.7.0, current version 2.7.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)
    /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 744.12.0)
    /usr/local/lib/libfftw3.3.dylib (compatibility version 7.0.0, current version 7.2.0)
    /Users/antipov/code/triqs_triqs/build_1.0_wboost/foreignlibs/boost/libboost_for_triqs.dylib (compatibility version 0.0.0, current version 0.0.0)
    /usr/local/lib/libmpi_cxx.1.dylib (compatibility version 2.0.0, current version 2.1.0)
    /usr/local/lib/libmpi.1.dylib (compatibility version 2.0.0, current version 2.3.0)
    /usr/local/lib/libsz.2.0.0.dylib (compatibility version 3.0.0, current version 3.0.0)
    /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.5)
    /usr/local/lib/libhdf5_hl.7.dylib (compatibility version 8.0.0, current version 8.3.0)
    /usr/local/lib/libhdf5.7.dylib (compatibility version 8.0.0, current version 8.3.0)
    /System/Library/Frameworks/vecLib.framework/Versions/A/vecLib (compatibility version 1.0.0, current version 380.6.0)
    /Users/antipov/code/triqs_triqs/build_1.0_wboost/triqs/libtriqs.dylib (compatibility version 0.0.0, current version 0.0.0)
    /Users/antipov/code/triqs_triqs/build_1.0_wboost/applications/impurity_solvers/ctqmc_hyb/libtriqs_solver_ctqmc.dylib (compatibility version 0.0.0, current version 0.0.0)
    /Users/antipov/code/triqs_triqs/build_1.0_wboost/triqs/libtriqs_utility.dylib (compatibility version 0.0.0, current version 0.0.0)
    /usr/local/lib/libhdf5_cpp.7.dylib (compatibility version 8.0.0, current version 8.3.0)
    /usr/local/lib/libgmp.10.dylib (compatibility version 11.0.0, current version 11.5.0)
    /usr/local/lib/libgmpxx.4.dylib (compatibility version 7.0.0, current version 7.5.0)
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 65.1.0)

parcollet commented 11 years ago

Let's concentrate on a pure C++ example, like gfv2. Here on Linux, valgrind says "0 errors", and so does clang -fsanitize=address, so I guess there is no obvious error.

aeantipov commented 11 years ago

(gdb) run ./triqs/gf/test/gfv2
Starting program: /Users/antipov/code/triqs_triqs/build_1.0_wboost/triqs/gf/test/gfv2 ./triqs/gf/test/gfv2
Reading symbols for shared libraries ++++++++++++++++++................................... done
(G( 0)) ---> 
[[(0,0),(0,0)]
 [(0,0),(0,0)]]
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000101801390
0x000000010000642f in main ()

If needed, here are the linked libraries:

(generic)antipov@r2d2_v2 ~/code/triqs_triqs/build_1.0_wboost $ otool -L ./triqs/gf/test/gfv2
./triqs/gf/test/gfv2:
    /System/Library/Frameworks/vecLib.framework/Versions/A/vecLib (compatibility version 1.0.0, current version 380.6.0)
    /usr/local/Frameworks/Python.framework/Versions/2.7/Python (compatibility version 2.7.0, current version 2.7.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)
    /Users/antipov/code/triqs_triqs/build_1.0_wboost/foreignlibs/boost/libboost_for_triqs.dylib (compatibility version 0.0.0, current version 0.0.0)
    /usr/local/lib/libmpi_cxx.1.dylib (compatibility version 2.0.0, current version 2.1.0)
    /usr/local/lib/libmpi.1.dylib (compatibility version 2.0.0, current version 2.3.0)
    /usr/local/lib/libsz.2.0.0.dylib (compatibility version 3.0.0, current version 3.0.0)
    /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.5)
    /usr/local/lib/libhdf5_hl.7.dylib (compatibility version 8.0.0, current version 8.3.0)
    /usr/local/lib/libhdf5.7.dylib (compatibility version 8.0.0, current version 8.3.0)
    /Users/antipov/code/triqs_triqs/build_1.0_wboost/triqs/libtriqs_utility.dylib (compatibility version 0.0.0, current version 0.0.0)
    /Users/antipov/code/triqs_triqs/build_1.0_wboost/triqs/libtriqs.dylib (compatibility version 0.0.0, current version 0.0.0)
    /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 744.12.0)
    /usr/local/lib/libfftw3.3.dylib (compatibility version 7.0.0, current version 7.2.0)
    /usr/local/lib/libhdf5_cpp.7.dylib (compatibility version 8.0.0, current version 8.3.0)
    /usr/local/lib/libgmp.10.dylib (compatibility version 11.0.0, current version 11.5.0)
    /usr/local/lib/libgmpxx.4.dylib (compatibility version 7.0.0, current version 7.5.0)
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 65.1.0)

parcollet commented 11 years ago

OK, I have the same on my new Mac. I have localized the problem (due to an ellipsis of size 0 in the new function), but I want to understand why the same code with clang on two platforms gives different results.
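To illustrate what an "ellipsis of size 0" can mean, here is a generic sketch, assuming the ellipsis is an array-slicing placeholder that stands for "as many full ranges as needed". It is not the TRIQS implementation and not the actual fix; `Arg` and `expand_ellipsis` are hypothetical names. The point is that when the explicit arguments already cover every dimension, the ellipsis expands to nothing, and code that assumes it covers at least one dimension may read out of bounds. Such a read is undefined behaviour, so it can appear to work on one platform and crash on another.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for slicing arguments; these are NOT TRIQS types.
enum class Arg { Index, Range, Ellipsis };

// Replace the (at most one) Ellipsis by as many full ranges as are needed
// to reach `rank` arguments. An ellipsis that expands to zero ranges is legal.
std::vector<Arg> expand_ellipsis(const std::vector<Arg>& args, std::size_t rank) {
    std::size_t n_ellipsis = 0;
    for (Arg a : args)
        if (a == Arg::Ellipsis) ++n_ellipsis;
    if (n_ellipsis > 1) throw std::invalid_argument("at most one ellipsis");

    const std::size_t n_explicit = args.size() - n_ellipsis;
    if (n_explicit > rank) throw std::invalid_argument("too many indices");
    if (n_ellipsis == 0 && n_explicit != rank)
        throw std::invalid_argument("too few indices");

    const std::size_t fill = rank - n_explicit;  // may be 0: empty ellipsis
    std::vector<Arg> out;
    out.reserve(rank);
    for (Arg a : args) {
        if (a == Arg::Ellipsis)
            out.insert(out.end(), fill, Arg::Range);  // inserts nothing when fill == 0
        else
            out.push_back(a);
    }
    return out;
}
```

With `rank == 2`, the argument list `{Index, Ellipsis, Index}` yields just the two indices, so the zero-size case goes through the same path as any other and needs no special handling downstream.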

parcollet commented 11 years ago

Fixed in 3c3e717c3a480386f6e5ecf2ee3ea65277509b34