G-Node / nix

Neuroscience information exchange format
https://readthedocs.org/projects/nixio/
Other
66 stars 36 forks source link

Warnings found by valgrind #180

Closed balint42 closed 10 years ago

balint42 commented 10 years ago

Running the test run locally many times I regularly encounter the following errors:

*** Error in `./TestRunner': double free or corruption (fasttop): 0x0000000001804530 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x80996)[0x7fd8d9385996]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZNSsD1Ev+0x1f)[0x7fd8d999d05f]
/home/balint/projects/pandora/libpandora.so(_ZN3nix4hdf517h5_type_for_valueIdEEN2H58DataTypeEb+0x142)[0x7fd8da4e4109]
/home/balint/projects/pandora/libpandora.so(_ZN3nix4hdf57DataSet16fileTypeForValueENS_8DataTypeE+0xb0)[0x7fd8da4e1334]
/home/balint/projects/pandora/libpandora.so(_ZN3nix4hdf512PropertyHDF56valuesERKSt6vectorINS_5ValueESaIS3_EE+0x1de)[0x7fd8da4b0820]
./TestRunner(_ZN3nix8Property6valuesERKSt6vectorINS_5ValueESaIS2_EE+0x35)[0x4c2011]
./TestRunner(_ZN12TestProperty12testDataTypeEv+0x93c)[0x55f834]
./TestRunner(_ZN7CppUnit10TestCallerI12TestPropertyE7runTestEv+0x66)[0x4e3558]
/usr/lib/x86_64-linux-gnu/libcppunit-1.13.so.0(_ZNK7CppUnit21TestCaseMethodFunctorclEv+0x22)[0x7fd8da7b22a2]
/usr/lib/x86_64-linux-gnu/libcppunit-1.13.so.0(_ZN7CppUnit16DefaultProtector7protectERKNS_7FunctorERKNS_16ProtectorContextE+0x20)[0x7fd8da7a8d90]
/usr/lib/x86_64-linux-gnu/libcppunit-1.13.so.0(_ZN7CppUnit14ProtectorChain7protectERKNS_7FunctorERKNS_16ProtectorContextE+0x2cd)[0x7fd8da7af7bd]
/usr/lib/x86_64-linux-gnu/libcppunit-1.13.so.0(_ZN7CppUnit10TestResult7protectERKNS_7FunctorEPNS_4TestERKSs+0x3a)[0x7fd8da7b7dfa]
/usr/lib/x86_64-linux-gnu/libcppunit-1.13.so.0(_ZN7CppUnit8TestCase3runEPNS_10TestResultE+0x10a)[0x7fd8da7b1faa]
/usr/lib/x86_64-linux-gnu/libcppunit-1.13.so.0(_ZN7CppUnit13TestComposite15doRunChildTestsEPNS_10TestResultE+0x43)[0x7fd8da7b25f3]
/usr/lib/x86_64-linux-gnu/libcppunit-1.13.so.0(_ZN7CppUnit13TestComposite3runEPNS_10TestResultE+0x1e)[0x7fd8da7b250e]
/usr/lib/x86_64-linux-gnu/libcppunit-1.13.so.0(_ZN7CppUnit13TestComposite15doRunChildTestsEPNS_10TestResultE+0x43)[0x7fd8da7b25f3]
/usr/lib/x86_64-linux-gnu/libcppunit-1.13.so.0(_ZN7CppUnit13TestComposite3runEPNS_10TestResultE+0x1e)[0x7fd8da7b250e]
/usr/lib/x86_64-linux-gnu/libcppunit-1.13.so.0(_ZN7CppUnit10TestResult7runTestEPNS_4TestE+0x22)[0x7fd8da7b7b32]
/usr/lib/x86_64-linux-gnu/libcppunit-1.13.so.0(_ZN7CppUnit10TestRunner3runERNS_10TestResultERKSs+0x2e)[0x7fd8da7b9e0e]
./TestRunner(main+0x8d8)[0x4c0908]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fd8d9326de5]
./TestRunner[0x4bfe59]
*** Error in `./TestRunner': corrupted double-linked list: 0x0000000000e46720 ***

The latter immediately hangs and thus has no dump.

balint42 commented 10 years ago

looking at the valgrind dump Im not much further understanding the source of the problem - can anyone reproduce this or is it my HDF5 version that causes problems?

valgrind --tool=memcheck --leak-check=yes --track-origins=yes ./TestRunner => https://gist.github.com/balint42/01b25765cc1b6b863b3a

balint42 commented 10 years ago

after pull request #182 by gicmo the following valgrind errors remain:

https://gist.github.com/balint42/d201edc85c3e82c27f8f

After lengthy investigation gicmo & I agree that this most likely is caused by HDF5 doing weird things with "bool". As gicmo pointed out could be resolved by using HBOOL or uint instead of bool.

gicmo commented 10 years ago

The bool part turned out to be a red herring. After a closer look again it turns out that uncertainty (double), as non-class type, was default-initialised only, i.e. no initialisation was performed (cf, C++12 8.5 §5 and 12.6.2 §8). Pull request #183 fixes this and therefore hopefully the Value part of the valgrind analysis.

gicmo@kaon G-Node/nix/build % valgrind --track-origins=yes ./TestRunner DataSet 
[...]
--25522-- ./TestRunner:
TestDataSet
TestDataSet::testNDSize : OK
TestDataSet::testChunkGuessing : OK
TestDataSet::testDataType : OK
TestDataSet::testBasic : OK
TestDataSet::testSelection : OK
TestDataSet::testValueIO : OK
TestDataSet::testNDArrayIO : OK
OK (7)
==25522==
==25522== HEAP SUMMARY:
[...]
==25522== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 7692 from 418)
gicmo commented 10 years ago

I have renamed the issue to better reflect that we are not only investigating memory leaks here.

balint42 commented 10 years ago

First of all thanks to gicmo for the lengthy investigation and to jgrewe for the nice catch with the "prev" iterator function!

I can confirm that gicmo's commits fix the "uninitialized value" part of the valgrind errors. I can also confirm that jgrewe's commits fix some of the "invealid read" part of the valgrind errors.

The following gist shows the current valgrind output on my machine, some invalid read errors remain: https://gist.github.com/balint42/648a39d139bd302e6772

Some remarks on the "std::prev" error: the C++11 draft in 24.2.6 only names the post decrement requirement for bi-directional iterators "is dereferenceable". It does not follow that this should mean problems if you decrement beyond "begin()", but it does not guarantee the opposite either. The implementation of "std::prev" only says that it uses "--"operator when available. It follows that "--" on STL containers does not seem to check for valid pointers and attempts read in any case. => "next" & "prev" are not mem safe to use without checks

gicmo commented 10 years ago

I have integrated valgrind into our CI process. It will show as "Dynamic Analysis" in CDash. See [1] for an example if it detects something. After a little (not too depth) investigation I believe that the last remaining warnings, that are all hdf5 related, are not coming from our code but are hdf5 internals, so I added them to a suppression file for valgrind. As of now valgrind from CI (with the suppression of the 2 HDF5 warnings) detects 0 defects (cf. [2]). Once the latter assumption is confirmed we can close this bug.

[1] http://0xdeadbabe.info/index.php?project=nix&date=2014-04-17 [2] http://0xdeadbabe.info/index.php?project=nix&date=2014-04-18

jgrewe commented 10 years ago

interestingly enough I do not have any of these warning, not on my laptop nor on my desktop istallations using libhdf5_cpp.so.7.0.3 ...

On 22.04.2014 10:35, Christian Kellner wrote:

I have integrated valgrind into our CI process. It will show as "Dynamic Analysis" in CDash. See [1] for an example if it detects something. After a little (not too depth) investigation I believe that the last remaining warnings, that are all hdf5 related, are not coming from our code but are hdf5 internals, so I added them to a suppression file for valgrind. As of now valgrind from CI (with the suppression of the 2 HDF5 warnings) detects 0 defects (cf. [2]). Once the latter assumption is confirmed we can close this bug.

[1] http://0xdeadbabe.info/index.php?project=nix&date=2014-04-17 [2] http://0xdeadbabe.info/index.php?project=nix&date=2014-04-18


Reply to this email directly or view it on GitHub: https://github.com/G-Node/pandora/issues/180#issuecomment-41015832

balint42 commented 10 years ago

@jgrewe: sorry for the dumb question but you are running valgrind with the following params?

valgrind --tool=memcheck --leak-check=yes --track-origins=yes

jgrewe commented 10 years ago

jupp, exactly that command

gicmo commented 10 years ago

I don't have them here on OSX either. One more hint that it is hdf5 internals not our code. HDF5 info:

hdf5: stable 1.8.12
http://www.hdfgroup.org/HDF5
/usr/local/Cellar/hdf5/1.8.9 (156 files, 11M)
/usr/local/Cellar/hdf5/1.8.12 (175 files, 11M) *
  Built from source with: --enable-cxx
From: https://github.com/homebrew/homebrew-science/commits/master/hdf5.rb
==> Dependencies
Required: szip ✔
==> Options
--c++11
    Build using C++11 mode
--enable-cxx
    Compile C++ bindings.
--enable-fortran
    Compile Fortran bindings
--enable-fortran2003
    Compile Fortran 2003 bindings. Requires enable-fortran.
--enable-parallel
    Compile parallel bindings
--enable-threadsafe
    Trade performance and C++ or Fortran support for thread safety
--universal
    Build a universal binary
jgrewe commented 10 years ago

As far as I remember, we concluded that the remaining warnings are hdf5 related and are not our responsibility. Thus, I close this issue.