BxCppDev / Bayeux

Core Persistency, Geometry and Data Processing C++ Library for Particle and Nuclear Physics Experiments
GNU General Public License v3.0

`datatools_test_handle_1` failing on macOS in optimised builds #20

Closed: drbenmorgan closed this issue 5 years ago

drbenmorgan commented 6 years ago

Building Bayeux from the 3.1.2 tag and from the current develop branch works on macOS, bar an odd failure: the datatools-test_handle_1 test fails in all but Debug builds. After stripping the build back to just datatools and its tests (Boost 1.63, C++11), the test succeeds without error in a Debug build. In RelWithDebInfo or Release builds, runs result in:

$ ./BuildProducts/bin/bxtests/datatools-test_handle_1
Test of the 'handle<>' template class...

Test 1: 
Hits : 
hit @ 0x7fcb7d417aa0 : {id="0" , tdc=123}
hit @ 0x7fcb7d413ae0 : {id="1" , tdc=456}
hit @ 0x7fcb7d417b20 : {id="2" , tdc=789}
hit @ 0x7fcb7d41a310 : {id="100" , tdc=10000}
hit @ 0x7fcb7d41a340 : {id="4" , tdc=654}
hit @ 0x7fcb7d41a280 : {id="5" , tdc=987}
hit @ 0x7fcb7d41a450 : {id="20" , tdc=20000}
Hits2 : 
hit @ 0x7fcb7d413ae0 : {id="1" , tdc=456}
hit @ 0x7fcb7d41a310 : {id="100" , tdc=10000}
hit @ 0x7fcb7d41a340 : {id="4" , tdc=654}
Erase +0 : 
Erase +2 : 
Erase +3 : 
Add : 
Hits(bis) : 
hit @ 0x7fcb7d413ae0 : {id="1" , tdc=456}
hit @ 0x7fcb7d417b20 : {id="2" , tdc=789}
hit @ 0x7fcb7d41a340 : {id="4" , tdc=654}
hit @ 0x7fcb7d41a450 : {id="20" , tdc=20000}
hit @ 0x7fcb7d413ae0 : {id="1" , tdc=456}

Serialize...
Segmentation fault: 11

Running it in `lldb` yields:

Process 36390 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000010000e82d datatools-test_handle_1`void boost::archive::basic_text_oprimitive<std::__1::basic_ostream<char, std::__1::char_traits<char> > >::save_impl<int>(this=<unavailable>, t=0x0000000000000000, (null)=0x00007ffeefbfdee8) at basic_text_oprimitive.hpp:130 [opt]
   127              boost::serialization::throw_exception(
   128                  archive_exception(archive_exception::output_stream_error)
   129              );
-> 130          os << t;
   131      }
   132  
   133      /////////////////////////////////////////////////////////
Target 0: (datatools-test_handle_1) stopped.

A backtrace yields a long chain of output, which is in the attached file datatools_test_handle_1.bt.txt.

My guess is that something related to hit::serialize is being optimised out, but it's not obvious why, given that the very similar test datatools_test_handle_2 doesn't have any of these issues. Linux builds with GCC 5 do not show the problem, so I suspect it is related to the Clang compiler (in which case it should be reproducible on Linux with Clang).

Not a critical issue, but I wanted to report it as I'm preparing a PR to add some functionality to datatools::handle. As this only adds to the user interface and has no effect on serialisation, it should neither affect nor be affected by the failure of the above test.

fmauger commented 6 years ago

Very strange issue! I have inspected the code and I cannot figure out what the problem is. Guillaume recently faced a problem with unsupported serialization of uninitialized boolean class attributes, but as far as I can see we are not in that case here.
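For context on the uninitialized-boolean pitfall: reading an uninitialised bool member is undefined behaviour, and in optimised builds its byte may hold a value other than 0 or 1, which a serializer can then mishandle. A minimal sketch of the usual defence, assuming a hypothetical `hit` struct with an added `valid` flag (not the test's actual class), is to give every data member a default member initializer. The `save_text` function is a toy stand-in for a text archive, not the Boost.Serialization API.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for the test's hit class (the real datatools test
// class may differ). Every data member has a default member initializer, so
// a default-constructed hit never holds indeterminate values that a
// serializer would later read.
struct hit {
  int id = -1;         // "no id yet"
  int tdc = 0;         // raw TDC count
  bool valid = false;  // always exactly false or true when saved
};

// Toy text "save": writes the fields in a fixed order, roughly as a text
// archive would (bool is written as 0/1 because std::boolalpha is not set).
std::string save_text(const hit& h) {
  std::ostringstream os;
  os << h.id << ' ' << h.tdc << ' ' << h.valid;
  return os.str();
}

// Usage: a freshly constructed hit is always safe to save.
inline void check_default_hit_is_saveable() {
  hit h;
  assert(save_text(h) == "-1 0 0");
}
```

Without the initializers, `hit h;` followed by `save_text(h)` would read indeterminate values, and the result could change with the optimisation level.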

drbenmorgan commented 6 years ago

@fmauger I've now had some time to look at your checks, with the following results:

Do you have at least some partial output in the "test_handle_1.txt" file, or nothing at all? Maybe we could find a hint of some mess with the serialization of the underlying shared_ptr. The "memory tracking" algorithm could be broken by some memory optimization...

The file is created, but it is empty (zero length).

What happens if you serialize only the hits collection?

(De)serialising only the hits2 object succeeds. (De)serialising only the hits object fails as before.

What if you comment out lines 195 to 208?

No change; it still segfaults at the "Serialize..." step.

What happens if you serialize only through the XML archive?

No change, still segfaults at the "Serialize..." step.

So it looks like the hits instance is where things go wrong, somehow.
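The "memory tracking" hypothesis concerns how a serializer handles several shared pointers to the same object: in the Hits(bis) listing, the hit at 0x7fcb7d413ae0 (id="1") appears twice, so it must be written once and then referred back to. The sketch below illustrates that idea with `std::shared_ptr` and an address-to-id map. It is a conceptual toy, assuming a hypothetical `hit` type and output format; it is not Boost.Serialization's implementation or archive format. It also refuses to dereference a null pointer, since the lldb frame shows `save_impl<int>` writing an int at address 0x0.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical payload type, standing in for the test's hit class.
struct hit {
  int id = -1;
  int tdc = 0;
};

// Toy pointer-tracking writer: the first time an object's address is seen,
// its contents are written under a fresh object id ("new"); later pointers
// to the same address write only a back-reference ("ref").
class tracking_writer {
 public:
  void save(const std::shared_ptr<hit>& p) {
    assert(p && "refusing to dereference a null handle");
    const void* addr = p.get();
    auto it = ids_.find(addr);
    if (it != ids_.end()) {
      os_ << "ref " << it->second << '\n';  // already written: back-reference
      return;
    }
    const std::size_t id = ids_.size();     // next object id: 0, 1, 2, ...
    ids_.emplace(addr, id);
    os_ << "new " << id << ' ' << p->id << ' ' << p->tdc << '\n';
  }

  std::string str() const { return os_.str(); }

 private:
  std::unordered_map<const void*, std::size_t> ids_;
  std::ostringstream os_;
};

// Writes every handle of a collection in order.
std::string save_all(const std::vector<std::shared_ptr<hit>>& hits) {
  tracking_writer w;
  for (const auto& p : hits) w.save(p);
  return w.str();
}
```

For a collection {h1, h2, h1}, the second occurrence of h1 is written as `ref 0` rather than as a second copy, which is what keeps shared ownership intact on reload.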

fmauger commented 6 years ago

OK. I will try to think about it.

drbenmorgan commented 5 years ago

This seems to be resolved in Boost 1.68/1.69, so it is fixed as of commit 681b7f31a06d527b78321e2fd79f0b320603cd78 in develop, which supports Boost 1.69.
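Since the failure went away with newer Boost, one option would be to make a minimum Boost version explicit at compile time. The thread does not say Bayeux does this; the sketch below assumes a hypothetical helper `boost_version_at_least`, and the 1.69 threshold is illustrative. It relies only on the documented `BOOST_VERSION` encoding from `<boost/version.hpp>` (major * 100000 + minor * 100 + patch), so the checks here use literal values instead of including Boost.

```cpp
// Hypothetical helper (not part of Bayeux or Boost). BOOST_VERSION, defined
// in <boost/version.hpp>, encodes major * 100000 + minor * 100 + patch,
// e.g. 106900 for Boost 1.69.0. Single return statement: valid C++11 constexpr.
constexpr bool boost_version_at_least(long encoded, long major, long minor) {
  return encoded / 100000 > major ||
         (encoded / 100000 == major && encoded / 100 % 1000 >= minor);
}

// In real code one might write, after #include <boost/version.hpp>:
//   static_assert(boost_version_at_least(BOOST_VERSION, 1, 69),
//                 "Boost >= 1.69 required");
static_assert(boost_version_at_least(106900, 1, 69), "1.69.0 passes");
static_assert(!boost_version_at_least(106300, 1, 69), "1.63.0 fails");
```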