Closed nsjarvis closed 1 year ago
I can confirm your observation. I do not recommend running with 8 threads on jlabl5.
---- edit: found a mistake on my part
Did it run correctly single-threaded? The idea seems familiar, but I cannot remember for certain; it was so long ago.
Valgrind does not tell me much more:
```
==667513== 1 errors in context 4 of 127:
==667513== Invalid read of size 4
==667513==    at 0x1104721: DEventSourceHDDM::Extract_DMCReaction(hddm_s::HDDM*, jana::JFactory<DMCReaction>*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, jana::JEventLoop*) (in /home/aaustreg/Software/gluex_top/halld_recon/halld_recon-4.39.0/Linux_RHEL8-x86_64-gcc8.5.0/bin/hd_root)
==667513==    by 0x1105959: DEventSourceHDDM::GetObjects(jana::JEvent&, jana::JFactory_base*) (in /home/aaustreg/Software/gluex_top/halld_recon/halld_recon-4.39.0/Linux_RHEL8-x86_64-gcc8.5.0/bin/hd_root)
==667513==    by 0x8024BA: jerror_t jana::JEvent::GetObjects<DMCReaction>(std::vector<DMCReaction const*, std::allocator<DMCReaction const*> >&, jana::JFactory_base*) (in /home/aaustreg/Software/gluex_top/halld_recon/halld_recon-4.39.0/Linux_RHEL8-x86_64-gcc8.5.0/bin/hd_root)
==667513==    by 0x8028B8: jana::JFactory<DMCReaction>* jana::JEventLoop::GetFromFactory<DMCReaction>(std::vector<DMCReaction const*, std::allocator<DMCReaction const*> >&, char const*, jana::JEventLoop::data_source_t&, bool) (in /home/aaustreg/Software/gluex_top/halld_recon/halld_recon-4.39.0/Linux_RHEL8-x86_64-gcc8.5.0/bin/hd_root)
==667513==    by 0x802A88: jana::JFactory<DMCReaction>* jana::JEventLoop::Get<DMCReaction>(std::vector<DMCReaction const*, std::allocator<DMCReaction const*> >&, char const*, bool) (in /home/aaustreg/Software/gluex_top/halld_recon/halld_recon-4.39.0/Linux_RHEL8-x86_64-gcc8.5.0/bin/hd_root)
==667513==    by 0x1258702: DEventWriterREST::Write_RESTEvent(jana::JEventLoop*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) const (in /home/aaustreg/Software/gluex_top/halld_recon/halld_recon-4.39.0/Linux_RHEL8-x86_64-gcc8.5.0/bin/hd_root)
==667513==    by 0x27E36356: JEventProcessor_danarest::evnt(jana::JEventLoop*, unsigned long) (in /home/aaustreg/Software/gluex_top/halld_recon/halld_recon-4.39.0/Linux_RHEL8-x86_64-gcc8.5.0/plugins/danarest.so)
==667513==    by 0x128C70E: jana::JEventLoop::OneEvent() (JEventLoop.cc:693)
==667513==    by 0x128CD23: jana::JEventLoop::Loop() (JEventLoop.cc:496)
==667513==    by 0x1269A48: LaunchThread(void*) (JApplication.cc:1382)
==667513==    by 0xD1821C9: start_thread (in /usr/lib64/libpthread-2.28.so)
==667513==    by 0xDF06E72: clone (in /usr/lib64/libc-2.28.so)
==667513== Address 0x1c is not stack'd, malloc'd or (recently) free'd
```
This is a hint:
/include/HDDM/hddm_s.hpp:9251

This accessor looks up the properties of the target:

```cpp
inline Properties &Target::getProperties() {
   return m_properties_link.front();
}
```

but this element is not present in the input:

```xml
<properties charge="int" mass="float"/>
```
And indeed, skipping over this line prevents the crash: https://github.com/JeffersonLab/halld_recon/blob/master/src/libraries/HDDM/DEventSourceHDDM.cc#L1159
Apparently, calling std::vector::front on an empty container causes undefined behavior. Since this is used everywhere in hddm-cpp, I want to hand this problem off to @rjones30
Alec, yes calling vector::front on an empty container causes unallocated memory access, which is a bug. Looking at it now. -Richard Jones
On Wed, Dec 14, 2022 at 2:40 PM Alex Austregesilo wrote:
> Apparently, calling std::vector::front on an empty container causes undefined behavior. Since this is used everywhere in hddm-cpp, I want to hand this problem off to @rjones30
Hello Alex and all,
The properties tag under "target" is not optional in hddm_s. You don't have to specify a <target>, but if you do, then you must give it <properties>. This is clear from the hddm template, as shown below: if a tag is optional, it carries a minOccurs="0" attribute to indicate that you can leave it out without breaking the model. Please mark this issue as resolved. I will check whether it needs to be opened in hdgeant or hdgeant4 to make sure their outputs are compliant. -Richard Jones
```xml
<target minOccurs="0" type="Particle_t">
  <momentum E="float" px="float" py="float" pz="float"/>
  <polarization minOccurs="0" Px="float" Py="float" Pz="float"/>
  <properties charge="int" mass="float"/>
</target>
```
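For illustration, a minimal instance that satisfies the template above (the particle type and numeric values here are made up) includes the <properties> element whenever <target> is present, while <polarization> may be omitted:

```xml
<target type="Proton">
  <momentum E="0.938" px="0.0" py="0.0" pz="0.0"/>
  <properties charge="1" mass="0.938272"/>
</target>
```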
Ok, that makes sense. I can also confirm that the generator gen_amp and its derivatives fill the
@nsjarvis Where did the input file (etapipi-100.hddm) come from? Can you please make sure it is recent?
Nevertheless, we can make our lives easier by making the reader more robust against inadvertent omissions like this one, which make little difference to the analysis. I have posted a PR named fix_hddm_read_crashes_rtj that I think should solve this issue. Please test and, if it works, approve. -Richard
Thank you for fixing this.
PS I checked with a real data file (hd_rawdata_073070_001.evio) and it runs over that no problem.
The files necessary to reproduce this are in /work/halld/njarvis/bug_danarest; use the script run.sh in that directory. It works on ifarm (rhel7) but not on jlabl5 (rhel8).
The screen output from jlabl5 is in /work/halld/njarvis/bug_danarest/os8.out.
Key excerpts from the error messages are