Open makortel opened 1 year ago
A new Issue was created by @makortel Matti Kortelainen.
@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign core
New categories assigned: core
@Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks
FYI @pcanal (I'm not expecting any immediate action, but in case the stack traces would raise any eyebrows)
Nothing obvious :(
Three Output modules running concurrently? Smells of thread unsafety.
The memory model on ARM (and POWER) is weaker than on x86_64, which essentially forgives most of the incorrect (unsafe) assumptions made in the code.
One can find many articles on the internet, for instance https://www.arangodb.com/2021/02/cpp-memory-model-migrating-from-x86-to-arm/
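To illustrate the kind of assumption meant here, below is a minimal sketch of publishing data from one thread to another. It is not taken from CMSSW or ROOT; `Mailbox`, `publish`, `consume`, `roundTrip` and `selfCheck` are hypothetical names. If `ready` were a plain `bool`, or were accessed with `std::memory_order_relaxed`, the code would be incorrect on every platform, but x86_64 hardware would rarely expose the problem, while ARM is free to make the flag visible before the payload.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical type for illustration only; not CMSSW or ROOT code.
struct Mailbox {
  int payload = 0;                 // plain, non-atomic data
  std::atomic<bool> ready{false};  // flag that publishes the payload
};

// Writer: fill the payload, then publish it. The release store keeps the
// payload write from being reordered after the flag store.
inline void publish(Mailbox& box, int value) {
  box.payload = value;
  box.ready.store(true, std::memory_order_release);
}

// Reader: wait for the flag. The acquire load pairs with the release store,
// so once the flag reads true the payload write is guaranteed to be visible.
inline int consume(Mailbox& box) {
  while (!box.ready.load(std::memory_order_acquire)) {
    std::this_thread::yield();
  }
  return box.payload;
}

// Runs one writer/reader pair and returns the value the reader observed.
inline int roundTrip(int value) {
  Mailbox box;
  int seen = -1;
  std::thread reader([&] { seen = consume(box); });
  std::thread writer([&] { publish(box, value); });
  writer.join();
  reader.join();
  return seen;
}

// Usage example: the reader must always observe the published value.
inline void selfCheck() { assert(roundTrip(42) == 42); }
```

The release/acquire pair is what makes this portable. Code that relies on the stronger ordering x86_64 happens to provide can appear to work there and still fail on aarch64.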
Nothing that we are not aware of. Still, we can clearly expect unsafe code to crash far more often on ARM than on x86_64 (where it may well work correctly under all possible conditions).
It would be useful to have an example that stresses those parts of the code that may be unsafe and that eventually crashes almost every time on ARM. At that point ThreadSanitizer could be used to pinpoint the critical issues.
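A stress example of this kind could look roughly like the sketch below. It is only an outline under assumptions: `stress`, `SharedState` and `runCounterStress` are hypothetical names, and the atomic counter is a stand-in for whatever shared state the suspect code touches. Several threads are released at the same moment and hammer the same state, which maximizes the chance of overlapping accesses.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the shared state under suspicion.
struct SharedState {
  std::atomic<long> counter{0};
};

// Start nThreads threads, release them together, and have each one call
// work() the given number of times.
template <typename Work>
void stress(unsigned nThreads, std::size_t iterations, Work work) {
  std::atomic<bool> go{false};
  std::vector<std::thread> pool;
  pool.reserve(nThreads);
  for (unsigned t = 0; t < nThreads; ++t) {
    pool.emplace_back([&go, iterations, &work] {
      // Spin until every thread has been created, so the work overlaps.
      while (!go.load(std::memory_order_acquire)) {
        std::this_thread::yield();
      }
      for (std::size_t i = 0; i < iterations; ++i) {
        work();
      }
    });
  }
  go.store(true, std::memory_order_release);
  for (auto& th : pool) {
    th.join();
  }
}

// Example workload: concurrent increments of one atomic counter.
// Returns the final count, which must equal nThreads * iterations.
inline long runCounterStress(unsigned nThreads, std::size_t iterations) {
  SharedState state;
  stress(nThreads, iterations, [&state] {
    state.counter.fetch_add(1, std::memory_order_relaxed);
  });
  return state.counter.load();
}
```

Building such a test with GCC or Clang using `-fsanitize=thread -g` enables ThreadSanitizer. Replacing the atomic increment with the code under suspicion would then let the tool report the conflicting accesses, even on runs that do not crash.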
Hi, looking at this from the ROOT side. How reproducible is this? Would it be possible to get a more accurate source location where it is crashing? I looked at the TBufferIO::TBufferIO constructor and don't really see anything that could lead to a crash, it's just default initializing a number of fields...
> Hi, looking at this from the ROOT side. How reproducible is this?
So far we have seen the crash once on ARM, so based on past experience the likelihood to reproduce is tiny.
Workflow 310.0 step 3 segfaulted in CMSSW_13_2_X_2023-06-22-2300 on el9_aarch64_gcc11 with
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el9_aarch64_gcc11/CMSSW_13_2_X_2023-06-22-2300/pyRelValMatrixLogs/run/310.0_Pyquen_GammaJet_pt20_2760GeV_2022/step3_Pyquen_GammaJet_pt20_2760GeV_2022.log#/