cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

[CLANG_X] Segmentation violation in ThePEG::EventGenerator::doinit #45872

Open iarspider opened 1 week ago

iarspider commented 1 week ago

RelVals 535.0, 537.0, 538.0 failed with SIGSEGV in ThePEG::EventGenerator::doinit:

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Wed Sep  4 07:50:25 CEST 2024
Thread 5 (Thread 0x14cb331ff700 (LWP 864038) "cmsRun"):
#0  0x000014cb6a747ac1 in poll () from /lib64/libc.so.6
#1  0x000014cb65f784bd in (anonymous namespace)::full_read(int, char*, unsigned long, int) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2  0x000014cb65f77f54 in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  0x000014cb65f778bf in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x000014cb2f995a15 in (anonymous namespace)::recursionNotNull(ThePEG::Pointer::TransientConstRCPtr<ThePEG::PartonBin>, ThePEG::Pointer::TransientConstRCPtr<ThePEG::Particle>) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/LesHouches.so.30
#6  0x000014cb2f9a9265 in ThePEG::LesHouchesReader::createPartonBinInstances() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/LesHouches.so.30
#7  0x000014cb2f9a2c66 in ThePEG::LesHouchesReader::getXComb() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/LesHouches.so.30
#8  0x000014cb2f9a2ec4 in ThePEG::LesHouchesReader::getSubProcess() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/LesHouches.so.30
#9  0x000014cb2f9a4307 in ThePEG::LesHouchesReader::readEvent() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/LesHouches.so.30
#10 0x000014cb2f99d754 in ThePEG::LesHouchesReader::scan() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/LesHouches.so.30
#11 0x000014cb2f9a1e42 in ThePEG::LesHouchesReader::initialize(ThePEG::LesHouchesEventHandler&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/LesHouches.so.30
#12 0x000014cb2f9cbd59 in ThePEG::LesHouchesFileReader::initialize(ThePEG::LesHouchesEventHandler&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/LesHouches.so.30
#13 0x000014cb2f9d8466 in ThePEG::LesHouchesEventHandler::initialize() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/LesHouches.so.30
#14 0x000014cb2fbf63a5 in ThePEG::EventGenerator::doinit() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/libThePEG.so.30
#15 0x000014cb2fbf9ba5 in ThePEG::EventGenerator::setup(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::set<ThePEG::Pointer::RCPtr<ThePEG::InterfacedBase>, std::less<ThePEG::Pointer::RCPtr<ThePEG::InterfacedBase> >, std::allocator<ThePEG::Pointer::RCPtr<ThePEG::InterfacedBase> > >&, std::map<long, ThePEG::Pointer::RCPtr<ThePEG::ParticleData>, std::less<long>, std::allocator<std::pair<long const, ThePEG::Pointer::RCPtr<ThePEG::ParticleData> > > >&, std::set<ThePEG::Pointer::RCPtr<ThePEG::MatcherBase>, std::less<ThePEG::Pointer::RCPtr<ThePEG::MatcherBase> >, std::allocator<ThePEG::Pointer::RCPtr<ThePEG::MatcherBase> > >&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/libThePEG.so.30
#16 0x000014cb2fc3dd6a in ThePEG::Repository::makeRun(ThePEG::Pointer::TransientRCPtr<ThePEG::EventGenerator>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/libThePEG.so.30
#17 0x000014cb2fc4057c in ThePEG::Repository::exec(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::ostream&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/libThePEG.so.30
#18 0x000014cb2fc40f6f in ThePEG::Repository::execAndCheckReply(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::ostream&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/libThePEG.so.30
#19 0x000014cb2fc41279 in ThePEG::Repository::read(std::istream&, std::ostream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/libThePEG.so.30
#20 0x000014cb2fc416dd in ThePEG::Repository::read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::ostream&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/libThePEG.so.30
#21 0x000014cb31305416 in (anonymous namespace)::HerwigGenericRead(Herwig::HerwigUI const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/external/el8_amd64_gcc12/lib/libHerwigAPI.so.2
#22 0x000014cb318a94e8 in Herwig7Interface::callHerwigGenerator() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libGeneratorInterfaceHerwig7Interface.so
#23 0x000014cb318a7a50 in Herwig7Interface::initRepository(edm::ParameterSet const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libGeneratorInterfaceHerwig7Interface.so
#24 0x000014cb318f8568 in Herwig7Hadronizer::initializeForExternalPartons() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginGeneratorInterfaceHerwig7HadronizerPlugins.so
#25 0x000014cb319068be in edm::HadronizerFilter<Herwig7Hadronizer, gen::ExternalDecayDriver>::beginLuminosityBlockProduce(edm::LuminosityBlock&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginGeneratorInterfaceHerwig7HadronizerPlugins.so
#26 0x000014cb6d2d5fd8 in edm::one::EDFilterBase::doBeginLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#27 0x000014cb6d2bf7dd in edm::WorkerT<edm::one::EDFilterBase>::implDoBegin(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#28 0x000014cb6d194745 in edm::workerhelper::CallImpl<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::call(edm::Worker*, edm::StreamID, edm::LumiTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*, edm::GlobalContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#29 0x000014cb6d19460a in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#30 0x000014cb6d194401 in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#31 0x000014cb6d192fdb in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#32 0x000014cb6d193a0f in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}::operator()() const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#33 0x000014cb6d1937c5 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&>(tbb::detail::d1::task_group&, edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#34 0x000014cb6cee13d9 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::$_0>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreConcurrency.so
#35 0x000014cb6b8b5b3b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x14cb68f0a200, waiter=..., this=0x14cb68fc9500) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#36 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x14cb68fc9500) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#37 tbb::detail::r1::arena::process (tls=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/arena.cpp:137
#38 tbb::detail::r1::market::process (this=<optimized out>, j=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/market.cpp:599
#39 0x000014cb6b8b7cee in tbb::detail::r1::rml::private_worker::run (this=0x14cb66874000) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:271
#40 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x14cb66874000) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#41 0x000014cb6a9f31ca in start_thread () from /lib64/libpthread.so.0
#42 0x000014cb6a64e8d3 in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x14cb34198700 (LWP 864037) "cmsRun"):
#0  0x000014cb6a71d098 in nanosleep () from /lib64/libc.so.6
#1  0x000014cb6a71cf9e in sleep () from /lib64/libc.so.6
#2  0x000014cb65f77544 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x000014cb6a64e41d in syscall () from /lib64/libc.so.6
#5  0x000014cb6b8b7fd2 in tbb::detail::r1::futex_wait (comparand=2, futex=0x14cb66874124) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/semaphore.h:100
#6  tbb::detail::r1::binary_semaphore::P (this=0x14cb66874124) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/semaphore.h:253
#7  tbb::detail::r1::rml::internal::thread_monitor::wait (this=0x14cb66874120) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/rml_thread_monitor.h:235
#8  tbb::detail::r1::rml::private_worker::run (this=0x14cb66874100) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:273
#9  tbb::detail::r1::rml::private_worker::thread_routine (arg=0x14cb66874100) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#10 0x000014cb6a9f31ca in start_thread () from /lib64/libpthread.so.0
#11 0x000014cb6a64e8d3 in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x14cb34b99700 (LWP 864036) "cmsRun"):
#0  0x000014cb6a71d098 in nanosleep () from /lib64/libc.so.6
#1  0x000014cb6a71cf9e in sleep () from /lib64/libc.so.6
#2  0x000014cb65f77544 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x000014cb6a64e41d in syscall () from /lib64/libc.so.6
#5  0x000014cb6b8b7fd2 in tbb::detail::r1::futex_wait (comparand=2, futex=0x14cb668740a4) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/semaphore.h:100
#6  tbb::detail::r1::binary_semaphore::P (this=0x14cb668740a4) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/semaphore.h:253
#7  tbb::detail::r1::rml::internal::thread_monitor::wait (this=0x14cb668740a0) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/rml_thread_monitor.h:235
#8  tbb::detail::r1::rml::private_worker::run (this=0x14cb66874080) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:273
#9  tbb::detail::r1::rml::private_worker::thread_routine (arg=0x14cb66874080) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#10 0x000014cb6a9f31ca in start_thread () from /lib64/libpthread.so.0
#11 0x000014cb6a64e8d3 in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x14cb44347700 (LWP 864020) "cmsRun"):
#0  0x000014cb6a9fd6a2 in waitpid () from /lib64/libpthread.so.0
#1  0x000014cb65f781f1 in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2  0x000014cb6b087a73 in std::execute_native_thread_routine (__p=0x14cb45794680) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#3  0x000014cb6a9f31ca in start_thread () from /lib64/libpthread.so.0
#4  0x000014cb6a64e8d3 in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x14cb69b6b680 (LWP 863948) "cmsRun"):
#0  0x000014cb6a71d098 in nanosleep () from /lib64/libc.so.6
#1  0x000014cb6a71cf9e in sleep () from /lib64/libc.so.6
#2  0x000014cb65f77544 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x000014cb6a64e41d in syscall () from /lib64/libc.so.6
#5  0x000014cb6b8bcca4 in tbb::detail::r1::futex_wait (comparand=2, futex=0x7ffebf455d90) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/semaphore.h:100
#6  tbb::detail::r1::binary_semaphore::P (this=0x7ffebf455d90) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/semaphore.h:253
#7  tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>::wait (this=0x7ffebf455d60) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/concurrent_monitor.h:170
#8  tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::commit_wait (this=<optimized out>, node=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/concurrent_monitor.h:232
#9  tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::commit_wait (node=..., this=0x14cb68fcb598) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/concurrent_monitor.h:228
#10 tbb::detail::r1::concurrent_monitor_base<tbb::detail::r1::market_context>::wait<tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>, tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}&>(tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}&, tbb::detail::r1::sleep_node<tbb::detail::r1::market_context>&&) (node=..., pred=<synthetic pointer>..., this=0x14cb68fcb598) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/concurrent_monitor.h:262
#11 tbb::detail::r1::sleep_waiter::sleep<tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}>(unsigned long, tbb::detail::r1::external_waiter::pause(tbb::detail::r1::arena_slot&)::{lambda()#1}) (this=<optimized out>, wakeup_condition=..., uniq_tag=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/waiters.h:118
#12 tbb::detail::r1::external_waiter::pause (this=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/waiters.h:144
#13 tbb::detail::r1::external_waiter::pause (this=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/waiters.h:137
#14 tbb::detail::r1::task_dispatcher::receive_or_steal_task<true, tbb::detail::r1::external_waiter> (this=<optimized out>, tls=..., ed=..., waiter=..., isolation=<optimized out>, fifo_allowed=<optimized out>, critical_allowed=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:231
#15 0x000014cb6b8be4e2 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x0, this=0x14cb68fc9380) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:350
#16 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x14cb68fc9380) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#17 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#18 0x000014cb6d17f668 in void tbb::detail::d0::try_call_proxy<tbb::detail::d1::task_group_base::wait()::{lambda()#1}>::on_completion<tbb::detail::d1::task_group_base::wait()::{lambda()#2}>(tbb::detail::d1::task_group_base::wait()::{lambda()#2}) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#19 0x000014cb6d17d9c5 in edm::FinalWaitingTask::wait() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#20 0x000014cb6d15baf0 in edm::EventProcessor::processRuns() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#21 0x000014cb6d158deb in edm::EventProcessor::runToCompletion() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#22 0x0000563b49ef0b6e in tbb::detail::d1::task_arena_function<main::$_0::operator()() const::{lambda()#1}, void>::operator()() const ()
#23 0x000014cb6b8aa9ad in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/arena.cpp:688
#24 0x0000563b49eefd06 in main::$_0::operator()() const ()
#25 0x0000563b49eed7ff in main ()

Current Modules:

Module: Herwig7HadronizerFilter:generator (crashed)
Module: none
Module: none
Module: none
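In the trace above, the thread that received the signal appears to be the one whose frames include `sig_dostack_then_abort` (Thread 5), while the other worker threads sit in `sig_pause_for_stacktrace`. The frames just below `<signal handler called>` on that thread are the ones of interest. The sketch below pulls them out of a pasted trace. It assumes the gdb-style layout shown here; the marker name is read off this trace rather than taken from CMSSW documentation, and the helper names `crashing_frames` and `shorten` are hypothetical.

```python
import re

# Frame name the signal handler leaves on the thread that took the signal.
# Assumption: read off the trace in this issue, not from CMSSW documentation.
CRASH_MARKER = "sig_dostack_then_abort"
SIGNAL_FRAME = "<signal handler called>"

THREAD_RE = re.compile(r"^Thread (\d+) \(")
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(.*)$")


def shorten(frame: str) -> str:
    """Drop the trailing ' () from /path/to/lib.so' part of a frame, if present."""
    return frame.split(" () from /")[0]


def crashing_frames(trace: str, limit: int = 5):
    """Return (thread_number, frames) for the thread that took the signal.

    `frames` holds up to `limit` entries found below the signal-handler frame,
    i.e. the code that was running when the signal arrived. Returns (None, [])
    if no thread contains the crash marker.
    """
    threads = {}  # thread number -> frame strings, innermost first
    current = None
    for line in trace.splitlines():
        thread_match = THREAD_RE.match(line)
        if thread_match:
            current = int(thread_match.group(1))
            threads[current] = []
            continue
        frame_match = FRAME_RE.match(line)
        if frame_match and current is not None:
            threads[current].append(frame_match.group(2))

    for number, frames in threads.items():
        if not any(CRASH_MARKER in f for f in frames):
            continue
        for i, f in enumerate(frames):
            if f.startswith(SIGNAL_FRAME):
                below = frames[i + 1 : i + 1 + limit]
                return number, [shorten(f) for f in below]
    return None, []
```

Calling `crashing_frames(open("trace.txt").read())` on a saved copy of the trace (the file name is only an example) should point at Thread 5 and the `recursionNotNull` / `LesHouchesReader` frames, which is where the segmentation violation originated.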

iarspider commented 1 week ago

assign generators

cmsbuild commented 1 week ago

New categories assigned: generators

@bbilin,@mkirsano,@menglu21,@lviliani you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild commented 1 week ago

A new Issue was created by @iarspider.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

lviliani commented 1 week ago

@Dominic-Stafford @theofil can you please take a look? It seems related to Herwig. Thanks!

Dominic-Stafford commented 1 week ago

I'll try to have a look next week (unless @theofil has time before then). @iarspider has the CLANG/C++ version or anything else like this changed which might have triggered the issue?

iarspider commented 1 week ago

has the CLANG/C++ version or anything else like this changed which might have triggered the issue?

I don't think so.

theofil commented 1 week ago

Hi, I will also try to look at it next week, starting on Tuesday.

a) Could we get some instructions on how to reproduce the problem in an lxplus session? b) Was the gridpack changed since the last working RelVals, which were produced without problems?

best, K.

makortel commented 1 week ago

The recipe to reproduce would be along the lines of:

cmssw-el8
cmsrel CMSSW_14_2_CLANG_X_2024-09-05-2300
cd CMSSW_14_2_CLANG_X_2024-09-05-2300/src
cmsenv
runTheMatrix.py -l 535.0 -t 4

makortel commented 1 week ago

has the CLANG/C++ version or anything else like this changed which might have triggered the issue?

I don't think so.

In case there is a connection to https://github.com/cms-sw/cmssw/issues/45510 (as wondered in https://github.com/cms-sw/cmssw/issues/45510#issuecomment-2333976429), we updated the C++ standard from 17 to 20 around Jul 12 (https://github.com/cms-sw/cmsdist/pull/9288), and https://github.com/cms-sw/cmssw/issues/45510 was opened on Jul 19. But I guess we have lost the information on whether the problem reported in https://github.com/cms-sw/cmssw/issues/45510 started on 07-18-2300 IB or earlier (or on the first occurrence of the problem reported in this issue).

theofil commented 1 week ago

@makortel thanks for the code.

Indeed, the test passes for CMSSW_14_2_X_2024-09-06-1100 but fails for CMSSW_14_2_CLANG_X_2024-09-05-2300. Both IBs report clang version 18.1.6, though; at least this is what I get from clang --version while in Singularity.

Any ideas?

makortel commented 1 week ago

The only difference between the default IB and the CLANG IB is that in the default IB the CMSSW code is compiled with gcc, and in CLANG IB with clang. The externals should be compiled with gcc in both cases (and be the same binaries).

Some thoughts

Dominic-Stafford commented 6 days ago

I've had a bit of a look at this and found, very weirdly, that I can reproduce it locally when I do runTheMatrix.py -el 535, but when I then go into the run directory and rerun the cfg with cmsRun (for instance to try valgrind or gdb), I get a different error:

09-Sep-2024 15:10:42 CEST  Initiating request to open LHE file thread0/cmsgrid_final.lhe
09-Sep-2024 15:10:42 CEST  Successfully opened LHE file thread0/cmsgrid_final.lhe
09-Sep-2024 15:10:42 CEST  Initiating request to open LHE file thread0/cmsgrid_final.lhe
09-Sep-2024 15:10:42 CEST  Successfully opened LHE file thread0/cmsgrid_final.lhe
%MSG-w LogicError:  LheWeightValidation:lheWeightValidation@beginRun  09-Sep-2024 15:10:42 CEST Run: 1
::getByLabel: An attempt was made to read a Run product before endRun() was called.
The product is of type 'LHERunInfoProduct'.
The specified ModuleLabel was 'externalLHEProducer'.
The specified productInstanceName was ''.

%MSG
%MSG-w LogicError:  Herwig7HadronizerFilter:generator@beginRun  09-Sep-2024 15:10:42 CEST Run: 1
::getByLabel: An attempt was made to read a Run product before endRun() was called.
The product is of type 'LHERunInfoProduct'.
The specified ModuleLabel was 'externalLHEProducer'.
The specified productInstanceName was ''.

%MSG
* A warning exception occurred in the initialization of EventGenerator: 
No information about the energy of incoming particles were found in LesHouchesReader 'LesHouchesReader'.
* A warning exception occurred in the initialization of EventGenerator: 
No information about the weighting scheme was found. The events produced by LesHouchesReader LesHouchesReader may not be sampled correctly.
* A warning exception occurred in the initialization of EventGenerator: 
LesHouchesReader LesHouchesReader has the IDWTUP flag set to 0, which does not correspond
to the weight option -2 set in the LesHouchesEventHandler LesHouchesHandler.

Use the following handler setting instead:
  set LesHouchesHandler:WeightOption 0
Will try to make intelligent guesses to get correct statistics. In most cases this should be sufficient. Unset <interface>WeightWarnings</interface> to avoid this message
* A warning exception occurred in the initialization of EventGenerator: 
The file associated with 'LesHouchesReader' does not contain a proper formatted Les Houches event file. The events may not be properly sampled.
Error: The sum of the cross sections of the readers in the LesHouchesEventHandler 'LesHouchesHandler' was zero.
Error: The object '/Herwig/Partons/PDFSet_nnlo' was not created as another object with that name already exists.
Error: The object '/Herwig/Partons/PDFSet_lo' was not created as another object with that name already exists.
Error: The object '/Herwig/EventHandlers/LesHouchesHandler' was not created as another object with that name already exists.
Error: The object '/Herwig/Cuts/NoCuts' was not created as another object with that name already exists.
Error: The object '/Herwig/Partons/LHAPDF' was not created as another object with that name already exists.
Error: The object '/Herwig/EventHandlers/LesHouchesReader' was not created as another object with that name already exists.
* A warning exception occurred in the initialization of EventGenerator: 
No information about the energy of incoming particles were found in LesHouchesReader 'LesHouchesReader'.
* A warning exception occurred in the initialization of EventGenerator: 
No information about the weighting scheme was found. The events produced by LesHouchesReader LesHouchesReader may not be sampled correctly.
* A warning exception occurred in the initialization of EventGenerator: 
LesHouchesReader LesHouchesReader has the IDWTUP flag set to 0, which does not correspond
to the weight option -2 set in the LesHouchesEventHandler LesHouchesHandler.

Use the following handler setting instead:
  set LesHouchesHandler:WeightOption 0
Will try to make intelligent guesses to get correct statistics. In most cases this should be sufficient. Unset <interface>WeightWarnings</interface> to avoid this message
* A warning exception occurred in the initialization of EventGenerator: 
The file associated with 'LesHouchesReader' does not contain a proper formatted Les Houches event file. The events may not be properly sampled.
* A warning exception occurred in the initialization of EventGenerator: 
No information about the weighting scheme was found. The events produced by LesHouchesReader LesHouchesReader may not be sampled correctly.
* A warning exception occurred in the initialization of EventGenerator: 
LesHouchesReader LesHouchesReader has the IDWTUP flag set to 0, which does not correspond
to the weight option -2 set in the LesHouchesEventHandler LesHouchesHandler.

Use the following handler setting instead:
  set LesHouchesHandler:WeightOption 0
Will try to make intelligent guesses to get correct statistics. In most cases this should be sufficient. Unset <interface>WeightWarnings</interface> to avoid this message
* A warning exception occurred in the initialization of EventGenerator: 
The file associated with 'LesHouchesReader' does not contain a proper formatted Les Houches event file. The events may not be properly sampled.
Error: the optional weights names for the LesHouchesEventHandler do not match 'LesHouchesHandler'
Herwig: EventGenerator not available.
Check if 'InterfaceMatchboxTest.run' is a valid run file.

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Mo 9. Sep 15:10:44 CEST 2024
Thread 2 (Thread 0x7f9450f75700 (LWP 1778195) "cmsRun"):
#0  0x00007f94784856a2 in waitpid () from /lib64/libpthread.so.0
#1  0x00007f9472d981f1 in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2  0x00007f9478b0fa73 in std::execute_native_thread_routine (__p=0x7f945290f7a0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#3  0x00007f947847b1ca in start_thread () from /lib64/libpthread.so.0
#4  0x00007f94780d68d3 in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f94775f3680 (LWP 1776592) "cmsRun"):
#0  0x00007f94781cfac1 in poll () from /lib64/libc.so.6
#1  0x00007f9472d984bd in (anonymous namespace)::full_read(int, char*, unsigned long, int) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2  0x00007f9472d97f54 in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  0x00007f9472d978bf in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007f9440a6a099 in (anonymous namespace)::HerwigGenericRun(Herwig::HerwigUI const&, bool) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-05-2300/external/el8_amd64_gcc12/lib/libHerwigAPI.so.2
#6  0x00007f9440a6be06 in Herwig::API::prepareRun(Herwig::HerwigUI const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-05-2300/external/el8_amd64_gcc12/lib/libHerwigAPI.so.2
#7  0x00007f9441016533 in Herwig7Interface::callHerwigGenerator() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libGeneratorInterfaceHerwig7Interface.so
#8  0x00007f944101682f in Herwig7Interface::initGenerator() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libGeneratorInterfaceHerwig7Interface.so
#9  0x00007f9441065577 in Herwig7Hadronizer::initializeForExternalPartons() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginGeneratorInterfaceHerwig7HadronizerPlugins.so
#10 0x00007f94410738be in edm::HadronizerFilter<Herwig7Hadronizer, gen::ExternalDecayDriver>::beginLuminosityBlockProduce(edm::LuminosityBlock&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginGeneratorInterfaceHerwig7HadronizerPlugins.so
#11 0x00007f947ad72fd8 in edm::one::EDFilterBase::doBeginLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#12 0x00007f947ad5c7dd in edm::WorkerT<edm::one::EDFilterBase>::implDoBegin(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#13 0x00007f947ac31745 in edm::workerhelper::CallImpl<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::call(edm::Worker*, edm::StreamID, edm::LumiTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*, edm::GlobalContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#14 0x00007f947ac3160a in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#15 0x00007f947ac31401 in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#16 0x00007f947ac2ffdb in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#17 0x00007f947ac30a0f in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}::operator()() const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#18 0x00007f947ac307c5 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&>(tbb::detail::d1::task_group&, edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#19 0x00007f947a97e3d9 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::$_0>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreConcurrency.so
#20 0x00007f947935b3e1 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7f9475ed3e00) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#21 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7f9475ed3e00) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#22 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#23 0x00007f947ac1c668 in void tbb::detail::d0::try_call_proxy<tbb::detail::d1::task_group_base::wait()::{lambda()#1}>::on_completion<tbb::detail::d1::task_group_base::wait()::{lambda()#2}>(tbb::detail::d1::task_group_base::wait()::{lambda()#2}) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#24 0x00007f947ac1a9c5 in edm::FinalWaitingTask::wait() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#25 0x00007f947abf8af0 in edm::EventProcessor::processRuns() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#26 0x00007f947abf5deb in edm::EventProcessor::runToCompletion() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#27 0x000055ecea2a3b6e in tbb::detail::d1::task_arena_function<main::$_0::operator()() const::{lambda()#1}, void>::operator()() const ()
#28 0x00007f94793479ad in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/arena.cpp:688
#29 0x000055ecea2a2d06 in main::$_0::operator()() const ()
#30 0x000055ecea2a07ff in main ()

Current Modules:

Module: Herwig7HadronizerFilter:generator (crashed)

A fatal system signal has occurred: segmentation violation
Segmentation fault (core dumped)

It seems that Herwig crashed due to some issue reading the LHE file (though I'm not sure why, as the LHE file is the same one that ran without issues in a non-CLANG build), and that we then keep going despite not having produced the run file, which causes a seg fault. For this new issue I'd definitely lay the blame on the fact that we skip over any errors from Herwig here. It may or may not be the root cause of the full issue, but it certainly makes it harder to debug. We've kept this block for a long time because it is supposed to work around an issue with Herwig being called before the externalLHEProducer, but I think we should really get rid of it, since running past errors where Herwig would otherwise just exit could be the root of what we're seeing here, and then deal with the issue with the sequence of calls if it still occurs. I probably won't have time to try this in the next few days, so if you have time @theofil that would be good; otherwise I'll try to by the end of the week.
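To illustrate the "stop at the first Herwig error" behaviour argued for above, here is a minimal sketch. It is not the CMSSW or Herwig code: `prepareRunFile` and `initGeneratorFailFast` are hypothetical names, and the boolean flag stands in for whatever condition signals that the generator could not be set up.

```cpp
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the step that is expected to produce a run file.
// It reports failure by throwing instead of printing a message and returning.
void prepareRunFile(bool generatorAvailable, const std::string& runFile) {
  if (!generatorAvailable) {
    throw std::runtime_error("EventGenerator not available; check that '" +
                             runFile + "' is a valid run file");
  }
}

// Hypothetical caller: stops at the first error, so nothing downstream runs
// against a generator that was never set up.
bool initGeneratorFailFast(bool generatorAvailable) {
  try {
    prepareRunFile(generatorAvailable, "InterfaceMatchboxTest.run");
  } catch (const std::exception& e) {
    std::cerr << "Herwig setup failed: " << e.what() << '\n';
    return false;  // do not continue to the run step
  }
  return true;
}
```

With this shape, a caller that gets `false` back can abort the job with a clear message instead of reaching the run step and crashing there.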

smuzaffar commented 4 days ago

I have checked our opensearch and found that workflow 535 ran successfully for CMSSW_14_1_CLANG_X_2024-07-11-2300 IB. The first failure was in CMSSW_14_1_CLANG_X_2024-07-12-2300 but the error code was 256 and many other workflows also failed with exit code 256 that day. The first day workflow 535 failed with this segmentation error (exit code 62720) was CMSSW_14_1_CLANG_X_2024-07-16-2300. cmssw changes between 2024-07-11-2300 to 2024-07-12-2300 should be while cmssw changes between 2024-07-12-2300 to 2024-07-16-2300

theofil commented 3 days ago

I haven't yet found the origin of the problem, but I can reply to this question:

  • Does the problem reproduce with one thread?

yes

@smuzaffar thanks a lot for the info. I see that the

https://github.com/cms-sw/cmssw/commit/91c2ca346214fb094120091a978c16b3698612e3

could be relevant to the crash we see. I will check whether this is really where the problem starts. Would replacing the relval_2017.py of the crashing IB with the relval_2017.py from the working IB be a sensible check, or could other things break as a result?

Apart from the software changes we see in the

https://github.com/cms-sw/cmssw/compare/e232a9b23f2f9125c636fbd3b70f3920f1ca6970...c103a34c29b52f4d483433096101ac2e2403da4d

are there other differences between the two IBs as far as their builds are concerned?

smuzaffar commented 3 days ago

As @makortel mentioned, we also updated the C++ standard (to C++20) for the July 12th IB.

By the way, building herwig7 and sherpa in debug mode, I get this stack trace for workflow 535/step1:

#3  0x00007fa46a6138bf in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-10-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007fa436425a15 in std::_Vector_base<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle>, std::allocator<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle> > >::_Vector_impl_data::_Vector_impl_data (this=0x7fa43648f7c0 <ThePEG::Particle::parents() const::null>) at /build/muz/clang/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_vector.h:100
#6  std::_Vector_base<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle>, std::allocator<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle> > >::_Vector_impl::_Vector_impl (this=0x7fa43648f7c0 <ThePEG::Particle::parents() const::null>) at /build/muz/clang/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_vector.h:139
#7  std::_Vector_base<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle>, std::allocator<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle> > >::_Vector_base (this=0x7fa43648f7c0 <ThePEG::Particle::parents() const::null>) at /build/muz/clang/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_vector.h:312
#8  std::vector<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle>, std::allocator<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle> > >::vector (this=0x7fa43648f7c0 <ThePEG::Particle::parents() const::null>) at /build/muz/clang/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_vector.h:526
#9  ThePEG::Particle::parents (this=<optimized out>) at ../include/ThePEG/EventRecord/Particle.h:159
#10 (anonymous namespace)::recursionNotNull (bin=..., p=...) at LesHouchesReader.cc:719
#11 0x00007fa436439265 in ThePEG::LesHouchesReader::createPartonBinInstances (this=0x7fa433865000) at LesHouchesReader.cc:731
#12 0x00007fa436432c66 in ThePEG::LesHouchesReader::getXComb (this=0x7fa433865000) at LesHouchesReader.cc:443
#13 0x00007fa436432ec4 in ThePEG::LesHouchesReader::getSubProcess (this=0x7fa433865000) at LesHouchesReader.cc:458
#14 0x00007fa436434307 in ThePEG::LesHouchesReader::readEvent (this=0x7fa433865000) at LesHouchesReader.cc:576
#15 0x00007fa43642d754 in ThePEG::LesHouchesReader::scan (this=0x7fa433865000) at LesHouchesReader.cc:305
#16 0x00007fa436431e42 in ThePEG::LesHouchesReader::initialize (this=<optimized out>, eh=...) at LesHouchesReader.cc:272
#17 0x00007fa43645bd59 in ThePEG::LesHouchesFileReader::initialize (this=0x7fa433865000, eh=...) at LesHouchesFileReader.cc:462
#18 0x00007fa436468466 in ThePEG::LesHouchesEventHandler::initialize (this=0x7fa40a74c400) at LesHouchesEventHandler.cc:87
#19 0x00007fa436686375 in ThePEG::EventGenerator::doinit (this=0x7fa44bc0ac00) at EventGenerator.cc:262
#20 0x00007fa436689b75 in ThePEG::InterfacedBase::init (this=0x7fa44bc0ac00) at ../include/ThePEG/Interface/InterfacedBase.h:246
#21 ThePEG::EventGenerator::setup (this=this@entry=0x7fa44bc0ac00, newRunName=..., newObjects=..., newParticles=..., newMatchers=...) at EventGenerator.cc:175
#22 0x00007fa4366cdd3a in ThePEG::Repository::makeRun (eg=..., name=...) at Repository.cc:316
#23 0x00007fa4366d054c in ThePEG::Repository::exec (command=..., os=...) at Repository.cc:786
#24 0x00007fa4366d0f3f in ThePEG::Repository::execAndCheckReply (line=..., os=...) at Repository.cc:510
#25 0x00007fa4366d1249 in ThePEG::Repository::read (is=..., os=..., prompt=...) at Repository.cc:566
#26 0x00007fa4366d16ad in ThePEG::Repository::read (filename=..., os=...) at Repository.cc:452
#27 0x00007fa437d953b9 in (anonymous namespace)::HerwigGenericRead (ui=...) at HerwigAPI.cc:146
#28 0x00007fa43833f4e8 in Herwig7Interface::callHerwigGenerator (this=this@entry=0x7fa43feb1190) at src/GeneratorInterface/Herwig7Interface/src/Herwig7Interface.cc:149
#29 0x00007fa43833da50 in Herwig7Interface::initRepository (this=0x7fa43feb1190, pset=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-10-2300/src/FWCore/MessageLogger/interface/MessageLogger.h:78
#30 0x00007fa43838e568 in Herwig7Hadronizer::initializeForExternalPartons (this=this@entry=0x7fa43feb10a0) at src/GeneratorInterface/Herwig7Interface/plugins/Herwig7Hadronizer.cc:109
#31 0x00007fa43839c8be in edm::HadronizerFilter<Herwig7Hadronizer, gen::ExternalDecayDriver>::beginLuminosityBlockProduce (this=0x7fa43feb1000, lumi=..., es=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-10-2300/src/GeneratorInterface/Core/interface/HadronizerFilter.h:367
#32 0x00007fa473842fd8 in edm::one::EDFilterBase::doBeginLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-10-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#33 0x00007fa47382c7dd in edm::WorkerT<edm::one::EDFilterBase>::implDoBegin(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-10-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
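
Frames #5 to #9 of the trace above show the fault while constructing a function-local static named `null` inside `ThePEG::Particle::parents() const`. As a minimal sketch of that general pattern, using simplified stand-in types (this is not ThePEG's actual source, and it does not explain why the construction faults):

```cpp
#include <memory>
#include <vector>

// Simplified stand-ins; not ThePEG's actual classes.
struct Particle;
using ParticleVector = std::vector<const Particle*>;

struct ParticleRep {
  ParticleVector parents;
};

struct Particle {
  std::unique_ptr<ParticleRep> rep;  // may be empty

  // Returns the stored parents if a representation exists, otherwise a
  // reference to a shared empty vector. The static local is constructed the
  // first time control passes through its declaration.
  const ParticleVector& parents() const {
    static const ParticleVector null;
    return rep ? rep->parents : null;
  }
};
```

In this sketch every particle without a representation returns a reference to the same empty vector, so the static object's storage and initialization are shared by all callers of the inline function.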
smuzaffar commented 3 days ago

Note that all of these failing workflows (511, 535, 537, 538 and 539) in the CLANG IBs are herwig7 workflows.

smuzaffar commented 3 days ago

One could also check if the problem reproduces with cmsRunGlibC or cmsRunTC (that use other allocators, if this is a memory problem they may behave differently, or even give some diagnostics)

It failed with both cmsRunGlibC and cmsRunTC. It also failed in single-thread mode.

theofil commented 2 days ago

Unfortunately, I have made very little progress so far.

I compiled two versions of Herwig under

CMSSW_14_2_X_2024-09-06-1100 and CMSSW_14_2_CLANG_X_2024-09-05-2300

and ran standalone Herwig MC generation, checking whether we can generate simple processes without reading external LHE files. We cannot generate any events in CMSSW_14_2_CLANG_X_2024-09-05-2300: we immediately get a segmentation fault while attempting to make the first event. Everything is OK in CMSSW_14_2_X_2024-09-06-1100, where MC generation finishes normally. This confirms what Dominic said earlier: although the first error messages complain about reading LHE files, this has nothing to do with the crash we get later on. (Actually, we get these messages even when things work.)

While compiling the code in the two releases, I see many warnings from ThePEG about arithmetic operations that I was not used to seeing before, but they all seem harmless in the CMSSW_14_2_X_2024-09-06-1100 warnings.

However, in the CMSSW_14_2_CLANG_X_2024-09-05-2300.txt warnings we see for the first time a warning regarding the creation of the RCPtr pointer, in particular:

/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/external/thepeg/2.2.2-330d679d0765729c295842b54c3a747c/include/ThePEG/Pointer/RCPtr.h:152:15: note: in implicit copy constructor for 'ThePEG::EventInfoBase' first required here
  152 |       ptr = new T(t);

which later appears in the crash messages.

To me, this confirms that the problem is not related to the CMSSW HerwigInterface, where there is not much we can do, but rather to the external package ThePEG, which is needed by the Herwig generator.

Is there a reason why we build CMSSW with clang while the external packages, like ThePEG, are still built with gcc? Is it sound to use the same ThePEG binary in the two cases? Would it be possible to have ThePEG also built with clang instead of gcc when CMSSW is built with clang?
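
As a side note on checking which compiler and C++ standard a given piece of code was actually built with, a small helper along these lines could be compiled into each component and its output compared. This is only a sketch: `buildInfo` is a hypothetical name, and it relies solely on the standard predefined macros.

```cpp
#include <string>

// Returns a short description of the compiler and C++ standard used to build
// this translation unit. clang also defines __GNUC__, so __clang__ must be
// checked first.
std::string buildInfo() {
  std::string info;
#if defined(__clang__)
  info = "clang " + std::to_string(__clang_major__) + "." +
         std::to_string(__clang_minor__);
#elif defined(__GNUC__)
  info = "gcc " + std::to_string(__GNUC__) + "." +
         std::to_string(__GNUC_MINOR__);
#else
  info = "unknown compiler";
#endif
  info += ", __cplusplus=" + std::to_string(__cplusplus);
  return info;
}
```

If the string reported from the CMSSW side differs from the one reported from the external library, the two were built with different compilers or language standards.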