cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

TestFWCoreFrameworkTransitions test failure #46569

Open · opened by smuzaffar 3 weeks ago

smuzaffar commented 3 weeks ago

The unit test TestFWCoreFrameworkTransitions crashed in the 14_1_X IB [a] (see log https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc12/CMSSW_14_1_X_2024-10-31-1100/unitTestLogs/FWCore/Framework#/20749-20749). In the last 8 weeks this test has failed randomly in 4 different IBs [b]. @cms-sw/core-l2, is there any known issue with this test?

[a]

Test: Empty file at end
****************************************
%MSG-i ThreadStreamSetup:  (NoModuleName) 31-Oct-2024 13:09:56 CET pre-events
setting # threads 2
setting # streams 2
%MSG
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 0 at 31-Oct-2024 13:09:57.781 CET
Begin processing the 2nd record. Run 1, Event 2, LumiSection 1 on stream 0 at 31-Oct-2024 13:09:57.781 CET

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Thu Oct 31 13:09:57 CET 2024
Thread 3 (Thread 0x155102903700 (LWP 1628237) "cmsRun"):
#0  0x00001551273baac1 in poll () from /lib64/libc.so.6
#1  0x0000155122c01857 in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02861/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-10-27-0000/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2  0x0000155122c01a54 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02861/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-10-27-0000/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  tbb::detail::r1::spawn (t=..., ctx=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:36
#5  0x0000155129f2e608 in tbb::detail::d1::function_task<edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02861/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-10-27-0000/lib/el8_amd64_gcc12/libFWCoreFramework.so
#6  0x000015512938bb3b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x155125dc7500, waiter=..., this=0x155125dbbe80) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#7  tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x155125dbbe80) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#8  tbb::detail::r1::arena::process (tls=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/arena.cpp:137
#9  tbb::detail::r1::market::process (this=<optimized out>, j=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/market.cpp:599
#10 0x000015512938dcee in tbb::detail::r1::rml::private_worker::run (this=0x155122fb4100) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:271
#11 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x155122fb4100) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#12 0x00001551276661ca in start_thread () from /lib64/libpthread.so.0
#13 0x00001551272c18d3 in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x155102ba9700 (LWP 1628231) "cmsRun"):
#0  0x00001551276706a2 in waitpid () from /lib64/libpthread.so.0
#1  0x0000155122bfe327 in edm::service::cmssw_stacktrace_fork() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02861/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-10-27-0000/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2  0x0000155122c0163a in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02861/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-10-27-0000/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  0x0000155127cd8a73 in std::execute_native_thread_routine (__p=0x1551055fc450) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#4  0x00001551276661ca in start_thread () from /lib64/libpthread.so.0
#5  0x00001551272c18d3 in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x1551292d0680 (LWP 1628084) "cmsRun"):
#0  0x0000155127390098 in nanosleep () from /lib64/libc.so.6
#1  0x000015512738ff9e in sleep () from /lib64/libc.so.6
#2  0x0000155122bfe1d0 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02861/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-10-27-0000/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00001551272c141d in syscall () from /lib64/libc.so.6
#5  0x000015512938d46b in tbb::detail::r1::futex_wakeup_one (futex=0x155122fb4124) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/semaphore.h:109
#6  tbb::detail::r1::binary_semaphore::V (this=0x155122fb4124) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/semaphore.h:262
#7  tbb::detail::r1::rml::internal::thread_monitor::notify (this=0x155122fb4120) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/rml_thread_monitor.h:230
#8  tbb::detail::r1::rml::private_worker::wake_or_launch (this=0x155122fb4100) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:292
#9  tbb::detail::r1::rml::private_server::wake_some (this=<optimized out>, additional_slack=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:412
#10 0x000015512938dc67 in tbb::detail::r1::rml::private_server::adjust_job_count_estimate (delta=1, this=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:423
#11 tbb::detail::r1::market::adjust_demand (this=0x155125dcb580, a=..., delta=1, mandatory=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/market.cpp:588
#12 0x0000155129391ba6 in tbb::detail::r1::arena::advertise_new_work<(tbb::detail::r1::arena::new_work_type)0> (this=0x155125dbba00) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/arena.h:547
#13 0x0000155129f320b4 in edm::WaitingTaskHolder::doneWaiting(std::__exception_ptr::exception_ptr) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02861/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-10-27-0000/lib/el8_amd64_gcc12/libFWCoreFramework.so
#14 0x0000155129f4f549 in edm::EventProcessor::endUnfinishedLumi(bool) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02861/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-10-27-0000/lib/el8_amd64_gcc12/libFWCoreFramework.so
#15 0x0000155129f52412 in edm::EventProcessor::runToCompletion() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02861/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_X_2024-10-27-0000/lib/el8_amd64_gcc12/libFWCoreFramework.so
#16 0x000000000040840c in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#17 0x00001551293809ad in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/arena.cpp:688
#18 0x000000000040a0f2 in main::{lambda()#1}::operator()() const ()
#19 0x0000000000405100 in main ()

Current Modules:

Module: none (crashed)
Module: none

A fatal system signal has occurred: segmentation violation
/data/cmsbld/jenkins/workspace/ib-run-qa/CMSSW_14_1_X_2024-10-31-1100/src/FWCore/Framework/test/transition_test.sh: line 16: 1628084 Segmentation fault      (core dumped) ( cmsRun ${LOCAL_TEST_DIR}/transition_test_cfg.py 9 )
Failure running cmsRun transition_test_cfg.py 9: status 139

[b]

CMSSW_14_1_X_2024-10-31-1100 el8_amd64_gcc12
CMSSW_14_2_ROOT632_X_2024-10-22-2300 el8_amd64_gcc12
CMSSW_14_1_X_2024-10-18-2300 el8_amd64_gcc12
CMSSW_14_2_NONLTO_X_2024-10-16-1100 el8_amd64_gcc12

smuzaffar commented 3 weeks ago

assign core

cmsbuild commented 3 weeks ago

New categories assigned: core

@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild commented 3 weeks ago

A new Issue was created by @smuzaffar.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

makortel commented 3 weeks ago

Symptoms look similar to https://github.com/cms-sw/cmssw/issues/42093

May also be related to https://github.com/cms-sw/cmssw/issues/45487