Open iarspider opened 1 week ago
assign generators
New categories assigned: generators
@bbilin,@mkirsano,@menglu21,@lviliani you have been requested to review this Pull request/Issue and eventually sign? Thanks
A new Issue was created by @iarspider.
@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.
@Dominic-Stafford @theofil can you please take a look? It seems related to Herwig. Thanks!
I'll try to have a look next week (unless @theofil has time before then). @iarspider has the CLANG/C++ version or anything else like this changed which might have triggered the issue?
has the CLANG/C++ version or anything else like this changed which might have triggered the issue?
I don't think so.
Hi, I will also try to look at it next week, starting on Tuesday.
a) Could we get some instructions on how to reproduce the problem in an lxplus session? b) Was the gridpack changed after the last version of the working RelVals that were produced without problems?
best, K.
The recipe to reproduce would be along the lines of:
cmssw-el8
cmsrel CMSSW_14_2_CLANG_X_2024-09-05-2300
cd CMSSW_14_2_CLANG_X_2024-09-05-2300/src
cmsenv
runTheMatrix.py -l 535.0 -t 4
has the CLANG/C++ version or anything else like this changed which might have triggered the issue?
I don't think so.
In case there is a connection to https://github.com/cms-sw/cmssw/issues/45510 (as wondered in https://github.com/cms-sw/cmssw/issues/45510#issuecomment-2333976429), we updated the C++ standard from 17 to 20 around Jul 12 (https://github.com/cms-sw/cmsdist/pull/9288), and https://github.com/cms-sw/cmssw/issues/45510 was opened on Jul 19. But I guess we have lost the information on whether the problem reported in https://github.com/cms-sw/cmssw/issues/45510 started on 07-18-2300 IB or earlier (or on the first occurrence of the problem reported in this issue).
@makortel thanks for the code.
Indeed, the test passes for CMSSW_14_2_X_2024-09-06-1100 but fails for CMSSW_14_2_CLANG_X_2024-09-05-2300, yet both IBs have clang version 18.1.6; at least this is what I get from clang --version while in singularity.
Any ideas?
The only difference between the default IB and the CLANG IB is that in the default IB the CMSSW code is compiled with gcc, and in the CLANG IB with clang. The externals should be compiled with gcc in both cases (and be the same binaries).
Some thoughts
- Does the problem reproduce with one thread?
- ThePEG exactly the same between default and CLANG IBs? (the stack hints towards reading LHE file, so probably the input is the same?)
- gdb, but in order to be useful it might need ThePEG to be built with debug symbols
- valgrind might reveal something useful (but will be slow)
- One could also check if the problem reproduces with cmsRunGlibC or cmsRunTC (that use other allocators, if this is a memory problem they may behave differently, or even give some diagnostics)
I've had a bit of a look at this, and found very weirdly that I can reproduce it locally when I do runTheMatrix.py -el 535, but then when I go into the run directory and rerun the cfg with cmsRun (for instance to try valgrind or gdb), I get a different error:
09-Sep-2024 15:10:42 CEST Initiating request to open LHE file thread0/cmsgrid_final.lhe
09-Sep-2024 15:10:42 CEST Successfully opened LHE file thread0/cmsgrid_final.lhe
09-Sep-2024 15:10:42 CEST Initiating request to open LHE file thread0/cmsgrid_final.lhe
09-Sep-2024 15:10:42 CEST Successfully opened LHE file thread0/cmsgrid_final.lhe
%MSG-w LogicError: LheWeightValidation:lheWeightValidation@beginRun 09-Sep-2024 15:10:42 CEST Run: 1
::getByLabel: An attempt was made to read a Run product before endRun() was called.
The product is of type 'LHERunInfoProduct'.
The specified ModuleLabel was 'externalLHEProducer'.
The specified productInstanceName was ''.
%MSG
%MSG-w LogicError: Herwig7HadronizerFilter:generator@beginRun 09-Sep-2024 15:10:42 CEST Run: 1
::getByLabel: An attempt was made to read a Run product before endRun() was called.
The product is of type 'LHERunInfoProduct'.
The specified ModuleLabel was 'externalLHEProducer'.
The specified productInstanceName was ''.
%MSG
* A warning exception occurred in the initialization of EventGenerator:
No information about the energy of incoming particles were found in LesHouchesReader 'LesHouchesReader'.
* A warning exception occurred in the initialization of EventGenerator:
No information about the weighting scheme was found. The events produced by LesHouchesReader LesHouchesReader may not be sampled correctly.
* A warning exception occurred in the initialization of EventGenerator:
LesHouchesReader LesHouchesReader has the IDWTUP flag set to 0, which does not correspond
to the weight option -2 set in the LesHouchesEventHandler LesHouchesHandler.
Use the following handler setting instead:
set LesHouchesHandler:WeightOption 0
Will try to make intelligent guesses to get correct statistics. In most cases this should be sufficient. Unset <interface>WeightWarnings</interface> to avoid this message
* A warning exception occurred in the initialization of EventGenerator:
The file associated with 'LesHouchesReader' does not contain a proper formatted Les Houches event file. The events may not be properly sampled.
Error: The sum of the cross sections of the readers in the LesHouchesEventHandler 'LesHouchesHandler' was zero.
Error: The object '/Herwig/Partons/PDFSet_nnlo' was not created as another object with that name already exists.
Error: The object '/Herwig/Partons/PDFSet_lo' was not created as another object with that name already exists.
Error: The object '/Herwig/EventHandlers/LesHouchesHandler' was not created as another object with that name already exists.
Error: The object '/Herwig/Cuts/NoCuts' was not created as another object with that name already exists.
Error: The object '/Herwig/Partons/LHAPDF' was not created as another object with that name already exists.
Error: The object '/Herwig/EventHandlers/LesHouchesReader' was not created as another object with that name already exists.
* A warning exception occurred in the initialization of EventGenerator:
No information about the energy of incoming particles were found in LesHouchesReader 'LesHouchesReader'.
* A warning exception occurred in the initialization of EventGenerator:
No information about the weighting scheme was found. The events produced by LesHouchesReader LesHouchesReader may not be sampled correctly.
* A warning exception occurred in the initialization of EventGenerator:
LesHouchesReader LesHouchesReader has the IDWTUP flag set to 0, which does not correspond
to the weight option -2 set in the LesHouchesEventHandler LesHouchesHandler.
Use the following handler setting instead:
set LesHouchesHandler:WeightOption 0
Will try to make intelligent guesses to get correct statistics. In most cases this should be sufficient. Unset <interface>WeightWarnings</interface> to avoid this message
* A warning exception occurred in the initialization of EventGenerator:
The file associated with 'LesHouchesReader' does not contain a proper formatted Les Houches event file. The events may not be properly sampled.
* A warning exception occurred in the initialization of EventGenerator:
No information about the weighting scheme was found. The events produced by LesHouchesReader LesHouchesReader may not be sampled correctly.
* A warning exception occurred in the initialization of EventGenerator:
LesHouchesReader LesHouchesReader has the IDWTUP flag set to 0, which does not correspond
to the weight option -2 set in the LesHouchesEventHandler LesHouchesHandler.
Use the following handler setting instead:
set LesHouchesHandler:WeightOption 0
Will try to make intelligent guesses to get correct statistics. In most cases this should be sufficient. Unset <interface>WeightWarnings</interface> to avoid this message
* A warning exception occurred in the initialization of EventGenerator:
The file associated with 'LesHouchesReader' does not contain a proper formatted Les Houches event file. The events may not be properly sampled.
Error: the optional weights names for the LesHouchesEventHandler do not match 'LesHouchesHandler'
Herwig: EventGenerator not available.
Check if 'InterfaceMatchboxTest.run' is a valid run file.
A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.
Mo 9. Sep 15:10:44 CEST 2024
Thread 2 (Thread 0x7f9450f75700 (LWP 1778195) "cmsRun"):
#0 0x00007f94784856a2 in waitpid () from /lib64/libpthread.so.0
#1 0x00007f9472d981f1 in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2 0x00007f9478b0fa73 in std::execute_native_thread_routine (__p=0x7f945290f7a0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#3 0x00007f947847b1ca in start_thread () from /lib64/libpthread.so.0
#4 0x00007f94780d68d3 in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f94775f3680 (LWP 1776592) "cmsRun"):
#0 0x00007f94781cfac1 in poll () from /lib64/libc.so.6
#1 0x00007f9472d984bd in (anonymous namespace)::full_read(int, char*, unsigned long, int) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2 0x00007f9472d97f54 in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3 0x00007f9472d978bf in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 0x00007f9440a6a099 in (anonymous namespace)::HerwigGenericRun(Herwig::HerwigUI const&, bool) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-05-2300/external/el8_amd64_gcc12/lib/libHerwigAPI.so.2
#6 0x00007f9440a6be06 in Herwig::API::prepareRun(Herwig::HerwigUI const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-05-2300/external/el8_amd64_gcc12/lib/libHerwigAPI.so.2
#7 0x00007f9441016533 in Herwig7Interface::callHerwigGenerator() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libGeneratorInterfaceHerwig7Interface.so
#8 0x00007f944101682f in Herwig7Interface::initGenerator() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libGeneratorInterfaceHerwig7Interface.so
#9 0x00007f9441065577 in Herwig7Hadronizer::initializeForExternalPartons() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginGeneratorInterfaceHerwig7HadronizerPlugins.so
#10 0x00007f94410738be in edm::HadronizerFilter<Herwig7Hadronizer, gen::ExternalDecayDriver>::beginLuminosityBlockProduce(edm::LuminosityBlock&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/pluginGeneratorInterfaceHerwig7HadronizerPlugins.so
#11 0x00007f947ad72fd8 in edm::one::EDFilterBase::doBeginLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#12 0x00007f947ad5c7dd in edm::WorkerT<edm::one::EDFilterBase>::implDoBegin(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#13 0x00007f947ac31745 in edm::workerhelper::CallImpl<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::call(edm::Worker*, edm::StreamID, edm::LumiTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*, edm::GlobalContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#14 0x00007f947ac3160a in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#15 0x00007f947ac31401 in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#16 0x00007f947ac2ffdb in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#17 0x00007f947ac30a0f in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}::operator()() const () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#18 0x00007f947ac307c5 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&>(tbb::detail::d1::task_group&, edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#19 0x00007f947a97e3d9 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::$_0>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreConcurrency.so
#20 0x00007f947935b3e1 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7f9475ed3e00) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#21 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7f9475ed3e00) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#22 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.cpp:168
#23 0x00007f947ac1c668 in void tbb::detail::d0::try_call_proxy<tbb::detail::d1::task_group_base::wait()::{lambda()#1}>::on_completion<tbb::detail::d1::task_group_base::wait()::{lambda()#2}>(tbb::detail::d1::task_group_base::wait()::{lambda()#2}) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#24 0x00007f947ac1a9c5 in edm::FinalWaitingTask::wait() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#25 0x00007f947abf8af0 in edm::EventProcessor::processRuns() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#26 0x00007f947abf5deb in edm::EventProcessor::runToCompletion() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/cms/cmssw/CMSSW_14_2_CLANG_X_2024-09-03-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#27 0x000055ecea2a3b6e in tbb::detail::d1::task_arena_function<main::$_0::operator()() const::{lambda()#1}, void>::operator()() const ()
#28 0x00007f94793479ad in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/arena.cpp:688
#29 0x000055ecea2a2d06 in main::$_0::operator()() const ()
#30 0x000055ecea2a07ff in main ()
Current Modules:
Module: Herwig7HadronizerFilter:generator (crashed)
A fatal system signal has occurred: segmentation violation
Segmentation fault (core dumped)
It seems that Herwig has crashed due to some issue reading the LHE file (though I'm not exactly sure why, as the LHE file is the same as the one that ran without issues in a non-CLANG build), and then we keep going despite not having produced the run file, causing a seg fault. For this new issue I'd definitely lay the blame on the fact that we skip over any errors from Herwig here. It may or may not be the root cause of the full issue, but it certainly makes it harder to debug. We've kept this block for a long time, as it's supposed to get around an issue with Herwig being called before the externalLHEProducer, but I think we should really get rid of it, as running past errors where Herwig would just exit could be the root of what we're seeing here, and then deal with the issue with the sequence of calls if it's still occurring. I probably won't have time to try this in the next few days, so if you have time @theofil that would be good; otherwise I'll try to by the end of the week.
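The concern in the comment above is that a failure in one Herwig step is ignored and the next step runs anyway, so the eventual crash points away from the real error. The actual code is C++ in the CMSSW Herwig7 interface and is not reproduced here. The Python sketch below only illustrates the difference in control flow, and every name in it (`read_step`, `run_step`, `StepFailed`, `outcome`) is hypothetical.

```python
class StepFailed(RuntimeError):
    """Hypothetical error raised when a generator step reports failure."""


def read_step(ok: bool) -> str:
    # Stand-in for the step that should produce a run file.
    if not ok:
        raise StepFailed("read step failed: no run file produced")
    return "InterfaceMatchboxTest.run"


def run_step(run_file):
    # Stand-in for the step that consumes the run file.
    if run_file is None:
        # Analogue of failing later, far away from the original error.
        raise RuntimeError("run step called without a run file")
    return f"running with {run_file}"


def swallow_errors(ok: bool) -> str:
    """Pattern described in the comment: ignore the failure and carry on."""
    run_file = None
    try:
        run_file = read_step(ok)
    except StepFailed:
        pass  # the original error is lost here
    return run_step(run_file)


def fail_fast(ok: bool) -> str:
    """Alternative: let the first failure propagate unchanged."""
    return run_step(read_step(ok))


def outcome(strategy, ok: bool) -> str:
    """Return the result, or the name of the exception type that escaped."""
    try:
        return strategy(ok)
    except Exception as exc:
        return type(exc).__name__


for strategy in (swallow_errors, fail_fast):
    print(strategy.__name__, "->", outcome(strategy, ok=False))
```

With the failing input, the first strategy surfaces only the secondary error while the second surfaces the original one, which is the debugging difference the comment is pointing at.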
I have checked our opensearch and found that workflow 535 ran successfully for the CMSSW_14_1_CLANG_X_2024-07-11-2300 IB. The first failure was in CMSSW_14_1_CLANG_X_2024-07-12-2300, but the error code was 256, and many other workflows also failed with exit code 256 that day. The first day workflow 535 failed with this segmentation error (exit code 62720) was CMSSW_14_1_CLANG_X_2024-07-16-2300. The cmssw changes from 2024-07-11-2300 to 2024-07-12-2300 and from 2024-07-12-2300 to 2024-07-16-2300 were linked in the original comment.
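Both exit codes quoted here (256 and 62720) are multiples of 256, which is what one would expect if they are raw wait statuses, as returned for example by Python's `os.system` on Linux, rather than plain process exit codes. That is an assumption; the thread does not say how the numbers were recorded. Under that assumption, a minimal sketch of the decoding (the function name is illustrative):

```python
def decode_wait_status(status: int) -> dict:
    """Split a 16-bit wait status into its parts.

    Assumption: `status` uses the traditional Linux layout, with the exit
    code in the high byte, the terminating signal in the low 7 bits and
    the core-dump flag in bit 7.
    """
    return {
        "exit_code": (status >> 8) & 0xFF,
        "signal": status & 0x7F,
        "core_dumped": bool(status & 0x80),
    }


for status in (256, 62720):
    print(status, decode_wait_status(status))
```

Read this way, 256 corresponds to exit code 1 and 62720 to exit code 245, with no terminating signal recorded in the status itself. How 245 relates to the segmentation fault depends on the wrapper that launched `cmsRun`, which the thread does not establish.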
I haven't yet found the origin of the problem, but I can reply to this question:
- Does the problem reproduce with one thread?
yes
@smuzaffar thanks a lot for the info. I see that
https://github.com/cms-sw/cmssw/commit/91c2ca346214fb094120091a978c16b3698612e3
could be relevant to the crash we see. I will try to check whether this is really where the problem starts. Would replacing the relval_2017.py of the crashing IB with the relval_2017.py from the working IB be a sensible check, or could other things break behind it?
Apart from the software changes, are there other differences between the two IBs as far as their builds are concerned?
As @makortel mentioned, we also updated the C++ standard (to C++20) for the July 12th IB.
By the way, building herwig7 and sherpa in debug mode, I get this stack trace for workflow 535/step1:
#3 0x00007fa46a6138bf in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-10-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 0x00007fa436425a15 in std::_Vector_base<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle>, std::allocator<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle> > >::_Vector_impl_data::_Vector_impl_data (this=0x7fa43648f7c0 <ThePEG::Particle::parents() const::null>) at /build/muz/clang/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_vector.h:100
#6 std::_Vector_base<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle>, std::allocator<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle> > >::_Vector_impl::_Vector_impl (this=0x7fa43648f7c0 <ThePEG::Particle::parents() const::null>) at /build/muz/clang/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_vector.h:139
#7 std::_Vector_base<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle>, std::allocator<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle> > >::_Vector_base (this=0x7fa43648f7c0 <ThePEG::Particle::parents() const::null>) at /build/muz/clang/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_vector.h:312
#8 std::vector<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle>, std::allocator<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle> > >::vector (this=0x7fa43648f7c0 <ThePEG::Particle::parents() const::null>) at /build/muz/clang/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_vector.h:526
#9 ThePEG::Particle::parents (this=<optimized out>) at ../include/ThePEG/EventRecord/Particle.h:159
#10 (anonymous namespace)::recursionNotNull (bin=..., p=...) at LesHouchesReader.cc:719
#11 0x00007fa436439265 in ThePEG::LesHouchesReader::createPartonBinInstances (this=0x7fa433865000) at LesHouchesReader.cc:731
#12 0x00007fa436432c66 in ThePEG::LesHouchesReader::getXComb (this=0x7fa433865000) at LesHouchesReader.cc:443
#13 0x00007fa436432ec4 in ThePEG::LesHouchesReader::getSubProcess (this=0x7fa433865000) at LesHouchesReader.cc:458
#14 0x00007fa436434307 in ThePEG::LesHouchesReader::readEvent (this=0x7fa433865000) at LesHouchesReader.cc:576
#15 0x00007fa43642d754 in ThePEG::LesHouchesReader::scan (this=0x7fa433865000) at LesHouchesReader.cc:305
#16 0x00007fa436431e42 in ThePEG::LesHouchesReader::initialize (this=<optimized out>, eh=...) at LesHouchesReader.cc:272
#17 0x00007fa43645bd59 in ThePEG::LesHouchesFileReader::initialize (this=0x7fa433865000, eh=...) at LesHouchesFileReader.cc:462
#18 0x00007fa436468466 in ThePEG::LesHouchesEventHandler::initialize (this=0x7fa40a74c400) at LesHouchesEventHandler.cc:87
#19 0x00007fa436686375 in ThePEG::EventGenerator::doinit (this=0x7fa44bc0ac00) at EventGenerator.cc:262
#20 0x00007fa436689b75 in ThePEG::InterfacedBase::init (this=0x7fa44bc0ac00) at ../include/ThePEG/Interface/InterfacedBase.h:246
#21 ThePEG::EventGenerator::setup (this=this@entry=0x7fa44bc0ac00, newRunName=..., newObjects=..., newParticles=..., newMatchers=...) at EventGenerator.cc:175
#22 0x00007fa4366cdd3a in ThePEG::Repository::makeRun (eg=..., name=...) at Repository.cc:316
#23 0x00007fa4366d054c in ThePEG::Repository::exec (command=..., os=...) at Repository.cc:786
#24 0x00007fa4366d0f3f in ThePEG::Repository::execAndCheckReply (line=..., os=...) at Repository.cc:510
#25 0x00007fa4366d1249 in ThePEG::Repository::read (is=..., os=..., prompt=...) at Repository.cc:566
#26 0x00007fa4366d16ad in ThePEG::Repository::read (filename=..., os=...) at Repository.cc:452
#27 0x00007fa437d953b9 in (anonymous namespace)::HerwigGenericRead (ui=...) at HerwigAPI.cc:146
#28 0x00007fa43833f4e8 in Herwig7Interface::callHerwigGenerator (this=this@entry=0x7fa43feb1190) at src/GeneratorInterface/Herwig7Interface/src/Herwig7Interface.cc:149
#29 0x00007fa43833da50 in Herwig7Interface::initRepository (this=0x7fa43feb1190, pset=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-10-2300/src/FWCore/MessageLogger/interface/MessageLogger.h:78
#30 0x00007fa43838e568 in Herwig7Hadronizer::initializeForExternalPartons (this=this@entry=0x7fa43feb10a0) at src/GeneratorInterface/Herwig7Interface/plugins/Herwig7Hadronizer.cc:109
#31 0x00007fa43839c8be in edm::HadronizerFilter<Herwig7Hadronizer, gen::ExternalDecayDriver>::beginLuminosityBlockProduce (this=0x7fa43feb1000, lumi=..., es=...) at /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-10-2300/src/GeneratorInterface/Core/interface/HadronizerFilter.h:367
#32 0x00007fa473842fd8 in edm::one::EDFilterBase::doBeginLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-10-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#33 0x00007fa47382c7dd in edm::WorkerT<edm::one::EDFilterBase>::implDoBegin(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc12/cms/cmssw-patch/CMSSW_14_2_CLANG_X_2024-09-10-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
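When reading a trace like this one, the frames before `<signal handler called>` belong to the handler that printed the trace, and the first few frames after it may be inlined standard-library code. A small sketch for pulling out the first frame after the marker that is not in `std::`, working on saved log text. The function names are illustrative, and the sample lines are shortened from the trace above (template arguments and paths abbreviated). It assumes the text holds the frames of a single thread.

```python
import re
from typing import List, Optional

# "#N", an optional "0x... in " address prefix, then the frame description.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(.*)$")
SIGNAL_MARKER = "<signal handler called>"


def frames_after_signal(trace: str) -> List[str]:
    """Return frame descriptions that follow the signal-handler marker."""
    frames: List[str] = []
    seen_marker = False
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # not a frame line (log text, thread header, ...)
        body = match.group(1)
        if seen_marker:
            frames.append(body)
        elif body.startswith(SIGNAL_MARKER):
            seen_marker = True
    return frames


def first_non_std_frame(trace: str) -> Optional[str]:
    """Return the first post-signal frame that is not a std:: function."""
    for body in frames_after_signal(trace):
        if not body.startswith("std::"):
            return body
    return None


sample = """\
#3 0x00007fa46a6138bf in sig_dostack_then_abort () from pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 0x00007fa436425a15 in std::_Vector_base<...>::_Vector_impl_data::_Vector_impl_data (this=...) at stl_vector.h:100
#8 std::vector<...>::vector (this=...) at stl_vector.h:526
#9 ThePEG::Particle::parents (this=<optimized out>) at ../include/ThePEG/EventRecord/Particle.h:159
#10 (anonymous namespace)::recursionNotNull (bin=..., p=...) at LesHouchesReader.cc:719
"""
print(first_non_std_frame(sample))
```

On the shortened sample this points at the `ThePEG::Particle::parents` frame at Particle.h:159, which is the first ThePEG frame in the debug trace.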
Note that all these failing workflows (511, 535, 537, 538 and 539) in CLANG IBs are herwig7.
One could also check if the problem reproduces with cmsRunGlibC or cmsRunTC (that use other allocators, if this is a memory problem they may behave differently, or even give some diagnostics)
It failed with both cmsRunGlibC and cmsRunTC.
It also failed in single-thread mode.
I have made very little progress so far, unfortunately.
I compiled two versions of Herwig, under
CMSSW_14_2_X_2024-09-06-1100
CMSSW_14_2_CLANG_X_2024-09-05-2300
and ran standalone Herwig MC generation, checking whether we can generate simple processes without reading external LHE files. We cannot generate any event in CMSSW_14_2_CLANG_X_2024-09-05-2300: we immediately get a segmentation fault while attempting to make the first event. Everything is OK in CMSSW_14_2_X_2024-09-06-1100, where MC generation finishes normally. This confirms what Dominic said earlier: although the first error messages complain about reading LHE files, this has nothing to do with the crash we get later on. (Actually, we get these messages even when things work.)
While compiling the code in the two releases, I see many ThePEG warnings about arithmetic operations that I was not used to seeing before, but they all seem innocent in the CMSSW_14_2_X_2024-09-06-1100 warnings case.
However, in the CMSSW_14_2_CLANG_X_2024-09-05-2300.txt warnings we see for the first time a warning regarding the creation of the RCPtr pointer, in particular:
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02853/el8_amd64_gcc12/external/thepeg/2.2.2-330d679d0765729c295842b54c3a747c/include/ThePEG/Pointer/RCPtr.h:152:15: note: in implicit copy constructor for 'ThePEG::EventInfoBase' first required here
152 | ptr = new T(t);
which later appears in the crash messages.
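Spotting a diagnostic that shows up only in one of two build logs can be done mechanically. The sketch below is one way to do it, assuming the logs contain compiler diagnostics in the usual `path:line:col: kind: message` form. The function names are illustrative, and the first sample line is invented for the example; only the RCPtr.h note is taken from the thread. Paths are reduced to their basename so that the same header installed under different prefixes compares equal.

```python
import re
from typing import List, Set, Tuple

DIAG_RE = re.compile(
    r"^(?P<path>[^:\s]+):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<kind>warning|note): (?P<msg>.*)$"
)

Diagnostic = Tuple[str, str, str]  # (file basename, kind, message)


def diagnostics(log_text: str) -> Set[Diagnostic]:
    """Collect the distinct warning/note diagnostics found in a build log."""
    found: Set[Diagnostic] = set()
    for line in log_text.splitlines():
        match = DIAG_RE.match(line.strip())
        if match:
            basename = match.group("path").rsplit("/", 1)[-1]
            found.add((basename, match.group("kind"), match.group("msg")))
    return found


def only_in_second(first_log: str, second_log: str) -> List[Diagnostic]:
    """Diagnostics present in the second log but absent from the first."""
    return sorted(diagnostics(second_log) - diagnostics(first_log))


# Invented sample logs; only the RCPtr.h note is quoted from the thread.
gcc_log = "/x/include/Foo.h:10:5: warning: made-up arithmetic warning [-Wexample]\n"
clang_log = (
    "/y/include/Foo.h:10:5: warning: made-up arithmetic warning [-Wexample]\n"
    "/y/include/RCPtr.h:152:15: note: in implicit copy constructor for "
    "'ThePEG::EventInfoBase' first required here\n"
)

for diag in only_in_second(gcc_log, clang_log):
    print(diag)
```

Comparing on basename, kind and message, rather than on whole lines, keeps differing install paths from being reported as differences.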
This to me confirms that the problem is not related to the CMSSW HerwigInterface, where there is not much we can do, but rather to the external package ThePEG, which is needed by the Herwig generator.
Is there a reason why we build CMSSW with clang while the external packages, like ThePEG, are still built with gcc? Is it sound to use the same ThePEG binary in the two cases? Is it possible to try to have ThePEG built with clang as well when CMSSW is built with clang?
RelVals 535.0, 537.0, 538.0 failed with SIGSEGV in ThePEG::EventGenerator::doinit: