cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

Segfault in Herwig #33544

Closed: makortel closed this issue 3 years ago.

makortel commented 3 years ago

Workflow 535.0 step 1 crashes in CMSSW_12_0_X_2021-04-27-1100

27-Apr-2021 14:34:58 CEST  Successfully opened LHE file thread0/cmsgrid_final.lhe
Error: No such file or directory: cmsgrid_final.lhe
Error: The object '/Herwig/Partons/PDFSet_nnlo' was not created as another object with that name already exists.
Error: The object '/Herwig/Partons/PDFSet_lo' was not created as another object with that name already exists.
Error: The object '/Herwig/EventHandlers/LesHouchesHandler' was not created as another object with that name already exists.
Error: The object '/Herwig/Cuts/NoCuts' was not created as another object with that name already exists.
Error: The object '/Herwig/Partons/LHAPDF' was not created as another object with that name already exists.
Error: The object '/Herwig/EventHandlers/LesHouchesReader' was not created as another object with that name already exists.
Error: No such file or directory: cmsgrid_final.lhe
Herwig: EventGenerator not available.
Check if 'InterfaceMatchboxTest.run' is a valid run file.

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Tue Apr 27 14:35:01 CEST 2021
Thread 5 (Thread 0x2b989fe00700 (LWP 20760)):
#2  0x00002b9869c53720 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-04-25-0000/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b985ec41d19 in syscall () from /lib64/libc.so.6
#5  0x00002b985db1ed9d in tbb::internal::futex_wait (comparand=2, futex=0x2b9865f0302c) at ../../include/tbb/machine/linux_common.h:81

Thread 4 (Thread 0x2b989ef70700 (LWP 20759)):
#2  0x00002b9869c53720 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-04-25-0000/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b985ec41d19 in syscall () from /lib64/libc.so.6
#5  0x00002b985db1ed9d in tbb::internal::futex_wait (comparand=2, futex=0x2b9865f0312c) at ../../include/tbb/machine/linux_common.h:81

Thread 3 (Thread 0x2b989e56f700 (LWP 20758)):
#2  0x00002b9869c53720 in sig_pause_for_stacktrace () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-04-25-0000/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00002b985ec41d19 in syscall () from /lib64/libc.so.6
#5  0x00002b985db1ed9d in tbb::internal::futex_wait (comparand=2, futex=0x2b9865f030ac) at ../../include/tbb/machine/linux_common.h:81

Thread 1 (Thread 0x2b9860844180 (LWP 20111)):
#2  0x00002b9869c5456c in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-04-25-0000/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  0x00002b9869c55922 in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-04-25-0000/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00002b98909476d0 in (anonymous namespace)::HerwigGenericRun(Herwig::HerwigUI const&, bool) () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw-patch/CMSSW_12_0_X_2021-04-27-1100/external/slc7_amd64_gcc900/lib/libHerwigAPI.so.2
#6  0x00002b9890949757 in Herwig::API::prepareRun(Herwig::HerwigUI const&) () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw-patch/CMSSW_12_0_X_2021-04-27-1100/external/slc7_amd64_gcc900/lib/libHerwigAPI.so.2
#7  0x00002b9890379abd in Herwig7Interface::callHerwigGenerator() () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw-patch/CMSSW_12_0_X_2021-04-27-1100/lib/slc7_amd64_gcc900/libGeneratorInterfaceHerwig7Interface.so
#8  0x00002b9890379cb1 in Herwig7Interface::initGenerator() () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw-patch/CMSSW_12_0_X_2021-04-27-1100/lib/slc7_amd64_gcc900/libGeneratorInterfaceHerwig7Interface.so
#9  0x00002b98903390f4 in Herwig7Hadronizer::initializeForInternalPartons() () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw-patch/CMSSW_12_0_X_2021-04-27-1100/lib/slc7_amd64_gcc900/pluginGeneratorInterfaceHerwig7HadronizerPlugins.so
#10 0x00002b9890346444 in edm::GeneratorFilter<Herwig7Hadronizer, gen::ExternalDecayDriver>::beginLuminosityBlockProduce(edm::LuminosityBlock&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw-patch/CMSSW_12_0_X_2021-04-27-1100/lib/slc7_amd64_gcc900/pluginGeneratorInterfaceHerwig7HadronizerPlugins.so
#11 0x00002b985c488acd in edm::one::EDFilterBase::doBeginLuminosityBlock(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-04-25-0000/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#12 0x00002b985c466d10 in edm::WorkerT<edm::one::EDFilterBase>::implDoBegin(edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-04-25-0000/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#13 0x00002b985c370e96 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-04-25-0000/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#14 0x00002b985c37109d in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*) () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-04-25-0000/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#15 0x00002b985c371346 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0>::Context const*) () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-04-25-0000/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#16 0x00002b985c3718b0 in void edm::SerialTaskQueueChain::actionToRun<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&>(edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&) () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-04-25-0000/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#17 0x00002b985c371a61 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&>(tbb::task_group&, edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::LuminosityBlockPrincipal, (edm::BranchActionType)0> >::execute()::{lambda()#1}&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-04-25-0000/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#18 0x00002b985c107f49 in tbb::internal::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/nweek-02678/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-04-25-0000/lib/slc7_amd64_gcc900/libFWCoreConcurrency.so

Current Modules:
Module: Herwig7GeneratorFilter:generator (crashed)
Module: none
Module: none
Module: none

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc900/CMSSW_12_0_X_2021-04-27-1100/pyRelValMatrixLogs/run/535.0_TTbar_13TeV_Pow_herwig7+TTbar_13TeV_Pow_herwig7+HARVESTGEN/step1_TTbar_13TeV_Pow_herwig7+TTbar_13TeV_Pow_herwig7+HARVESTGEN.log#/

makortel commented 3 years ago

assign generators

cmsbuild commented 3 years ago

New categories assigned: generators

@mkirsano,@SiewYan,@alberto-sanchez,@agrohsje,@GurpreetSinghChahal you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild commented 3 years ago

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

makortel commented 3 years ago

The following Error printouts, which also appear in the failing CMSSW_12_0_X_2021-04-27-1100 job, are visible in CMSSW_12_0_X_2021-04-26-2300

Error: The object '/Herwig/Partons/PDFSet_nnlo' was not created as another object with that name already exists.
Error: The object '/Herwig/Partons/PDFSet_lo' was not created as another object with that name already exists.
Error: The object '/Herwig/EventHandlers/LesHouchesHandler' was not created as another object with that name already exists.
Error: The object '/Herwig/Cuts/NoCuts' was not created as another object with that name already exists.
Error: The object '/Herwig/Partons/LHAPDF' was not created as another object with that name already exists.
Error: The object '/Herwig/EventHandlers/LesHouchesReader' was not created as another object with that name already exists.

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc900/CMSSW_12_0_X_2021-04-26-2300/pyRelValMatrixLogs/run/535.0_TTbar_13TeV_Pow_herwig7+TTbar_13TeV_Pow_herwig7+HARVESTGEN/step1_TTbar_13TeV_Pow_herwig7+TTbar_13TeV_Pow_herwig7+HARVESTGEN.log#/

where the job succeeded. That log has one printout of "Error: No such file or directory: cmsgrid_final.lhe", whereas the failing case has a total of three. Can these be related to the crash, or are they unrelated?
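This kind of comparison can be done systematically by tallying each distinct "Error:" line in the passing and failing logs and keeping only those whose counts differ. A minimal sketch in plain Python, assuming both logs have already been downloaded as text; count_errors and diff_error_counts are illustrative names, not part of CMSSW:

```python
from collections import Counter


def count_errors(log_text):
    """Count how often each distinct line starting with 'Error:' occurs in a log."""
    return Counter(
        line.strip()
        for line in log_text.splitlines()
        if line.strip().startswith("Error:")
    )


def diff_error_counts(good_log, bad_log):
    """Map each error line whose count differs to (count in good log, count in bad log).

    Counter returns 0 for missing keys, so lines present in only one log are included.
    """
    good, bad = count_errors(good_log), count_errors(bad_log)
    return {
        msg: (good[msg], bad[msg])
        for msg in sorted(set(good) | set(bad))
        if good[msg] != bad[msg]
    }
```

Errors that appear equally often in both logs (such as the "was not created as another object with that name already exists" messages) drop out, leaving only the candidates worth investigating.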

agrohsje commented 3 years ago

Hi @Dominic-Stafford @theofil, is one of you available to follow up on the segfault?

silviodonato commented 3 years ago

I think the origin of the error is in #33516. Specifically, the change in Configuration/Generator/python/TTbar_Pow_LHE_13TeV_cff.py affects https://github.com/cms-sw/cmssw/blob/master/Configuration/Generator/python/TT_13TeV_Pow_Herwig7_cff.py#L3.

It looks like we can fix this by forcing externalLHEProducer.generateConcurrently = cms.untracked.bool(False) in TT_13TeV_Pow_Herwig7_cff.py.

I don't know whether Herwig7 was expected to work with concurrent LHE generation or not.

cc @SiewYan

colizz commented 3 years ago

Thanks. As far as I can see, the segfault is caused by the missing "cmsgrid_final.lhe". As far as I know, the concurrent externalLHEProducer produces LHE output separately in each subfolder (thread*/cmsgrid_final.lhe) and combines the events directly in CMSSW to produce an EDM LHE file. Therefore no final ./cmsgrid_final.lhe is produced. However, H7 intrinsically reads cmsgrid_final.lhe directly instead of the EDM LHE file, which causes this crash.
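This failure mode can be confirmed in a job's working directory by checking which LHE files actually exist. The sketch below assumes the layout described above (per-thread thread*/cmsgrid_final.lhe files and, in concurrent mode, no top-level file); locate_lhe_output is a hypothetical helper, not a CMSSW or Herwig API:

```python
from pathlib import Path
from typing import List, Optional, Tuple


def locate_lhe_output(
    workdir: str, name: str = "cmsgrid_final.lhe"
) -> Tuple[Optional[Path], List[Path]]:
    """Hypothetical helper: report where LHE output exists in a job directory.

    Returns (top-level file or None, sorted list of per-thread files).
    """
    base = Path(workdir)
    top = base / name
    # Per-thread outputs such as thread0/cmsgrid_final.lhe, thread1/cmsgrid_final.lhe, ...
    per_thread = sorted(base.glob(f"thread*/{name}"))
    return (top if top.is_file() else None), per_thread
```

If the first returned value is None while the list is non-empty, anything that opens ./cmsgrid_final.lhe directly will hit the "No such file or directory" error seen in the log.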

This indicates that the concurrent externalLHEProducer is not compatible with Herwig7GeneratorFilter (sorry, I didn't notice this). To my understanding, the easiest solution would be to also write out a cmsgrid_final.lhe in the preceding LHE step. What do you think? If that's fine, I can make a PR shortly.
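One way such a fix could look is to concatenate the per-thread outputs into a single cmsgrid_final.lhe. This is only an illustrative sketch, not the content of the actual PR: it assumes every input is a well-formed LHE file with each event inside an <event>...</event> block, and it keeps the header and <init> block of the first file unchanged, so cross-section information in <init> is not recombined. merge_lhe is a hypothetical name:

```python
def merge_lhe(inputs, output):
    """Hypothetical, naive LHE merge.

    Copies everything before the first <event> from the first input (header, <init>),
    appends every <event>...</event> block from all inputs in order, and writes a
    single closing </LesHouchesEvents> tag. Returns the number of events written.
    The <init> block is NOT updated to reflect the combined sample.
    """
    n_events = 0
    with open(output, "w") as out:
        for i, path in enumerate(inputs):
            in_event = False
            seen_event = False
            with open(path) as f:
                for line in f:
                    stripped = line.strip()
                    if stripped.startswith("<event"):
                        in_event = True
                        seen_event = True
                        n_events += 1
                    if in_event:
                        out.write(line)
                        if stripped.startswith("</event>"):
                            in_event = False
                    elif (
                        i == 0
                        and not seen_event
                        and not stripped.startswith("</LesHouchesEvents")
                    ):
                        # Header and <init> block, taken from the first file only.
                        out.write(line)
        out.write("</LesHouchesEvents>\n")
    return n_events
```

In a job directory it could be called with the sorted thread*/cmsgrid_final.lhe paths as inputs and ./cmsgrid_final.lhe as the output.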

Dominic-Stafford commented 3 years ago

As @colizz said, Herwig reads cmsgrid_final.lhe directly rather than the EDM LHE file. If the concurrent externalLHEProducer could be changed to also write out this file, this should work; otherwise it would probably be simplest not to use the concurrent externalLHEProducer with Herwig 7. As @makortel mentioned, Herwig also produces some spurious error messages when running normally, which make diagnosing issues like these a bit harder. We believe this is because the CMSSW scheduler currently tries to call Herwig before the externalLHEProducer; @theofil is currently looking into this.

colizz commented 3 years ago

OK, thanks! I'll handle the concurrent externalLHEProducer side.

silviodonato commented 3 years ago

Is there any chance to have the fix ready by CMSSW_12_0_0_pre1 (next Tuesday)?

colizz commented 3 years ago

Hi, I just submitted the PR to fix this: #33615. The Herwig7 errors still occur (as in the single-core case), but they should be independent of the segfault reported here.

colizz commented 3 years ago

Hello,

I just observed a new segfault in H7. I used CMSSW_12_0_X_2021-05-24-2300 to test the request JME-RunIISummer20UL16wmLHEGEN-00003. Here are the commands:

cmsrel CMSSW_12_0_X_2021-05-24-2300
cd CMSSW_12_0_X_2021-05-24-2300/src
cmsenv
curl -s -k https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_fragment/JME-RunIISummer20UL16wmLHEGEN-00003 --retry 3 --create-dirs -o Configuration/GenProduction/python/JME-RunIISummer20UL16wmLHEGEN-00003-fragment.py
[ -s Configuration/GenProduction/python/JME-RunIISummer20UL16wmLHEGEN-00003-fragment.py ] || exit $?;
scram b -j8
cd ../..
cmsDriver.py Configuration/GenProduction/python/JME-RunIISummer20UL16wmLHEGEN-00003-fragment.py --python_filename JME-RunIISummer20UL16wmLHEGEN-00003_1_cfg.py --eventcontent RAWSIM,LHE --customise Configuration/DataProcessing/Utils.addMonitoring --datatier GEN,LHE --fileout file:JME-RunIISummer20UL16wmLHEGEN-00003.root --conditions 112X_mcRun2_asymptotic_v2 --beamspot Realistic25ns13TeV2016Collision --customise_commands process.source.numberEventsInLuminosityBlock="cms.untracked.uint32(101)" --step LHE,GEN --geometry DB:Extended --era Run2_2016 --no_exec --mc -n 5000
cmsRun JME-RunIISummer20UL16wmLHEGEN-00003_1_cfg.py

The segfault occurs at a random point, typically while processing events somewhere around 500-2000:

...
Begin processing the 776th record. Run 1, Event 776, LumiSection 8 on stream 0 at 25-May-2021 09:41:10.100 CEST
Begin processing the 777th record. Run 1, Event 777, LumiSection 8 on stream 0 at 25-May-2021 09:41:10.324 CEST
Begin processing the 778th record. Run 1, Event 778, LumiSection 8 on stream 0 at 25-May-2021 09:41:10.734 CEST

A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Tue May 25 09:41:11 CEST 2021
Thread 2 (Thread 0x7f96dc1d3700 (LWP 4635)):
#0  0x00007f97007e51d9 in waitpid () from /lib64/libpthread.so.0
#1  0x00007f96f39dc387 in edm::service::cmssw_stacktrace_fork() () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-05-24-2300/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#2  0x00007f96f39dd01a in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-05-24-2300/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  0x00007f9700dddaf0 in std::execute_native_thread_routine (__p=0x7f96dcaf9520) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#4  0x00007f97007ddea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f97005069fd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f96feade540 (LWP 4577)):
#0  0x00007f97004fbccd in poll () from /lib64/libc.so.6
#1  0x00007f96f39dc7b7 in full_read.constprop () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-05-24-2300/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#2  0x00007f96f39dd0ec in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-05-24-2300/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#3  0x00007f96f39e059b in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-05-24-2300/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  0x00007f96caf5325a in Herwig::ColourReconnector::_isColour8(ThePEG::Pointer::TransientConstRCPtr<ThePEG::Particle>, ThePEG::Pointer::TransientConstRCPtr<ThePEG::Particle>) const () from /cvmfs/cms-ib.cern.ch/nweek-02682/slc7_amd64_gcc900/external/herwig7/7.2.2-2bfae0df6f5a8d9801ab4e178064f4d8/lib/Herwig/Herwig.so.27
#6  0x00007f96caf57976 in Herwig::ColourReconnector::_findPartnerBaryonic(__gnu_cxx::__normal_iterator<ThePEG::Pointer::RCPtr<Herwig::Cluster>*, std::vector<ThePEG::Pointer::RCPtr<Herwig::Cluster>, std::allocator<ThePEG::Pointer::RCPtr<Herwig::Cluster> > > >, std::vector<ThePEG::Pointer::RCPtr<Herwig::Cluster>, std::allocator<ThePEG::Pointer::RCPtr<Herwig::Cluster> > >&, bool&, std::vector<ThePEG::Pointer::RCPtr<Herwig::Cluster>, std::allocator<ThePEG::Pointer::RCPtr<Herwig::Cluster> > > const&, __gnu_cxx::__normal_iterator<ThePEG::Pointer::RCPtr<Herwig::Cluster>*, std::vector<ThePEG::Pointer::RCPtr<Herwig::Cluster>, std::allocator<ThePEG::Pointer::RCPtr<Herwig::Cluster> > > >&, __gnu_cxx::__normal_iterator<ThePEG::Pointer::RCPtr<Herwig::Cluster>*, std::vector<ThePEG::Pointer::RCPtr<Herwig::Cluster>, std::allocator<ThePEG::Pointer::RCPtr<Herwig::Cluster> > > >&) const () from /cvmfs/cms-ib.cern.ch/nweek-02682/slc7_amd64_gcc900/external/herwig7/7.2.2-2bfae0df6f5a8d9801ab4e178064f4d8/lib/Herwig/Herwig.so.27
#7  0x00007f96caf587b8 in Herwig::ColourReconnector::_doRecoBaryonic(std::vector<ThePEG::Pointer::RCPtr<Herwig::Cluster>, std::allocator<ThePEG::Pointer::RCPtr<Herwig::Cluster> > >&) const () from /cvmfs/cms-ib.cern.ch/nweek-02682/slc7_amd64_gcc900/external/herwig7/7.2.2-2bfae0df6f5a8d9801ab4e178064f4d8/lib/Herwig/Herwig.so.27
#8  0x00007f96caf44eb0 in Herwig::ClusterHadronizationHandler::handle(ThePEG::EventHandler&, std::vector<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle>, std::allocator<ThePEG::Pointer::TransientRCPtr<ThePEG::Particle> > > const&, ThePEG::Hint const&) () from /cvmfs/cms-ib.cern.ch/nweek-02682/slc7_amd64_gcc900/external/herwig7/7.2.2-2bfae0df6f5a8d9801ab4e178064f4d8/lib/Herwig/Herwig.so.27
#9  0x00007f96cc0429b3 in ThePEG::EventHandler::performStep (this=0x7f96c82a7c00, handler=..., hint=...) at EventHandler.cc:196
#10 0x00007f96cc042cca in ThePEG::EventHandler::continueCollision (this=this@entry=0x7f96c82a7c00) at ../include/ThePEG/Pointer/RCPtr.h:879
#11 0x00007f96cbdb4912 in ThePEG::LesHouchesEventHandler::performCollision (this=0x7f96c82a7c00) at LesHouchesEventHandler.cc:334
#12 0x00007f96cbdb739f in ThePEG::LesHouchesEventHandler::generateEvent (this=0x7f96c82a7c00) at LesHouchesEventHandler.cc:256
#13 0x00007f96cbfe9bf6 in ThePEG::EventGenerator::doShoot (this=0x7f96d07e1800) at ../include/ThePEG/Pointer/RCPtr.h:879
#14 0x00007f96cbfe8eab in ThePEG::EventGenerator::shoot (this=0x7f96d07e1800) at EventGenerator.cc:432
#15 0x00007f96cdb43743 in Herwig7Hadronizer::generatePartonsAndHadronize() () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-05-24-2300/lib/slc7_amd64_gcc900/pluginGeneratorInterfaceHerwig7HadronizerPlugins.so
#16 0x00007f96cdb52137 in edm::GeneratorFilter<Herwig7Hadronizer, gen::ExternalDecayDriver>::filter(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-05-24-2300/lib/slc7_amd64_gcc900/pluginGeneratorInterfaceHerwig7HadronizerPlugins.so
#17 0x00007f9702fe1b4b in edm::one::EDFilterBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-05-24-2300/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#18 0x00007f9702fc700d in edm::WorkerT<edm::one::EDFilterBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-05-24-2300/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#19 0x00007f9702f25995 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-05-24-2300/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#20 0x00007f9702f25b4d in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-05-24-2300/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#21 0x00007f9702f25e56 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-05-24-2300/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#22 0x00007f9702f28440 in void edm::SerialTaskQueueChain::actionToRun<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute()::{lambda()#1}&>(edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute()::{lambda()#1}&) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-05-24-2300/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#23 0x00007f9702f28681 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute()::{lambda()#1}&>(tbb::detail::d1::task_group&, edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute()::{lambda()#1}&)::{lambda()#1}>::execute() () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-05-24-2300/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#24 0x00007f97031202c9 in tbb::detail::d1::function_task<edm::SerialTaskQueue::spawn(edm::SerialTaskQueue::TaskBase&)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-05-24-2300/lib/slc7_amd64_gcc900/libFWCoreConcurrency.so
#25 0x00007f9701827d0b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (this=0x7f96fd55be00, t=0x7f96fd54e400, waiter=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_0_0_pre1-slc7_amd64_gcc900/build/CMSSW_12_0_0_pre1-build/BUILD/slc7_amd64_gcc900/external/tbb/v2021.2.0/tbb-v2021.2.0/src/tbb/task_dispatcher.h:396
#26 0x00007f97018247e5 in tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=0x0, this=0x7f96fd55be00) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_0_0_pre1-slc7_amd64_gcc900/build/CMSSW_12_0_0_pre1-build/BUILD/slc7_amd64_gcc900/external/tbb/v2021.2.0/tbb-v2021.2.0/src/tbb/task_dispatcher.cpp:178
#27 tbb::detail::r1::task_dispatcher::execute_and_wait (t=0x0, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_0_0_pre1-slc7_amd64_gcc900/build/CMSSW_12_0_0_pre1-build/BUILD/slc7_amd64_gcc900/external/tbb/v2021.2.0/tbb-v2021.2.0/src/tbb/task_dispatcher.cpp:168
#28 0x00007f9702e95d8f in edm::EventProcessor::processLumis(std::shared_ptr<void> const&) () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-05-24-2300/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#29 0x00007f9702e9f115 in edm::EventProcessor::runToCompletion() () from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_0_X_2021-05-24-2300/lib/slc7_amd64_gcc900/libFWCoreFramework.so
#30 0x000000000040bae6 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#31 0x00007f970180c970 in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_12_0_0_pre1-slc7_amd64_gcc900/build/CMSSW_12_0_0_pre1-build/BUILD/slc7_amd64_gcc900/external/tbb/v2021.2.0/tbb-v2021.2.0/src/tbb/arena.cpp:674
#32 0x000000000040ca58 in main::{lambda()#1}::operator()() const ()
#33 0x000000000040b62c in main ()

Current Modules:

Module: Herwig7GeneratorFilter:generator (crashed)

A fatal system signal has occurred: segmentation violation
Segmentation fault (core dumped)

@Dominic-Stafford @agrohsje Would you mind also taking a look at this? Many thanks.
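Since the crash point changes from run to run, it helps to record exactly which event was being processed when the signal arrived. A minimal sketch that extracts this from a cmsRun log, assuming the "Begin processing the Nth record. Run R, Event E, LumiSection L" message format shown above; last_record_before_crash is a hypothetical helper name:

```python
import re

# Matches e.g. "Begin processing the 778th record. Run 1, Event 778, LumiSection 8 ..."
RECORD_RE = re.compile(
    r"Begin processing the (\d+)(?:st|nd|rd|th) record\. "
    r"Run (\d+), Event (\d+), LumiSection (\d+)"
)
CRASH_MARKER = "A fatal system signal has occurred: segmentation violation"


def last_record_before_crash(log_text):
    """Hypothetical helper: return (record, run, event, lumi) of the last
    'Begin processing' line before the first segfault message,
    or None if the log contains no segfault (or no record before it)."""
    last = None
    for line in log_text.splitlines():
        if CRASH_MARKER in line:
            return last
        m = RECORD_RE.search(line)
        if m:
            last = tuple(int(g) for g in m.groups())
    return None
```

Collecting this tuple over several runs of the same configuration shows whether the crash is tied to a particular event or occurs at an arbitrary point.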

agrohsje commented 3 years ago

Let me add @theofil. @Dominic-Stafford, @theofil, will you follow up?

Dominic-Stafford commented 3 years ago

Thanks for bringing this up. It's not immediately obvious to me what's going wrong here, but I've started running it to have a look.

theofil commented 3 years ago

I've been able to reproduce the error just after "the 3278th record", but I haven't yet found an explanation for it.

Do we know the latest release in which the same fragment

https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_fragment/JME-RunIISummer20UL16wmLHEGEN-00003

ran without problems?

best, Kostas

makortel commented 3 years ago

(It would be better to open a new issue for a new segfault, especially if the cause is likely to be different.)

colizz commented 3 years ago

@theofil Thanks for the test. I don't have any idea yet when the issue started to appear. (It looks odd to me, because my first test under the same conditions had no error, but then the segfault started to appear regularly at around event 50-2000 in my later runs.)

@makortel ok, sure.