makortel opened this issue 3 years ago
assign core, generators
New categories assigned: core,generators
@Dr15Jones,@smuzaffar,@Saptaparna,@mkirsano,@SiewYan,@alberto-sanchez,@makortel,@agrohsje,@GurpreetSinghChahal you have been requested to review this Pull request/Issue and eventually sign? Thanks
A new Issue was created by @makortel Matti Kortelainen.
@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.
Do we want to try the patch at https://gitlab.com/hepcedar/lhapdf/-/merge_requests/3 ? I'm not really sure how we could test it conclusively, given that it took ~1.5 months to hit the issue after the update to LHAPDF 6.3.0.
Hi @mkirsano: I agree with @makortel; we should apply the proposed patch.
Hello, does that mean putting the LHAPDF code into our repo?
If CMS needs it and it works, why not.
Then we should probably write to the authors about it.
Apparently the authors have released a beta of 6.4.0 that eliminates the cache (I did not look into the details); see https://gitlab.com/hepcedar/lhapdf/-/issues/2#note_674891431. Should we try it?
If https://github.com/cms-sw/cmssw/issues/35251 is connected, we now have a way to test fairly conclusively whether the 6.4.0 beta fixes the race conditions in our case.
It happened here as well:
#3 0x00002b69b6b287bb in sig_dostack_then_abort () from /cvmfs/cms-ib.cern.ch/nweek-02699/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_ROOT624_X_2021-09-22-2300/lib/slc7_amd64_gcc900/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 std::local_Rb_tree_rotate_left (__root=@0x2b69e142c750: 0x2b6a27010360, __x=0x2b69fd8e4100) at ../../../../../libstdc++-v3/src/c++98/tree.cc:138
#6 std::_Rb_tree_insert_and_rebalance (__insert_left=<optimized out>, __x=0x2b6a2697c660, __p=<optimized out>, __header=...) at ../../../../../libstdc++-v3/src/c++98/tree.cc:278
#7 0x00002b69e13aad9d in LHAPDF::_getXCachesMap() () from /cvmfs/cms-ib.cern.ch/nweek-02699/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_ROOT624_X_2021-09-22-2300/external/slc7_amd64_gcc900/lib/libLHAPDF.so
#8 0x00002b69e13aae5d in LHAPDF::LogBicubicInterpolator::_getCacheX(LHAPDF::KnotArray1F const&, double, unsigned long) () from /cvmfs/cms-ib.cern.ch/nweek-02699/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_ROOT624_X_2021-09-22-2300/external/slc7_amd64_gcc900/lib/libLHAPDF.so
#9 0x00002b69e13ab48f in LHAPDF::LogBicubicInterpolator::_interpolateXQ2(LHAPDF::KnotArray1F const&, double, unsigned long, double, unsigned long) const () from /cvmfs/cms-ib.cern.ch/nweek-02699/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_ROOT624_X_2021-09-22-2300/external/slc7_amd64_gcc900/lib/libLHAPDF.so
#10 0x00002b69e13a99ff in LHAPDF::Interpolator::interpolateXQ2(int, double, double) const () from /cvmfs/cms-ib.cern.ch/nweek-02699/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_ROOT624_X_2021-09-22-2300/external/slc7_amd64_gcc900/lib/libLHAPDF.so
#11 0x00002b69e13a4adf in LHAPDF::GridPDF::_xfxQ2(int, double, double) const () from /cvmfs/cms-ib.cern.ch/nweek-02699/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_ROOT624_X_2021-09-22-2300/external/slc7_amd64_gcc900/lib/libLHAPDF.so
#12 0x00002b69e1397807 in LHAPDF::PDF::xfxQ2(int, double, double) const () from /cvmfs/cms-ib.cern.ch/nweek-02699/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_ROOT624_X_2021-09-22-2300/external/slc7_amd64_gcc900/lib/libLHAPDF.so
#13 0x00002b69ff3ab8e7 in Pythia8::LHAPDF6::xfUpdate(int, double, double) () from /cvmfs/cms-ib.cern.ch/nweek-02699/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_ROOT624_X_2021-09-22-2300/external/slc7_amd64_gcc900/lib/libpythia8lhapdf6.so
#14 0x00002b69e0d5b1b7 in Pythia8::PDF::xf(int, double, double) () from /cvmfs/cms-ib.cern.ch/nweek-02699/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_ROOT624_X_2021-09-22-2300/external/slc7_amd64_gcc900/lib/libpythia8.so
#15 0x00002b69e0f35799 in Pythia8::SigmaProcess::sigmaPDF(bool, bool, bool, double, double) () from /cvmfs/cms-ib.cern.ch/nweek-02699/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_ROOT624_X_2021-09-22-2300/external/slc7_amd64_gcc900/lib/libpythia8.so
#16 0x00002b69e0da31e7 in Pythia8::PhaseSpace::setupSampling123(bool, bool) () from /cvmfs/cms-ib.cern.ch/nweek-02699/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_ROOT624_X_2021-09-22-2300/external/slc7_amd64_gcc900/lib/libpythia8.so
#17 0x00002b69e0dd72d0 in Pythia8::ProcessContainer::init(bool, Pythia8::ResonanceDecays*, Pythia8::SLHAinterface*, Pythia8::GammaKinematics*) () from /cvmfs/cms-ib.cern.ch/nweek-02699/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_ROOT624_X_2021-09-22-2300/external/slc7_amd64_gcc900/lib/libpythia8.so
#18 0x00002b69e0df0c5b in Pythia8::ProcessLevel::init(bool, Pythia8::SLHAinterface*, std::vector<Pythia8::SigmaProcess*, std::allocator<Pythia8::SigmaProcess*> >&, std::vector<Pythia8::PhaseSpace*, std::allocator<Pythia8::PhaseSpace*> >&) () from /cvmfs/cms-ib.cern.ch/nweek-02699/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_ROOT624_X_2021-09-22-2300/external/slc7_amd64_gcc900/lib/libpythia8.so
#19 0x00002b69e0e16025 in Pythia8::Pythia::init() () from /cvmfs/cms-ib.cern.ch/nweek-02699/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_ROOT624_X_2021-09-22-2300/external/slc7_amd64_gcc900/lib/libpythia8.so
#20 0x00002b69df2ba0f0 in Pythia8Hadronizer::initializeForInternalPartons() () from /cvmfs/cms-ib.cern.ch/nweek-02699/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_ROOT624_X_2021-09-22-2300/lib/slc7_amd64_gcc900/pluginGeneratorInterfacePythia8Filters.so
#21 0x00002b69df2f6034 in edm::ConcurrentGeneratorFilter<Pythia8Hadronizer, gen::ConcurrentExternalDecayDriver>::initLumi(edm::gen::GenStreamCache<Pythia8Hadronizer, gen::ConcurrentExternalDecayDriver>*, edm::LuminosityBlock const&, edm::EventSetup const&) const () from /cvmfs/cms-ib.cern.ch/nweek-02699/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_ROOT624_X_2021-09-22-2300/lib/slc7_amd64_gcc900/pluginGeneratorInterfacePythia8Filters.so
#22 0x00002b69ae8757c0 in edm::global::EDFilterBase::doStreamBeginLuminosityBlock(edm::StreamID, edm::LumiTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms-ib.cern.ch/nweek-02699/slc7_amd64_gcc900/cms/cmssw/CMSSW_12_1_ROOT624_X_2021-09-22-2300/lib/slc7_amd64_gcc900/libFWCoreFramework.so
...
Current Modules:
Module: Pythia8ConcurrentGeneratorFilter:generator (crashed)
Module: Pythia8ConcurrentGeneratorFilter:generator
Module: OscarMTProducer:g4SimHits
Module: Pythia8ConcurrentGeneratorFilter:generator
A fatal system signal has occurred: segmentation violation
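The innermost library frames in the trace above (`LHAPDF::_getXCachesMap()` calling `std::_Rb_tree_insert_and_rebalance`) are consistent with several threads inserting into one shared `std::map` at the same time, which is the kind of race condition in LHAPDF discussed in this issue. The C++ sketch below illustrates that general hazard and two generic remedies. It does not reproduce LHAPDF's actual code; `KnotCache`, `KnotCacheMap`, `getCacheLocked`, `getCacheThreadLocal` and `fillConcurrently` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical cache entry (NOT an LHAPDF type): stands in for whatever an
// interpolator might memoize per knot array.
struct KnotCache {
  double lastX = -1.0;
  double logX = 0.0;
};

// Per-thread cache, keyed by the address of a (hypothetical) knot array.
using KnotCacheMap = std::map<const double*, KnotCache>;

// UNSAFE pattern, shown only as a comment: one global map keyed by thread id.
// operator[] may insert a node and rebalance the red-black tree; two threads
// doing this concurrently is a data race (undefined behaviour) and can corrupt
// the tree, e.g. crashing inside _Rb_tree_insert_and_rebalance.
//
//   std::map<std::thread::id, KnotCacheMap> g_caches;
//   KnotCacheMap& getCacheUnsafe() { return g_caches[std::this_thread::get_id()]; }

// Remedy 1: serialize access to the shared outer map with a mutex. Returning a
// reference is fine: std::map insertion never invalidates references to
// existing elements, and each thread only uses its own entry afterwards.
KnotCacheMap& getCacheLocked() {
  static std::map<std::thread::id, KnotCacheMap> caches;
  static std::mutex cachesMutex;
  std::lock_guard<std::mutex> lock(cachesMutex);
  return caches[std::this_thread::get_id()];
}

// Remedy 2: avoid sharing altogether; every thread owns its own map.
KnotCacheMap& getCacheThreadLocal() {
  thread_local KnotCacheMap cache;
  return cache;
}

// Starts nThreads workers; each obtains its cache via `getter`, inserts nKeys
// entries, and records its cache size. Returns the sum of those sizes.
template <typename Getter>
std::size_t fillConcurrently(Getter getter, int nThreads, int nKeys) {
  assert(nThreads > 0 && nKeys > 0);
  std::vector<double> knots(nKeys, 0.0);        // stand-in for grid knots
  std::vector<std::size_t> sizes(nThreads, 0);  // one slot per worker
  std::vector<std::thread> workers;
  for (int t = 0; t < nThreads; ++t) {
    workers.emplace_back([&, t] {
      KnotCacheMap& cache = getter();
      for (int k = 0; k < nKeys; ++k) {
        cache[&knots[k]].lastX = 0.5;  // insert or update this thread's entry
      }
      sizes[t] = cache.size();
    });
  }
  for (auto& w : workers) w.join();
  std::size_t total = 0;
  for (std::size_t s : sizes) total += s;
  return total;
}
```

For example, `fillConcurrently(getCacheThreadLocal, 8, 100)` should return 800. Design note: the mutex version keeps entries for finished threads, and because the standard allows the `std::thread::id` of a terminated thread to be reused, a new thread may inherit a stale cache; the `thread_local` version avoids both issues at the cost of one cache per thread. Removing the cache entirely, as the 6.4.0 beta reportedly does, sidesteps the question.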
Dear @mkirsano, @SiewYan, LHAPDF 6.4.0, including NNPDF 4.0, is now out. I would suggest updating cmsdist. Are you available? This would allow us to study candidate PDFs for Run 3 and to check whether the update resolves the thread-safety problem.
Yes, OK
Given that we have updated LHAPDF to 6.4.0, which (presumably) eliminated the cache, and that we have had no reports in the past few years (although that doesn't mean crashes couldn't have occurred), maybe we should close this issue?
https://github.com/cms-sw/cmssw/pull/35073#issuecomment-908647991 experienced a rare crash that points to a race condition in LHAPDF, discussed in https://gitlab.com/hepcedar/lhapdf/-/issues/2.