makortel opened 1 year ago
assign geometry
New categories assigned: geometry
@mdhildreth,@Dr15Jones,@makortel,@bsunanda,@civanch you have been requested to review this Pull request/Issue and eventually sign? Thanks
A new Issue was created by @makortel Matti Kortelainen.
@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
Is architecture dependence excluded (i.e. Intel vs AMD)?
Good point. I checked the PR test and baseline runTheMatrix outputs of https://github.com/cms-sw/cmssw/pull/41186#issuecomment-1483895309, and both were run on an Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz.
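To rule out a CPU difference between two runs, one has to compare the CPU models they ran on. A minimal sketch, assuming a copy of each job's /proc/cpuinfo (or equivalent "model name" line) is available; cpu_model and same_cpu_model are hypothetical helper names, not CMSSW or runTheMatrix tools:

```shell
#!/bin/bash
# Hypothetical helper: print the first "model name" entry of a cpuinfo-style file
# (defaults to /proc/cpuinfo on Linux).
cpu_model() {
  awk '/^model name/ { sub(/^[^:]*: /, ""); print; exit }' "${1:-/proc/cpuinfo}"
}

# Hypothetical helper: succeeds (exit status 0) only if both files report
# the same non-empty CPU model string.
same_cpu_model() {
  local a b
  a=$(cpu_model "$1") || return 1
  b=$(cpu_model "$2") || return 1
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Matching model strings only exclude the most obvious source of architecture dependence (e.g. Intel vs AMD); they say nothing about other host differences.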
I'm running valgrind on step1 and see a bunch of these:
==1882951== Invalid read of size 8
==1882951== at 0x40F3AA2E: vecgeom::cxx::CommonUnplacedVolumeImplHelper<vecgeom::cxx::PolyhedronImplementation<(EInnerRadii)0, (EPhiCutout)0>, vecgeom::cxx::VUnplacedVolume>::SafetyToIn(vecgeom::cxx::Vector3D<double> const&) const (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951== by 0x40EF79F6: G4UAdapter<vecgeom::cxx::UnplacedPolyhedron>::DistanceToIn(CLHEP::Hep3Vector const&) const (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951== by 0x40FC659C: G4VoxelNavigation::ComputeStep(CLHEP::Hep3Vector const&, CLHEP::Hep3Vector const&, double, double&, G4NavigationHistory&, bool&, CLHEP::Hep3Vector&, bool&, bool&, G4VPhysicalVolume**, int&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951== by 0x40CDF04A: G4Navigator::ComputeStep(CLHEP::Hep3Vector const&, CLHEP::Hep3Vector const&, double, double&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951== by 0x40E52FA6: G4Transportation::AlongStepGetPhysicalInteractionLength(G4Track const&, double, double, double&, G4GPILSelection*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951== by 0x40E4B87B: G4TrackingManager::ProcessOneTrack(G4Track*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951== by 0x40C38D19: G4EventManager::DoProcessing(G4Event*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951== by 0x40989BB9: RunManagerMTWorker::produce(edm::Event const&, edm::EventSetup const&, RunManagerMT&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951== by 0x40995831: omt::ThreadHandoff::Functor<OscarMTProducer::produce(edm::Event&, edm::EventSetup const&)::{lambda()#1}>::execute() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951== by 0x4097A919: omt::ThreadHandoff::threadLoop(void*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/biglib/el8_amd64_gcc11/pluginSimulation.so)
==1882951== by 0x70861C9: start_thread (in /usr/lib64/libpthread-2.28.so)
==1882951== by 0x72D7E72: clone (in /usr/lib64/libc-2.28.so)
==1882951== Address 0x54664188 is 24 bytes before an unallocated block of size 0 in arena "client"
==1882951==
This needs to be understood, in particular whether it is specifically related to DD4HEP.
My valgrind command:
valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --tool=memcheck \
--suppressions=$ROOTSYS/etc/valgrind-root.supp \
--suppressions=$CMSSW_RELEASE_BASE/src/Utilities/ReleaseScripts/data/cms-valgrind-memcheck.supp cmsRun $1
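When valgrind reports "a bunch" of errors, it can help to send its output to a file and count the distinct error headers before reading individual stack traces. A minimal sketch, assuming the same options as the command above plus valgrind's --log-file option; run_memcheck and summarize_memcheck are hypothetical helper names:

```shell
#!/bin/bash
# Hypothetical wrapper: the memcheck command above, with valgrind output written to a log file.
# Usage: run_memcheck <cmsRun config> <log file>
run_memcheck() {
  local cfg="$1" log="$2"
  valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --track-origins=yes \
    --suppressions="$ROOTSYS/etc/valgrind-root.supp" \
    --suppressions="$CMSSW_RELEASE_BASE/src/Utilities/ReleaseScripts/data/cms-valgrind-memcheck.supp" \
    --log-file="$log" cmsRun "$cfg"
}

# Hypothetical helper: count each distinct error header (e.g. "Invalid read of size 8",
# "Invalid free() / delete / delete[] / realloc()"), most frequent first.
summarize_memcheck() {
  grep -E '^==[0-9]+== Invalid ' "$1" | sed -E 's/^==[0-9]+== //' | sort | uniq -c | sort -rn
}
```

Usage: run_memcheck step1.py step1_memcheck.log followed by summarize_memcheck step1_memcheck.log (the file names are placeholders).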
Not sure whether the valgrind report is still related to this: https://sft.its.cern.ch/jira/projects/VECGEOM/issues/VECGEOM-600?filter=allopenissues
@VinInn, this issue was understood as a compiler bug when -O3 optimisation is used. The solution was to use -O2 optimisation for VecGeom. However, I am not sure whether the problem Matti is reporting here is the same.
I got the report above in the latest 13_1_X nightly. Maybe it is understood, but apparently not solved. A non-vectorized VecGeom is a bit incongruous...
One more valgrind message in step2 (most probably a different issue):
==1891808== Invalid free() / delete / delete[] / realloc()
==1891808== at 0x403BF6C: free (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/external/valgrind/3.17.0-7ca83817e7379e83453f913e11e14834/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1891808== by 0x48F90DDB: edm::Wrapper<ZVertexSoAHeterogeneousHost<131072> >::~Wrapper() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/lib/el8_amd64_gcc11/libCUDADataFormatsVertex.so)
==1891808== by 0x48F90DF3: edm::Wrapper<ZVertexSoAHeterogeneousHost<131072> >::~Wrapper() (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/lib/el8_amd64_gcc11/libCUDADataFormatsVertex.so)
==1891808== by 0x4DD56FA: edm::productholderindexhelper::getContainedTypeFromWrapper(edm::TypeID const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/lib/el8_amd64_gcc11/libDataFormatsProvenance.so)
==1891808== by 0x4DDB31F: edm::ProductRegistry::initializeLookupTables(std::set<edm::TypeID, std::less<edm::TypeID>, std::allocator<edm::TypeID> > const*, std::set<edm::TypeID, std::less<edm::TypeID>, std::allocator<edm::TypeID> > const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/lib/el8_amd64_gcc11/libDataFormatsProvenance.so)
==1891808== by 0x4DD15EF: edm::ProductRegistry::setFrozen(std::set<edm::TypeID, std::less<edm::TypeID>, std::allocator<edm::TypeID> > const&, std::set<edm::TypeID, std::less<edm::TypeID>, std::allocator<edm::TypeID> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/lib/el8_amd64_gcc11/libDataFormatsProvenance.so)
==1891808== by 0x4C161B6: edm::Schedule::finishSetup(edm::ParameterSet&, edm::service::TriggerNamesService const&, edm::ProductRegistry&, edm::BranchIDListHelper&, edm::ProcessBlockHelperBase&, edm::ThinnedAssociationsHelper&, edm::SubProcessParentageHelper const*, std::shared_ptr<edm::ActivityRegistry>, std::shared_ptr<edm::ProcessConfiguration>, bool, edm::PreallocationConfiguration const&, edm::ProcessContext const*) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/lib/el8_amd64_gcc11/libFWCoreFramework.so)
==1891808== by 0x4C2676D: edm::ScheduleItems::finishSchedule(edm::ScheduleItems::MadeModules, edm::ParameterSet&, edm::service::TriggerNamesService const&, bool, edm::PreallocationConfiguration const&, edm::ProcessContext const*, edm::ProcessBlockHelperBase&) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/lib/el8_amd64_gcc11/libFWCoreFramework.so)
==1891808== by 0x4B68977: edm::EventProcessor::init(std::shared_ptr<edm::ProcessDesc>&, edm::ServiceToken const&, edm::serviceregistry::ServiceLegacy) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/lib/el8_amd64_gcc11/libFWCoreFramework.so)
==1891808== by 0x4B6BAD0: edm::EventProcessor::EventProcessor(std::shared_ptr<edm::ProcessDesc>, edm::ServiceToken const&, edm::serviceregistry::ServiceLegacy) (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/lib/el8_amd64_gcc11/libFWCoreFramework.so)
==1891808== by 0x40C0AC: tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const (in /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02778/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_1_X_2023-03-28-1100/bin/el8_amd64_gcc11/cmsRun)
==1891808== by 0x63D3846: tbb::detail::r1::task_arena_impl::execute(tbb::detail::d1::task_arena_base&, tbb::detail::d1::delegate_base&) (arena.cpp:694)
==1891808== Address 0xaa644380 is in a rw- anonymous segment
==1891808==
Another occurrence in https://github.com/cms-sw/cmssw/pull/41274#issuecomment-1496498960, this time in workflow 12434.0
assign dqm
New categories assigned: dqm
@micsucmed,@rvenditti,@emanueleusai,@syuvivida,@pmandrik you have been requested to review this Pull request/Issue and eventually sign? Thanks
FYI @cms-sw/trk-dpg-l2
Another occurrence in https://github.com/cms-sw/cmssw/pull/41328#issuecomment-1515595835, this time in workflow 12434.0
Another occurrence in https://github.com/cms-sw/cmssw/pull/41460#issuecomment-1530037180 in workflow 12434.0
Another occurrence in https://github.com/cms-sw/cmssw/pull/41876#issuecomment-1578065153 in workflow 12434.0
Another occurrence in https://github.com/cms-sw/cmsdist/pull/8545#issuecomment-1598402301 in workflow 12434.0 (although there, because of a compiler update, minor(?) differences in the generated code cannot be excluded)
Another occurrence in https://github.com/cms-sw/cmssw/pull/42075#issuecomment-1605091855 in workflow 12434.0.
type trk
The HLT/SiStrip/ControlView/{ClusterStoNCorr_OnTrack_FECCratevsFECSlot,ClusterStoNCorr_OnTrack_FECSlotVsFECRing_TECP} histograms showed differences in workflow 11634.911 in the PR tests of https://github.com/cms-sw/cmssw/pull/41186#issuecomment-1483895309. The PR itself is very unlikely to be the cause of the differences, and the differences have not been visible in other recent PR tests, so they likely have a random origin. The purpose of this issue is nevertheless to document them, in case they show up in other tests later on.

The 11634.911 is the DD4Hep workflow that, IIUC, reads the geometry from the XML files instead of from the CondDB. These differences may be evidence of some rare non-reproducibility in the DD4Hep code path (which we have observed before, but not really solved).