Closed kutschke closed 8 months ago
A small correction. Ray reports that our extracted position reconstruction jobs also show extreme memory growth. Jobs like ceSimReco, cosmicSimReco, potSim do not.
Thanks, @kutschke. Running valgrind now.
@kutschke, see this valgrind report which was produced from running 10 events. I'm not seeing anything that directly implicates art. I see some memory leaks from Mu2e and some sizable leaks from ROOT, but I'll keep on it and let you know what I find out.
This is with root
v6_28_06b
, right?
This is with
root
v6_28_06b
, right?
Yes.
@knoepfel Any chance you could try to duplicate the leak in a local build with root
v6_28_10
?
Or even v6_30_02
?
Thanks. I asked you to look into it becuase the issue is triggered when we build the same code against two different versions of the art suite. I understand that it could be an issue on our side that was latent until now - but I appreciate all the help I can get.
As far as we are concerned, we are happy to move to a newer root if that resolves the problem.
Understood, @kutschke. @greenc-FNAL, I'd have to be able to reproduce the leak outside of Mu2e to test a different version of ROOT (unless I rebuild some of the other products that Mu2e depends upon). I tried doing that yesterday with a simple art job and was unsuccessful.
In the meantime:
--num-callers
, which may end up exposing problems in art that are just not shown with the default --num-callers=12
setting.Can you remind me. In the memory tracker db, is Vsize the peak virtual size at the time it is interogated or the current virtual size. I think it's peak.
@kutschke, Vsize
in the (e.g.) EventInfo
or ModuleInfo
table is the current virtual size. The peak size (VmPeak
) is only recorded at the end, and it's shown in the PeakUsage
table. Incidentally, the RSS is the more relevant metric, as it is the actual memory used by the program.
I believe I am seeing the same behavior in some art tests. I will let you know.
Just ran massif
on the mu2e job with 500 events. See the massif printout here.
Pinging @pcanal to see if he has guidance.
The largest culprit seems to be a TGraph::Fit
command in mu2eCrv::MakeCrvRecoPulses::SetWaveform
:
==201374== 672,560 bytes in 6,005 blocks are definitely lost in loss record 29,240 of 29,249
==201374== at 0x4C2B15C: malloc (vg_replace_malloc.c:431)
==201374== by 0xF344E725: clang::Parser::AnnotateTemplateIdToken(clang::OpaquePtr<clang::TemplateName>, clang::TemplateNameKind, clang::CXXScopeSpec&, clang::SourceLocation, clang::UnqualifiedId&, bool, bool) (in /cvmfs/mu2e.opensciencegrid.org/artexternals/root/v6_28_06b/Linux64bit+3.10-2.17-e28-p3915-prof/lib/libCling.so.6.28.06)
==201374== by 0xF33E28E0: clang::Parser::ParseOptionalCXXScopeSpecifier(clang::CXXScopeSpec&, clang::OpaquePtr<clang::QualType>, bool, bool, bool*, bool, clang::IdentifierInfo**, bool, bool) (in /cvmfs/mu2e.opensciencegrid.org/artexternals/root/v6_28_06b/Linux64bit+3.10-2.17-e28-p3915-prof/lib/libCling.so.6.28.06)
==201374== by 0xF345DBD2: clang::Parser::TryAnnotateCXXScopeToken(bool) (in /cvmfs/mu2e.opensciencegrid.org/artexternals/root/v6_28_06b/Linux64bit+3.10-2.17-e28-p3915-prof/lib/libCling.so.6.28.06)
==201374== by 0xF2A57896: cling::LookupHelper::findScope(llvm::StringRef, cling::LookupHelper::DiagSetting, clang::Type const**, bool) const (in /cvmfs/mu2e.opensciencegrid.org/artexternals/root/v6_28_06b/Linux64bit+3.10-2.17-e28-p3915-prof/lib/libCling.so.6.28.06)
==201374== by 0xF29824CD: GetClassSharedLibsForModule (TCling.cxx:6957)
==201374== by 0xF29824CD: TCling::GetClassSharedLibs(char const*) (TCling.cxx:7061)
==201374== by 0xF2965186: TCling::ShallowAutoLoadImpl(char const*) (TCling.cxx:6119)
==201374== by 0xF2983220: TCling::DeepAutoLoadImpl(char const*, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) (TCling.cxx:6172)
==201374== by 0xF2983743: TCling::AutoLoad(char const*, bool) (TCling.cxx:6284)
==201374== by 0xF29838A5: TCling::AutoLoad(std::type_info const&, bool) (TCling.cxx:6103)
==201374== by 0x14EB2D47: TClass::GetClass(std::type_info const&, bool, bool, unsigned long, unsigned long) (TClass.cxx:3316)
==201374== by 0x139AA957: GetClass (TClass.h:628)
==201374== by 0x139AA957: GetClassHelper<std::tuple<char const*> > (TClass.h:644)
==201374== by 0x139AA957: GetClass<std::tuple<char const*> > (TClass.h:654)
==201374== by 0x139AA957: long TPluginHandler::ExecPluginImpl<char const*>(char const* const&) (TPluginManager.h:175)
==201374== by 0x139A9B5C: ExecPlugin<char const*> (TPluginManager.h:218)
==201374== by 0x139A9B5C: ROOT::Math::Factory::CreateMinimizer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (Factory.cxx:100)
==201374== by 0x139AC9E0: ROOT::Fit::FitConfig::CreateMinimizer() (FitConfig.cxx:188)
==201374== by 0x139C4DC8: ROOT::Fit::Fitter::DoInitMinimizer() (Fitter.cxx:736)
==201374== by 0x139CA8E4: bool ROOT::Fit::Fitter::DoMinimization<ROOT::Fit::Chi2FCN<ROOT::Math::IBaseFunctionMultiDimTempl<double>, ROOT::Math::IParametricFunctionMultiDimTempl<double> > >(std::unique_ptr<ROOT::Fit::Chi2FCN<ROOT::Math::IBaseFunctionMultiDimTempl<double>, ROOT::Math::IParametricFunctionMultiDimTempl<double> >, std::default_delete<ROOT::Fit::Chi2FCN<ROOT::Math::IBaseFunctionMultiDimTempl<double>, ROOT::Math::IParametricFunctionMultiDimTempl<double> > > >, ROOT::Math::IBaseFunctionMultiDimTempl<double> const*) (Fitter.cxx:867)
==201374== by 0x139C72DB: ROOT::Fit::Fitter::DoLeastSquareFit(ROOT::EExecutionPolicy const&) (Fitter.cxx:362)
==201374== by 0x1EFE5F11: LeastSquareFit (Fitter.h:181)
==201374== by 0x1EFE5F11: Fit (Fitter.h:166)
==201374== by 0x1EFE5F11: TFitResultPtr HFit::Fit<TGraph>(TGraph*, TF1*, Foption_t&, ROOT::Math::MinimizerOptions const&, char const*, ROOT::Fit::DataRange&) (HFitImpl.cxx:371)
==201374== by 0x1EFDB3F6: ROOT::Fit::FitObject(TGraph*, TF1*, Foption_t&, ROOT::Math::MinimizerOptions const&, char const*, ROOT::Fit::DataRange&) (HFitImpl.cxx:981)
==201374== by 0x1F08822E: TGraph::Fit(TF1*, char const*, char const*, double, double) (TGraph.cxx:1211)
==201374== by 0x26655461: mu2eCrv::MakeCrvRecoPulses::SetWaveform(std::vector<unsigned int, std::allocator<unsigned int> > const&, unsigned int, float, float, float, float) (MakeCrvRecoPulses.cc:271)
==201374== by 0x26ECF0D9: mu2e::CrvRecoPulsesFinder::produce(art::Event&) (CrvRecoPulsesFinder_module.cc:157)
==201374== by 0x81F3E77: art::EDProducer::produceWithFrame(art::Event&, art::ProcessingFrame const&) (EDProducer.cc:90)
==201374== by 0x826BE21: art::detail::Producer::doEvent(art::EventPrincipal&, art::ModuleContext const&, std::atomic<unsigned long>&, std::atomic<unsigned long>&, std::atomic<unsigned long>&) (Producer.cc:114)
The largest culprit seems to be a TGraph::Fit command in mu2eCrv::MakeCrvRecoPulses::SetWaveform:
However this should be once per job cost ...
Would it be possible to generate the massif
output with the previous version (I.e. using less memory)?
Would it be possible to generate the
massif
output with the previous version (I.e. using less memory)?
Yes, I'll get going on that.
I am not familiar with masssif. If I just look for mu2e:: the most frequent string is:
mu2e::CrvRecoPulsesFinder::produce(art::Event&) (CrvRecoPulsesFinder_module.cc:157)
What is that telling me?
VSize and RSS both identify the same three modules: CrvRecoPulsses, CaloHitTruthMatch and SelectRecoMC . To flag these an look for VSize or RSS increasing between the pre and post module calls.
I have not looked carefully at all of these but CrvRecoPulses uses all time HEP faviourite anit-pattern. It keeps working space as data members that are filled during the event and cleared at the start of the next event. So that could be a red herring for this issue.
The other two both are heavy users of Assns. Did anything change in the Assn's support between the two art versions?
Would it be possible to generate the
massif
output with the previous version (I.e. using less memory)?
@pcanal, see the the massif printout from the previous version.
I am not familiar with masssif. If I just look for mu2e:: the most frequent string is:
mu2e::CrvRecoPulsesFinder::produce(art::Event&) (CrvRecoPulsesFinder_module.cc:157)
What is that telling me?
@kutschke, massif
is a valgrind-provided heap-profiling tool that audits the allocations/deallocations of the program. It takes "snapshots" of the program and records the active allocations for each snapshot, along with the functions in which the allocations were made.
One way to detect a memory leak is to identify a function call whose associated allocations over the course of the program keep increasing. For example, if I take a look at the clang::Parser::AnnotateTemplateIdToken
function over the course of the program, the associated memory--the number in parentheses below--keeps getting larger (up to 97MiB by the end of the job).
$ grep "^->.*AnnotateTemplateIdToken" massif-printout-root_v6_28_06b.log | cut -d'(' -f -2
->00.52% (3,809,512B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->00.61% (4,436,824B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->01.91% (14,408,632B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->02.56% (20,002,696B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->03.16% (25,562,488B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->03.70% (29,729,224B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->04.00% (32,651,416B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->04.90% (42,049,000B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->05.89% (54,807,256B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->06.59% (62,361,544B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->08.16% (81,368,728B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->08.61% (85,415,288B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->08.69% (87,751,048B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->08.74% (89,341,336B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->08.91% (91,999,096B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->09.36% (96,319,384B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->09.36% (97,695,640B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->09.22% (97,695,640B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->09.11% (97,695,640B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
This could indicate a problem, or it could be expected. I suspect it is part of the problem, because I don't see it with the older version of ROOT:
$ grep "^->.*AnnotateTemplateIdToken" massif-printout-root_v6_28_04c.log | cut -d'(' -f -2
$
In this case the culprit for the increase in memory is the **BFieldManagerMaker**
. In the old version it takes up to 750MB (rounded up) while in the new version it takes up to 1GB (rounded down). See the traces. Old traces first:
98.06% (746,130,388B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->50.89% (387,240,336B) 0x14C6D4E4: allocate (new_allocator.h:147)
| ->50.89% (387,240,336B) 0x14C6D4E4: allocate (allocator.h:198)
| ->50.89% (387,240,336B) 0x14C6D4E4: allocate (alloc_traits.h:482)
| ->50.89% (387,240,336B) 0x14C6D4E4: _M_allocate (stl_vector.h:378)
| ->50.89% (387,240,336B) 0x14C6D4E4: _M_create_storage (stl_vector.h:395)
| ->50.89% (387,240,336B) 0x14C6D4E4: _Vector_base (stl_vector.h:332)
| ->50.89% (387,240,336B) 0x14C6D4E4: vector (stl_vector.h:566)
| ->50.89% (387,240,336B) 0x14C6D4E4: Container3D (Container3D.hh:35)
| ->50.89% (387,240,336B) 0x14C6D4E4: BFGridMap (BFGridMap.hh:65)
| ->50.89% (387,240,336B) 0x14C6D4E4: _Construct<mu2e::BFGridMap, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (stl_construct.h:119)
| ->50.89% (387,240,336B) 0x14C6D4E4: construct<mu2e::BFGridMap, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (alloc_traits.h:660)
| ->50.89% (387,240,336B) 0x14C6D4E4: _Sp_counted_ptr_inplace<const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr_base.h:604)
| ->50.89% (387,240,336B) 0x14C6D4E4: __shared_count<mu2e::BFGridMap, std::allocator<void>, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr_base.h:971)
| ->50.89% (387,240,336B) 0x14C6D4E4: __shared_ptr<std::allocator<void>, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr_base.h:1712)
| ->50.89% (387,240,336B) 0x14C6D4E4: shared_ptr<std::allocator<void>, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr.h:464)
| ->50.89% (387,240,336B) 0x14C6D4E4: make_shared<mu2e::BFGridMap, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr.h:1010)
| ->50.89% (387,240,336B) 0x14C6D4E4: mu2e::BFieldManagerMaker::loadG4BL(std::vector<std::shared_ptr<mu2e::BFMap>, std::allocator<std::shared_ptr<mu2e::BFMap> > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mu2e::BFieldConfig const&) (BFieldManagerMaker.cc:394)
| ->50.89% (387,240,336B) 0x14C6DC00: mu2e::BFieldManagerMaker::loadMaps(std::vector<std::shared_ptr<mu2e::BFMap>, std::allocator<std::shared_ptr<mu2e::BFMap> > >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<mu2e::BFMapType, std::allocator<mu2e::BFMapType> >, mu2e::BFieldConfig const&) (BFieldManagerMaker.cc:354)
| ->46.93% (357,116,880B) 0x14C6E446: mu2e::BFieldManagerMaker::BFieldManagerMaker(mu2e::BFieldConfig const&) (BFieldManagerMaker.cc:110)
vs
97.63% (1,034,002,728B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->36.56% (387,240,336B) 0x14C854E4: allocate (new_allocator.h:147)
| ->36.56% (387,240,336B) 0x14C854E4: allocate (allocator.h:198)
| ->36.56% (387,240,336B) 0x14C854E4: allocate (alloc_traits.h:482)
| ->36.56% (387,240,336B) 0x14C854E4: _M_allocate (stl_vector.h:378)
| ->36.56% (387,240,336B) 0x14C854E4: _M_create_storage (stl_vector.h:395)
| ->36.56% (387,240,336B) 0x14C854E4: _Vector_base (stl_vector.h:332)
| ->36.56% (387,240,336B) 0x14C854E4: vector (stl_vector.h:566)
| ->36.56% (387,240,336B) 0x14C854E4: Container3D (Container3D.hh:35)
| ->36.56% (387,240,336B) 0x14C854E4: BFGridMap (BFGridMap.hh:65)
| ->36.56% (387,240,336B) 0x14C854E4: _Construct<mu2e::BFGridMap, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&,
int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (stl_construct.h:119)
| ->36.56% (387,240,336B) 0x14C854E4: construct<mu2e::BFGridMap, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&
, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (alloc_traits.h:660)
| ->36.56% (387,240,336B) 0x14C854E4: _Sp_counted_ptr_inplace<const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&,
int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr_base.h:604)
| ->36.56% (387,240,336B) 0x14C854E4: __shared_count<mu2e::BFGridMap, std::allocator<void>, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&,
double&, int&, double&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr_base.h:971)
| ->36.56% (387,240,336B) 0x14C854E4: __shared_ptr<std::allocator<void>, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, do
uble&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr_base.h:1712)
| ->36.56% (387,240,336B) 0x14C854E4: shared_ptr<std::allocator<void>, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, do
uble&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr.h:464)
| ->36.56% (387,240,336B) 0x14C854E4: make_shared<mu2e::BFGridMap, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, doub
le&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr.h:1010)
| ->36.56% (387,240,336B) 0x14C854E4: mu2e::BFieldManagerMaker::loadG4BL(std::vector<std::shared_ptr<mu2e::BFMap>, std::allocator<std::shared_ptr<mu2e::BFMap> > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mu2e::BFieldConfig const&) (BFieldManagerMaker.cc:394)
| ->36.56% (387,240,336B) 0x14C85C00: mu2e::BFieldManagerMaker::loadMaps(std::vector<std::shared_ptr<mu2e::BFMap>, std::allocator<std::shared_ptr<mu2e::BFMap> > >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<mu2e::BFMapType, std::allocator<mu2e::BFMapType> >, mu2e::BFieldConfig const&) (BFieldManagerMaker.cc:354)
| ->33.72% (357,116,880B) 0x14C86446: mu2e::BFieldManagerMaker::BFieldManagerMaker(mu2e::BFieldConfig const&) (BFieldManagerMaker.cc:110)
/Thanks Kyle. It's comforting that you have a lead.
Here is the exercise that I did. I ran old and new with the memory tracker turned on. Then I looked for modules that increased either VSize or RSS between premodule and postmodule. I tabulated the frequency and of the increases and I added up the increases for each module. I skipped the first 50 events so to skip the startup activity. The tables are below:
RSS Table
Module New Old
# MB # MB
CaloHitTruthMatch 193 1577.58 6 61.71
CrvRecoPulses 435 244.94 1 0.54
SelectRecoMC 62 151.36 3 11.60
Output 21 17.41 7 5.26
CrvCoincidenceClusterMatchMC 9 12.16 2 4.31
compressRecoMCs 4 2.97 3 1.30
CrvCoincidenceClusterFinder 3 1.35 1 0.54
FlagBkgHits 1 0.54 1 0.27
makeSTH 1 0.27 1 0.27
KKDeP 1 0.27
TriggerResults 1 0.26
CaloRecoDigiMaker 1 0.26
makePH 1 0.25 1 0.52
CaloClusterTruthMatch 1 0.18
------------------------------------------------------
Total: 302 2008.85 21 87.30
VSize Table:
Module New Old
# MB # MB
CaloHitTruthMatch 189 1556.13 6 62.00
CrvRecoPulses 19 322.02 1 0.69
SelectRecoMC 61 151.56 3 11.74
Output 14 14.95 4 4.30
CrvCoincidenceClusterMatchMC 8 12.16 1 3.44
compressRecoMCs 5 4.11 2 0.97
CrvCoincidenceClusterFinder 3 1.70 1 0.42
FlagBkgHits 1 0.91 1 0.72
makePH 1 0.85 1 0.85
makeSTH 1 0.36 1 0.36
---------------------------------------------------
Total 302 2008.85 21 87.30
@pcanal Thanks for pointing out BFieldManagerMaker. That is called once at the start of the job so it it not the source of the size increasing as events are processed. But it is a surprise that anything changed here.
In this case the culprit for the increase in memory is the BFieldManagerMaker.
@pcanal that's not correct. Of the two traces you show:
98.06% (746,130,388B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
...
| ->46.93% (357,116,880B) 0x14C6E446: mu2e::BFieldManagerMaker::BFieldManagerMaker(mu2e::BFieldConfig const&) (BFieldManagerMaker.cc:110)
vs
97.63% (1,034,002,728B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
...
| ->33.72% (357,116,880B) 0x14C86446: mu2e::BFieldManagerMaker::BFieldManagerMaker(mu2e::BFieldConfig const&) (BFieldManagerMaker.cc:110)
the contributions from mu2e::BFieldManagerMaker::BFieldManagerMaker
are identical (357,116,880B). And it wouldn't account for the memory increase we see from clang::Parser::AnnotateTemplateIdToken
mentioned above.
@kutschke, incidentally, Mu2e could be benefit from fixing some of its own memory leaks. See https://github.com/Mu2e/Offline/pull/1153
Thanks @knoepfel . Are those diffs from your working area in which you have fixed these along the way to finding the main problem?
@kutschke, yep. I'll create a PR.
@knoepfel Please create the PR. Thanks.
I got side-tracked by the indentation of the log. Kyle is correct that the increase in memory size might be due to:
==201374== by 0xF2A57896: cling::LookupHelper::findScope(llvm::StringRef, cling::LookupHelper::DiagSetting, clang::Type const**, bool) const (in /cvmfs/mu2e.opensciencegrid.org/artexternals/root/v6_28_06b/Linux64bit+3.10-2.17-e28-p3915-prof/lib/libCling.so.6.28.06)
==201374== by 0xF29824CD: GetClassSharedLibsForModule (TCling.cxx:6957)
==201374== by 0xF29824CD: TCling::GetClassSharedLibs(char const*) (TCling.cxx:7061)
However this code is not per se new. Is the old build of ROOT and the new build of ROOT different in regard to enabling runtime_cxxmodules
?
Either way, it would be interesting to see what is the parameters for findScope
coming from the call site TCling.cxx:6957. . (however it is likely there is too many of them)
Here is the difference in root-config --features
:
$ sdiff -w50 ~/root-config-features-v6-28-04c.log ~/root-config-features-v6-28-06b.log
cxx20 cxx20
asimage asimage
builtin_clang builtin_clang
builtin_cling builtin_cling
builtin_llvm builtin_llvm
builtin_nlohmannjson <
builtin_openui5 builtin_openui5
clad clad
dataframe dataframe
exceptions exceptions
fftw3 fftw3
gdml gdml
gsl_shared <
http http
imt imt
mathmore mathmore
mlp mlp
minuit2 minuit2
opengl opengl
pgsql pgsql
pyroot pyroot
pythia6 pythia6
roofit roofit
webgui webgui
root7 root7
runtime_cxxmodules runtime_cxxmodules
shared shared
soversion soversion
ssl ssl
tmva tmva
tmva-cpu tmva-cpu
> tmva-sofie
tmva-pymva tmva-pymva
spectrum spectrum
x11 x11
xml xml
xrootd xrootd
So in this regards it is the same. We would need to track down which input to findScope
is triggering the parsing.
@pcanal, see:
The second file is actually easier to read, as the cls
parameter is passed directly to findScope
.
These are the printouts when running 1 events.
And here are the values of cls
when processing 1 event (left) and 10 events (right), along with their occurences:
304 tuple<char const*> | 12064 tuple<char const*>
156 tuple<const char*> | 6036 tuple<const char*>
152 char const | 6032 char const
126 void 126 void
50 2 50 2
40 void(TBranch::*)(TBuffer&) 40 void(TBranch::*)(TBuffer&)
30 6 30 6
24 void(TBranch 24 void(TBranch
20 TString::Rep_t 20 TString::Rep_t
20 array<float,2> 20 array<float,2>
12 8 12 8
12 70 12 70
9 vpClingValue 9 vpClingValue
6 5 6 5
6 20 6 20
4 unsigned short 4 unsigned short
4 tuple<const char*,unsigned int> 4 tuple<const char*,unsigned int>
4 ROOT::Internal::TBranchCacheInfo::EStates 4 ROOT::Internal::TBranchCacheInfo::EStates
4 ROOT::Internal::TBranchCacheInfo 4 ROOT::Internal::TBranchCacheInfo
4 ROOT::Experimental::Internal::TBulkBranchRead 4 ROOT::Experimental::Internal::TBulkBranchRead
4 array<unsigned short,8> 4 array<unsigned short,8>
4 array<unsigned short,70> 4 array<unsigned short,70>
4 array<unsigned long,2> 4 array<unsigned long,2>
4 array<double,2> 4 array<double,2>
3 signed char 3 signed char
3 map<unsigned short,unsigned short> 3 map<unsigned short,unsigned short>
1 unsigned long | 2 unsigned long
2 tuple 2 tuple
2 std::string: 2 std::string:
2 std::string 2 std::string
2 mu2e::PDGCodeDetail::enum_type 2 mu2e::PDGCodeDetail::enum_type
1 bool | 2 bool
2 art::Ptr 2 art::Ptr
2 art::Hash 2 art::Hash
1 _Vector_base<string,allocator<string> > 1 _Vector_base<string,allocator<string> >
1 _Vector_base<int,allocator<int> > 1 _Vector_base<int,allocator<int> >
1 ret 1 ret
1 _Rb_tree<unsigned short,pair<const unsigned short,uns 1 _Rb_tree<unsigned short,pair<const unsigned short,uns
1 __pair_base<unsigned short,unsigned short> 1 __pair_base<unsigned short,unsigned short>
1 __pair_base<cet::map_vector_key,mu2e::SimParticle> 1 __pair_base<cet::map_vector_key,mu2e::SimParticle>
1 __pair_base<art::RefCore,unsigned long> 1 __pair_base<art::RefCore,unsigned long>
1 __pair_base<art::Ptr<mu2e::SimParticle>,mu2e::MCTraje 1 __pair_base<art::Ptr<mu2e::SimParticle>,mu2e::MCTraje
1 __pair_base<art::ProductID,set<art::ProductID> > 1 __pair_base<art::ProductID,set<art::ProductID> >
1 __pair_base<art::Hash<2>,art::ProcessHistory> 1 __pair_base<art::Hash<2>,art::ProcessHistory>
1 __pair_base<art::BranchKey,art::BranchDescription> 1 __pair_base<art::BranchKey,art::BranchDescription>
1 P090_RMinimizer 1 P090_RMinimizer
1 P080_GeneticMinimizer 1 P080_GeneticMinimizer
1 P070_TFumiliMinimizer 1 P070_TFumiliMinimizer
1 P060_TLinearMinimizer 1 P060_TLinearMinimizer
1 P050_GSLSimAnMinimizer 1 P050_GSLSimAnMinimizer
1 P040_GSLNLSMinimizer 1 P040_GSLNLSMinimizer
1 P030_GSLMinimizer 1 P030_GSLMinimizer
1 P020_TMinuitMinimizer 1 P020_TMinuitMinimizer
1 P010_Minuit2Minimizer 1 P010_Minuit2Minimizer
1 obj 1 obj
1 nargs 1 nargs
1 mu2e::VirtualDetectorId::enum_type 1 mu2e::VirtualDetectorId::enum_type
1 mu2e::SurfaceIdDetail::enum_type 1 mu2e::SurfaceIdDetail::enum_type
1 mu2e::StrawIdMask::Level 1 mu2e::StrawIdMask::Level
1 mu2e::Helicity::helicity 1 mu2e::Helicity::helicity
1 mu2e::ComboHitCollection::Sort 1 mu2e::ComboHitCollection::Sort
1 __cling_Un1Qu319 1 __cling_Un1Qu319
1 __cling_Un1Qu318 1 __cling_Un1Qu318
1 __cling_Un1Qu317 1 __cling_Un1Qu317
1 __cling_Un1Qu316 1 __cling_Un1Qu316
1 __cling_Un1Qu315 1 __cling_Un1Qu315
1 __cling_Un1Qu314 1 __cling_Un1Qu314
1 __cling_Un1Qu313 1 __cling_Un1Qu313
1 __cling_Un1Qu312 1 __cling_Un1Qu312
1 __cling_Un1Qu311 1 __cling_Un1Qu311
1 __cf_0 1 __cf_0
1 args 1 args
@pcanal, see updated comment above re. occurrences of cls
--tuple<const char*>
and const char
(non-linearly?) increase with the number of events.
The problem is understood and I am investigating a solution. (issue stems from the fact that lookup of templated classes have a memory cost and that TClass::GetClass
will do a lookup each time until it can find a dictionary for the class. In the case of std::tuple
we do not provide a dictionary on purpose, hence the leak per se).
The problem is understood and I am investigating a solution. (issue stems from the fact that lookup of templated classes have a memory cost and that
TClass::GetClass
will do a lookup each time until it can find a dictionary for the class. In the case ofstd::tuple
we do not provide a dictionary on purpose, hence the leak per se).
Thanks Philippe. That's great news.
This is now followed at https://github.com/root-project/root/issues/14199 and solved by https://github.com/root-project/root/pull/14200
Thanks Philippe. We are looking forward to using this fix.
@kutschke, has Mu2e had the opportunity to look at the art 3.14.03 mu distribution?
https://scisoft.fnal.gov/scisoft/bundles/mu/v3_14_03/mu-v3_14_03.html
It includes a version of ROOT with the fixes Philippe implemented.
Hi Kyle,
It passes all of our tests. In particularly we do not see a memory leak.
The PR has been in since before the break but we are waiting for the TDAQ people to return from holiday before we merge. We don’t want to paint them into a corner.
Rob
Thanks, Rob. We'll close the issue then.
Describe the bug Mu2e is validating art suite v3_14_01. One of the jobs in our nightly validation, reco.fcl, typically runs with a VmPeak around 1150. They are submitted with a max memory request of 1980 MB . Followjng the upgrade to v3_14_01, these jobs were killed due to excessive memory use. All other jobs run with a roughly unchanged memory footprint.
When I run interactively, VmPeak grows with the number of events. Right now we suspect that a change in art, or how we interact with art, is triggering an memory leak.
To Reproduce On a machine that sees Mu2e cvmfs and pnfs, in a clean directory and clean shell source /cvmfs/mu2e.opensciencegrid.org/setupmu2e-art.sh git clone https://github.com/Mu2e/Offline git clone https://github.com/Mu2e/Production cd Offline git checkout -b leak c5e29819800 cd .. muse setup muse build -j nn # where nn is appropriate for your build machine mu2e -c /exp/mu2e/app/users/kutschke/Development/p054/leak.fcl -n 1>& log.1
The table below gives the value of VmPeak for different values of the argument of -n , when run on mu2ebuild01 N VmPeak (MB) 1 3930.31
50 4016.58 100 4049.98 250 4153.95 2764 6006.51
If you repeat the exercise with the previous commit, e3e621dbe964b, you will see that the difference in VmPeak between 1 event and all events is around 100 MB.
If this is a high-priority issue High Priority. I don't have permission to add labels.