art-framework-suite / art

The implementation of the art physics event processing framework
Other
2 stars 7 forks source link

Suspect a memory leak triggered by the upgrade to art suite v3_14_01 #142

Closed kutschke closed 8 months ago

kutschke commented 9 months ago

Describe the bug Mu2e is validating art suite v3_14_01. One of the jobs in our nightly validation, reco.fcl, typically runs with a VmPeak around 1150. They are submitted with a max memory request of 1980 MB . Followjng the upgrade to v3_14_01, these jobs were killed due to excessive memory use. All other jobs run with a roughly unchanged memory footprint.

When I run interactively, VmPeak grows with the number of events. Right now we suspect that a change in art, or how we interact with art, is triggering an memory leak.

To Reproduce On a machine that sees Mu2e cvmfs and pnfs, in a clean directory and clean shell source /cvmfs/mu2e.opensciencegrid.org/setupmu2e-art.sh git clone https://github.com/Mu2e/Offline git clone https://github.com/Mu2e/Production cd Offline git checkout -b leak c5e29819800 cd .. muse setup muse build -j nn # where nn is appropriate for your build machine mu2e -c /exp/mu2e/app/users/kutschke/Development/p054/leak.fcl -n 1>& log.1

The table below gives the value of VmPeak for different values of the argument of -n , when run on mu2ebuild01 N VmPeak (MB) 1 3930.31
50 4016.58 100 4049.98 250 4153.95 2764 6006.51

If you repeat the exercise with the previous commit, e3e621dbe964b, you will see that the difference in VmPeak between 1 event and all events is around 100 MB.

If this is a high-priority issue High Priority. I don't have permission to add labels.

kutschke commented 9 months ago

A small correction. Ray reports that our extracted position reconstruction jobs also show extreme memory growth. Jobs like ceSimReco, cosmicSimReco, potSim do not.

knoepfel commented 9 months ago

Thanks, @kutschke. Running valgrind now.

knoepfel commented 9 months ago

@kutschke, see this valgrind report which was produced from running 10 events. I'm not seeing anything that directly implicates art. I see some memory leaks from Mu2e and some sizable leaks from ROOT, but I'll keep on it and let you know what I find out.

greenc-FNAL commented 9 months ago

This is with root v6_28_06b, right?

knoepfel commented 9 months ago

This is with root v6_28_06b, right?

Yes.

greenc-FNAL commented 9 months ago

@knoepfel Any chance you could try to duplicate the leak in a local build with root v6_28_10?

greenc-FNAL commented 9 months ago

Or even v6_30_02?

kutschke commented 9 months ago

Thanks. I asked you to look into it becuase the issue is triggered when we build the same code against two different versions of the art suite. I understand that it could be an issue on our side that was latent until now - but I appreciate all the help I can get.

As far as we are concerned, we are happy to move to a newer root if that resolves the problem.

knoepfel commented 9 months ago

Understood, @kutschke. @greenc-FNAL, I'd have to be able to reproduce the leak outside of Mu2e to test a different version of ROOT (unless I rebuild some of the other products that Mu2e depends upon). I tried doing that yesterday with a simple art job and was unsuccessful.

In the meantime:

  1. I will fix some of the Mu2e leaks and then rerun valgrind with a larger value for --num-callers, which may end up exposing problems in art that are just not shown with the default --num-callers=12 setting.
  2. I'll ask Philippe if he's aware of any memory leaks that have been resolved lately.
kutschke commented 9 months ago

Can you remind me. In the memory tracker db, is Vsize the peak virtual size at the time it is interogated or the current virtual size. I think it's peak.

knoepfel commented 9 months ago

@kutschke, Vsize in the (e.g.) EventInfo or ModuleInfo table is the current virtual size. The peak size (VmPeak) is only recorded at the end, and it's shown in the PeakUsage table. Incidentally, the RSS is the more relevant metric, as it is the actual memory used by the program.

I believe I am seeing the same behavior in some art tests. I will let you know.

knoepfel commented 9 months ago

Just ran massif on the mu2e job with 500 events. See the massif printout here.

knoepfel commented 9 months ago

Pinging @pcanal to see if he has guidance.

knoepfel commented 9 months ago

The largest culprit seems to be a TGraph::Fit command in mu2eCrv::MakeCrvRecoPulses::SetWaveform:

==201374== 672,560 bytes in 6,005 blocks are definitely lost in loss record 29,240 of 29,249
==201374==    at 0x4C2B15C: malloc (vg_replace_malloc.c:431)
==201374==    by 0xF344E725: clang::Parser::AnnotateTemplateIdToken(clang::OpaquePtr<clang::TemplateName>, clang::TemplateNameKind, clang::CXXScopeSpec&, clang::SourceLocation, clang::UnqualifiedId&, bool, bool) (in /cvmfs/mu2e.opensciencegrid.org/artexternals/root/v6_28_06b/Linux64bit+3.10-2.17-e28-p3915-prof/lib/libCling.so.6.28.06)
==201374==    by 0xF33E28E0: clang::Parser::ParseOptionalCXXScopeSpecifier(clang::CXXScopeSpec&, clang::OpaquePtr<clang::QualType>, bool, bool, bool*, bool, clang::IdentifierInfo**, bool, bool) (in /cvmfs/mu2e.opensciencegrid.org/artexternals/root/v6_28_06b/Linux64bit+3.10-2.17-e28-p3915-prof/lib/libCling.so.6.28.06)
==201374==    by 0xF345DBD2: clang::Parser::TryAnnotateCXXScopeToken(bool) (in /cvmfs/mu2e.opensciencegrid.org/artexternals/root/v6_28_06b/Linux64bit+3.10-2.17-e28-p3915-prof/lib/libCling.so.6.28.06)
==201374==    by 0xF2A57896: cling::LookupHelper::findScope(llvm::StringRef, cling::LookupHelper::DiagSetting, clang::Type const**, bool) const (in /cvmfs/mu2e.opensciencegrid.org/artexternals/root/v6_28_06b/Linux64bit+3.10-2.17-e28-p3915-prof/lib/libCling.so.6.28.06)
==201374==    by 0xF29824CD: GetClassSharedLibsForModule (TCling.cxx:6957)
==201374==    by 0xF29824CD: TCling::GetClassSharedLibs(char const*) (TCling.cxx:7061)
==201374==    by 0xF2965186: TCling::ShallowAutoLoadImpl(char const*) (TCling.cxx:6119)
==201374==    by 0xF2983220: TCling::DeepAutoLoadImpl(char const*, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, bool) (TCling.cxx:6172)
==201374==    by 0xF2983743: TCling::AutoLoad(char const*, bool) (TCling.cxx:6284)
==201374==    by 0xF29838A5: TCling::AutoLoad(std::type_info const&, bool) (TCling.cxx:6103)
==201374==    by 0x14EB2D47: TClass::GetClass(std::type_info const&, bool, bool, unsigned long, unsigned long) (TClass.cxx:3316)
==201374==    by 0x139AA957: GetClass (TClass.h:628)
==201374==    by 0x139AA957: GetClassHelper<std::tuple<char const*> > (TClass.h:644)
==201374==    by 0x139AA957: GetClass<std::tuple<char const*> > (TClass.h:654)
==201374==    by 0x139AA957: long TPluginHandler::ExecPluginImpl<char const*>(char const* const&) (TPluginManager.h:175)
==201374==    by 0x139A9B5C: ExecPlugin<char const*> (TPluginManager.h:218)
==201374==    by 0x139A9B5C: ROOT::Math::Factory::CreateMinimizer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (Factory.cxx:100)
==201374==    by 0x139AC9E0: ROOT::Fit::FitConfig::CreateMinimizer() (FitConfig.cxx:188)
==201374==    by 0x139C4DC8: ROOT::Fit::Fitter::DoInitMinimizer() (Fitter.cxx:736)
==201374==    by 0x139CA8E4: bool ROOT::Fit::Fitter::DoMinimization<ROOT::Fit::Chi2FCN<ROOT::Math::IBaseFunctionMultiDimTempl<double>, ROOT::Math::IParametricFunctionMultiDimTempl<double> > >(std::unique_ptr<ROOT::Fit::Chi2FCN<ROOT::Math::IBaseFunctionMultiDimTempl<double>, ROOT::Math::IParametricFunctionMultiDimTempl<double> >, std::default_delete<ROOT::Fit::Chi2FCN<ROOT::Math::IBaseFunctionMultiDimTempl<double>, ROOT::Math::IParametricFunctionMultiDimTempl<double> > > >, ROOT::Math::IBaseFunctionMultiDimTempl<double> const*) (Fitter.cxx:867)
==201374==    by 0x139C72DB: ROOT::Fit::Fitter::DoLeastSquareFit(ROOT::EExecutionPolicy const&) (Fitter.cxx:362)
==201374==    by 0x1EFE5F11: LeastSquareFit (Fitter.h:181)
==201374==    by 0x1EFE5F11: Fit (Fitter.h:166)
==201374==    by 0x1EFE5F11: TFitResultPtr HFit::Fit<TGraph>(TGraph*, TF1*, Foption_t&, ROOT::Math::MinimizerOptions const&, char const*, ROOT::Fit::DataRange&) (HFitImpl.cxx:371)
==201374==    by 0x1EFDB3F6: ROOT::Fit::FitObject(TGraph*, TF1*, Foption_t&, ROOT::Math::MinimizerOptions const&, char const*, ROOT::Fit::DataRange&) (HFitImpl.cxx:981)
==201374==    by 0x1F08822E: TGraph::Fit(TF1*, char const*, char const*, double, double) (TGraph.cxx:1211)
==201374==    by 0x26655461: mu2eCrv::MakeCrvRecoPulses::SetWaveform(std::vector<unsigned int, std::allocator<unsigned int> > const&, unsigned int, float, float, float, float) (MakeCrvRecoPulses.cc:271)
==201374==    by 0x26ECF0D9: mu2e::CrvRecoPulsesFinder::produce(art::Event&) (CrvRecoPulsesFinder_module.cc:157)
==201374==    by 0x81F3E77: art::EDProducer::produceWithFrame(art::Event&, art::ProcessingFrame const&) (EDProducer.cc:90)
==201374==    by 0x826BE21: art::detail::Producer::doEvent(art::EventPrincipal&, art::ModuleContext const&, std::atomic<unsigned long>&, std::atomic<unsigned long>&, std::atomic<unsigned long>&) (Producer.cc:114)
pcanal commented 9 months ago

The largest culprit seems to be a TGraph::Fit command in mu2eCrv::MakeCrvRecoPulses::SetWaveform:

However this should be once per job cost ...

pcanal commented 9 months ago

Would it be possible to generate the massif output with the previous version (I.e. using less memory)?

knoepfel commented 9 months ago

Would it be possible to generate the massif output with the previous version (I.e. using less memory)?

Yes, I'll get going on that.

kutschke commented 9 months ago

I am not familiar with masssif. If I just look for mu2e:: the most frequent string is:

mu2e::CrvRecoPulsesFinder::produce(art::Event&) (CrvRecoPulsesFinder_module.cc:157)

What is that telling me?

kutschke commented 9 months ago

VSize and RSS both identify the same three modules: CrvRecoPulsses, CaloHitTruthMatch and SelectRecoMC . To flag these an look for VSize or RSS increasing between the pre and post module calls.

I have not looked carefully at all of these but CrvRecoPulses uses all time HEP faviourite anit-pattern. It keeps working space as data members that are filled during the event and cleared at the start of the next event. So that could be a red herring for this issue.

The other two both are heavy users of Assns. Did anything change in the Assn's support between the two art versions?

knoepfel commented 9 months ago

Would it be possible to generate the massif output with the previous version (I.e. using less memory)?

@pcanal, see the the massif printout from the previous version.

knoepfel commented 9 months ago

I am not familiar with masssif. If I just look for mu2e:: the most frequent string is:

mu2e::CrvRecoPulsesFinder::produce(art::Event&) (CrvRecoPulsesFinder_module.cc:157)

What is that telling me?

@kutschke, massif is a valgrind-provided heap-profiling tool that audits the allocations/deallocations of the program. It takes "snapshots" of the program and records the active allocations for each snapshot, along with the functions in which the allocations were made.

One way to detect a memory leak is to identify a function call whose associated allocations over the course of the program keep increasing. For example, if I take a look at the clang::Parser::AnnotateTemplateIdToken function over the course of the program, the associated memory--the number in parentheses below--keeps getting larger (up to 97MiB by the end of the job).

$ grep "^->.*AnnotateTemplateIdToken" massif-printout-root_v6_28_06b.log | cut -d'(' -f -2
->00.52% (3,809,512B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->00.61% (4,436,824B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->01.91% (14,408,632B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->02.56% (20,002,696B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->03.16% (25,562,488B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->03.70% (29,729,224B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->04.00% (32,651,416B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->04.90% (42,049,000B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->05.89% (54,807,256B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->06.59% (62,361,544B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->08.16% (81,368,728B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->08.61% (85,415,288B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->08.69% (87,751,048B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->08.74% (89,341,336B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->08.91% (91,999,096B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->09.36% (96,319,384B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->09.36% (97,695,640B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->09.22% (97,695,640B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken
->09.11% (97,695,640B) 0xF23EE725: clang::Parser::AnnotateTemplateIdToken

This could indicate a problem, or it could be expected. I suspect it is part of the problem, because I don't see it with the older version of ROOT:

$ grep "^->.*AnnotateTemplateIdToken" massif-printout-root_v6_28_04c.log | cut -d'(' -f -2
$ 
pcanal commented 9 months ago

In this case the culprit for the increase in memory is the **BFieldManagerMaker**. In the old version it takes up to 750MB (rounded up) while in the new version it takes up to 1GB (rounded down). See the traces. Old traces first:

98.06% (746,130,388B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->50.89% (387,240,336B) 0x14C6D4E4: allocate (new_allocator.h:147)
| ->50.89% (387,240,336B) 0x14C6D4E4: allocate (allocator.h:198)
|   ->50.89% (387,240,336B) 0x14C6D4E4: allocate (alloc_traits.h:482)
|     ->50.89% (387,240,336B) 0x14C6D4E4: _M_allocate (stl_vector.h:378)
|       ->50.89% (387,240,336B) 0x14C6D4E4: _M_create_storage (stl_vector.h:395)
|         ->50.89% (387,240,336B) 0x14C6D4E4: _Vector_base (stl_vector.h:332)
|           ->50.89% (387,240,336B) 0x14C6D4E4: vector (stl_vector.h:566)
|             ->50.89% (387,240,336B) 0x14C6D4E4: Container3D (Container3D.hh:35)
|               ->50.89% (387,240,336B) 0x14C6D4E4: BFGridMap (BFGridMap.hh:65)
|                 ->50.89% (387,240,336B) 0x14C6D4E4: _Construct<mu2e::BFGridMap, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (stl_construct.h:119)
|                   ->50.89% (387,240,336B) 0x14C6D4E4: construct<mu2e::BFGridMap, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (alloc_traits.h:660)
|                     ->50.89% (387,240,336B) 0x14C6D4E4: _Sp_counted_ptr_inplace<const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr_base.h:604)
|                       ->50.89% (387,240,336B) 0x14C6D4E4: __shared_count<mu2e::BFGridMap, std::allocator<void>, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr_base.h:971)
|                         ->50.89% (387,240,336B) 0x14C6D4E4: __shared_ptr<std::allocator<void>, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr_base.h:1712)
|                           ->50.89% (387,240,336B) 0x14C6D4E4: shared_ptr<std::allocator<void>, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr.h:464)
|                             ->50.89% (387,240,336B) 0x14C6D4E4: make_shared<mu2e::BFGridMap, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr.h:1010)
|                               ->50.89% (387,240,336B) 0x14C6D4E4: mu2e::BFieldManagerMaker::loadG4BL(std::vector<std::shared_ptr<mu2e::BFMap>, std::allocator<std::shared_ptr<mu2e::BFMap> > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mu2e::BFieldConfig const&) (BFieldManagerMaker.cc:394)
|                                 ->50.89% (387,240,336B) 0x14C6DC00: mu2e::BFieldManagerMaker::loadMaps(std::vector<std::shared_ptr<mu2e::BFMap>, std::allocator<std::shared_ptr<mu2e::BFMap> > >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<mu2e::BFMapType, std::allocator<mu2e::BFMapType> >, mu2e::BFieldConfig const&) (BFieldManagerMaker.cc:354)
|                                   ->46.93% (357,116,880B) 0x14C6E446: mu2e::BFieldManagerMaker::BFieldManagerMaker(mu2e::BFieldConfig const&) (BFieldManagerMaker.cc:110)

vs

97.63% (1,034,002,728B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->36.56% (387,240,336B) 0x14C854E4: allocate (new_allocator.h:147)
| ->36.56% (387,240,336B) 0x14C854E4: allocate (allocator.h:198)
|   ->36.56% (387,240,336B) 0x14C854E4: allocate (alloc_traits.h:482)
|     ->36.56% (387,240,336B) 0x14C854E4: _M_allocate (stl_vector.h:378)
|       ->36.56% (387,240,336B) 0x14C854E4: _M_create_storage (stl_vector.h:395)
|         ->36.56% (387,240,336B) 0x14C854E4: _Vector_base (stl_vector.h:332)
|           ->36.56% (387,240,336B) 0x14C854E4: vector (stl_vector.h:566)
|             ->36.56% (387,240,336B) 0x14C854E4: Container3D (Container3D.hh:35)
|               ->36.56% (387,240,336B) 0x14C854E4: BFGridMap (BFGridMap.hh:65)
|                 ->36.56% (387,240,336B) 0x14C854E4: _Construct<mu2e::BFGridMap, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&,
 int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (stl_construct.h:119)
|                   ->36.56% (387,240,336B) 0x14C854E4: construct<mu2e::BFGridMap, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&
, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (alloc_traits.h:660)
|                     ->36.56% (387,240,336B) 0x14C854E4: _Sp_counted_ptr_inplace<const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, double&, double&,
 int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr_base.h:604)
|                       ->36.56% (387,240,336B) 0x14C854E4: __shared_count<mu2e::BFGridMap, std::allocator<void>, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, 
double&, int&, double&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr_base.h:971)
|                         ->36.56% (387,240,336B) 0x14C854E4: __shared_ptr<std::allocator<void>, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, do
uble&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr_base.h:1712)
|                           ->36.56% (387,240,336B) 0x14C854E4: shared_ptr<std::allocator<void>, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, do
uble&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr.h:464)
|                             ->36.56% (387,240,336B) 0x14C854E4: make_shared<mu2e::BFGridMap, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int&, double&, double&, int&, doub
le&, double&, int&, double&, double&, mu2e::BFMapType::enum_type, double, mu2e::EnumToStringSparse<mu2e::BFInterpolationStyleDetail> > (shared_ptr.h:1010)
|                               ->36.56% (387,240,336B) 0x14C854E4: mu2e::BFieldManagerMaker::loadG4BL(std::vector<std::shared_ptr<mu2e::BFMap>, std::allocator<std::shared_ptr<mu2e::BFMap> > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mu2e::BFieldConfig const&) (BFieldManagerMaker.cc:394)
|                                 ->36.56% (387,240,336B) 0x14C85C00: mu2e::BFieldManagerMaker::loadMaps(std::vector<std::shared_ptr<mu2e::BFMap>, std::allocator<std::shared_ptr<mu2e::BFMap> > >&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<mu2e::BFMapType, std::allocator<mu2e::BFMapType> >, mu2e::BFieldConfig const&) (BFieldManagerMaker.cc:354)
|                                   ->33.72% (357,116,880B) 0x14C86446: mu2e::BFieldManagerMaker::BFieldManagerMaker(mu2e::BFieldConfig const&) (BFieldManagerMaker.cc:110)
kutschke commented 9 months ago

/Thanks Kyle. It's comforting that you have a lead.

Here is the exercise that I did. I ran old and new with the memory tracker turned on. Then I looked for modules that increased either VSize or RSS between premodule and postmodule. I tabulated the frequency and of the increases and I added up the increases for each module. I skipped the first 50 events so to skip the startup activity. The tables are below:

RSS Table

Module                           New           Old
                                #     MB       #  MB
CaloHitTruthMatch             193   1577.58    6 61.71
CrvRecoPulses                 435    244.94    1  0.54
SelectRecoMC                   62    151.36    3 11.60
Output                         21     17.41    7  5.26
CrvCoincidenceClusterMatchMC    9     12.16    2  4.31
compressRecoMCs                 4      2.97    3  1.30
CrvCoincidenceClusterFinder     3      1.35    1  0.54
FlagBkgHits                     1      0.54    1  0.27
makeSTH                         1      0.27    1  0.27
KKDeP                                          1  0.27
TriggerResults                                 1  0.26
CaloRecoDigiMaker                              1  0.26
makePH                          1      0.25    1  0.52
CaloClusterTruthMatch                          1  0.18
------------------------------------------------------
Total:                        302   2008.85   21 87.30

VSize Table:

Module                         New          Old
                               #   MB       # MB
CaloHitTruthMatch            189 1556.13    6 62.00
CrvRecoPulses                 19  322.02    1  0.69
SelectRecoMC                  61  151.56    3 11.74
Output                        14   14.95    4  4.30
CrvCoincidenceClusterMatchMC   8   12.16    1  3.44
compressRecoMCs                5    4.11    2  0.97
CrvCoincidenceClusterFinder    3    1.70    1  0.42
FlagBkgHits                    1    0.91    1  0.72
makePH                         1    0.85    1  0.85
makeSTH                        1    0.36    1  0.36
---------------------------------------------------
Total                        302 2008.85   21 87.30
kutschke commented 9 months ago

@pcanal Thanks for pointing out BFieldManagerMaker. That is called once at the start of the job so it it not the source of the size increasing as events are processed. But it is a surprise that anything changed here.

knoepfel commented 9 months ago

In this case the culprit for the increase in memory is the BFieldManagerMaker.

@pcanal that's not correct. Of the two traces you show:

98.06% (746,130,388B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.

    ...

|         ->46.93% (357,116,880B) 0x14C6E446: mu2e::BFieldManagerMaker::BFieldManagerMaker(mu2e::BFieldConfig const&) (BFieldManagerMaker.cc:110)

vs

97.63% (1,034,002,728B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.

   ...

|         ->33.72% (357,116,880B) 0x14C86446: mu2e::BFieldManagerMaker::BFieldManagerMaker(mu2e::BFieldConfig const&) (BFieldManagerMaker.cc:110)

the contributions from mu2e::BFieldManagerMaker::BFieldManagerMaker are identical (357,116,880B). And it wouldn't account for the memory increase we see from clang::Parser::AnnotateTemplateIdToken mentioned above.

knoepfel commented 9 months ago

@kutschke, incidentally, Mu2e could be benefit from fixing some of its own memory leaks. See https://github.com/Mu2e/Offline/pull/1153

kutschke commented 9 months ago

Thanks @knoepfel . Are those diffs from your working area in which you have fixed these along the way to finding the main problem?

knoepfel commented 9 months ago

@kutschke, yep. I'll create a PR.

kutschke commented 9 months ago

@knoepfel Please create the PR. Thanks.

pcanal commented 9 months ago

I got side-tracked by the indentation of the log. Kyle is correct that the increase in memory size might be due to:

==201374==    by 0xF2A57896: cling::LookupHelper::findScope(llvm::StringRef, cling::LookupHelper::DiagSetting, clang::Type const**, bool) const (in /cvmfs/mu2e.opensciencegrid.org/artexternals/root/v6_28_06b/Linux64bit+3.10-2.17-e28-p3915-prof/lib/libCling.so.6.28.06)
==201374==    by 0xF29824CD: GetClassSharedLibsForModule (TCling.cxx:6957)
==201374==    by 0xF29824CD: TCling::GetClassSharedLibs(char const*) (TCling.cxx:7061)

However this code is not per se new. Is the old build of ROOT and the new build of ROOT different in regard to enabling runtime_cxxmodules ?

Either way, it would be interesting to see what is the parameters for findScope coming from the call site TCling.cxx:6957. . (however it is likely there is too many of them)

knoepfel commented 9 months ago

Here is the difference in root-config --features:

$ sdiff -w50 ~/root-config-features-v6-28-04c.log ~/root-config-features-v6-28-06b.log 
cxx20                   cxx20
asimage                 asimage
builtin_clang           builtin_clang
builtin_cling           builtin_cling
builtin_llvm            builtin_llvm
builtin_nlohmannjson  <
builtin_openui5         builtin_openui5
clad                    clad
dataframe               dataframe
exceptions              exceptions
fftw3                   fftw3
gdml                    gdml
gsl_shared            <
http                    http
imt                     imt
mathmore                mathmore
mlp                     mlp
minuit2                 minuit2
opengl                  opengl
pgsql                   pgsql
pyroot                  pyroot
pythia6                 pythia6
roofit                  roofit
webgui                  webgui
root7                   root7
runtime_cxxmodules      runtime_cxxmodules
shared                  shared
soversion               soversion
ssl                     ssl
tmva                    tmva
tmva-cpu                tmva-cpu
                      > tmva-sofie
tmva-pymva              tmva-pymva
spectrum                spectrum
x11                     x11
xml                     xml
xrootd                  xrootd
pcanal commented 9 months ago

So in this regards it is the same. We would need to track down which input to findScope is triggering the parsing.

knoepfel commented 9 months ago

@pcanal, see:

  1. cls to findScope
  2. args to GetClassSharedLibsForModule

The second file is actually easier to read, as the cls parameter is passed directly to findScope.

These are the printouts when running 1 events.

knoepfel commented 9 months ago

And here are the values of cls when processing 1 event (left) and 10 events (right), along with their occurences:

    304 tuple<char const*>                    |   12064 tuple<char const*>
    156 tuple<const char*>                    |    6036 tuple<const char*>
    152 char const                        |    6032 char const
    126 void                                126 void
     50 2                                50 2
     40 void(TBranch::*)(TBuffer&)                   40 void(TBranch::*)(TBuffer&)
     30 6                                30 6
     24 void(TBranch                             24 void(TBranch
     20 TString::Rep_t                           20 TString::Rep_t
     20 array<float,2>                           20 array<float,2>
     12 8                                12 8
     12 70                               12 70
      9 vpClingValue                              9 vpClingValue
      6 5                                 6 5
      6 20                                6 20
      4 unsigned short                            4 unsigned short
      4 tuple<const char*,unsigned int>                   4 tuple<const char*,unsigned int>
      4 ROOT::Internal::TBranchCacheInfo::EStates             4 ROOT::Internal::TBranchCacheInfo::EStates
      4 ROOT::Internal::TBranchCacheInfo                  4 ROOT::Internal::TBranchCacheInfo
      4 ROOT::Experimental::Internal::TBulkBranchRead             4 ROOT::Experimental::Internal::TBulkBranchRead
      4 array<unsigned short,8>                       4 array<unsigned short,8>
      4 array<unsigned short,70>                      4 array<unsigned short,70>
      4 array<unsigned long,2>                        4 array<unsigned long,2>
      4 array<double,2>                           4 array<double,2>
      3 signed char                           3 signed char
      3 map<unsigned short,unsigned short>                3 map<unsigned short,unsigned short>
      1 unsigned long                         |       2 unsigned long
      2 tuple                                 2 tuple
      2 std::string:                              2 std::string:
      2 std::string                           2 std::string
      2 mu2e::PDGCodeDetail::enum_type                    2 mu2e::PDGCodeDetail::enum_type
      1 bool                              |       2 bool
      2 art::Ptr                              2 art::Ptr
      2 art::Hash                             2 art::Hash
      1 _Vector_base<string,allocator<string> >               1 _Vector_base<string,allocator<string> >
      1 _Vector_base<int,allocator<int> >                 1 _Vector_base<int,allocator<int> >
      1 ret                               1 ret
      1 _Rb_tree<unsigned short,pair<const unsigned short,uns         1 _Rb_tree<unsigned short,pair<const unsigned short,uns
      1 __pair_base<unsigned short,unsigned short>            1 __pair_base<unsigned short,unsigned short>
      1 __pair_base<cet::map_vector_key,mu2e::SimParticle>        1 __pair_base<cet::map_vector_key,mu2e::SimParticle>
      1 __pair_base<art::RefCore,unsigned long>               1 __pair_base<art::RefCore,unsigned long>
      1 __pair_base<art::Ptr<mu2e::SimParticle>,mu2e::MCTraje         1 __pair_base<art::Ptr<mu2e::SimParticle>,mu2e::MCTraje
      1 __pair_base<art::ProductID,set<art::ProductID> >          1 __pair_base<art::ProductID,set<art::ProductID> >
      1 __pair_base<art::Hash<2>,art::ProcessHistory>             1 __pair_base<art::Hash<2>,art::ProcessHistory>
      1 __pair_base<art::BranchKey,art::BranchDescription>        1 __pair_base<art::BranchKey,art::BranchDescription>
      1 P090_RMinimizer                           1 P090_RMinimizer
      1 P080_GeneticMinimizer                         1 P080_GeneticMinimizer
      1 P070_TFumiliMinimizer                         1 P070_TFumiliMinimizer
      1 P060_TLinearMinimizer                         1 P060_TLinearMinimizer
      1 P050_GSLSimAnMinimizer                        1 P050_GSLSimAnMinimizer
      1 P040_GSLNLSMinimizer                          1 P040_GSLNLSMinimizer
      1 P030_GSLMinimizer                         1 P030_GSLMinimizer
      1 P020_TMinuitMinimizer                         1 P020_TMinuitMinimizer
      1 P010_Minuit2Minimizer                         1 P010_Minuit2Minimizer
      1 obj                               1 obj
      1 nargs                                 1 nargs
      1 mu2e::VirtualDetectorId::enum_type                1 mu2e::VirtualDetectorId::enum_type
      1 mu2e::SurfaceIdDetail::enum_type                  1 mu2e::SurfaceIdDetail::enum_type
      1 mu2e::StrawIdMask::Level                      1 mu2e::StrawIdMask::Level
      1 mu2e::Helicity::helicity                      1 mu2e::Helicity::helicity
      1 mu2e::ComboHitCollection::Sort                    1 mu2e::ComboHitCollection::Sort
      1 __cling_Un1Qu319                          1 __cling_Un1Qu319
      1 __cling_Un1Qu318                          1 __cling_Un1Qu318
      1 __cling_Un1Qu317                          1 __cling_Un1Qu317
      1 __cling_Un1Qu316                          1 __cling_Un1Qu316
      1 __cling_Un1Qu315                          1 __cling_Un1Qu315
      1 __cling_Un1Qu314                          1 __cling_Un1Qu314
      1 __cling_Un1Qu313                          1 __cling_Un1Qu313
      1 __cling_Un1Qu312                          1 __cling_Un1Qu312
      1 __cling_Un1Qu311                          1 __cling_Un1Qu311
      1 __cf_0                                1 __cf_0
      1 args                                  1 args
knoepfel commented 9 months ago

@pcanal, see updated comment above re. occurrences of cls--tuple<const char*> and const char (non-linearly?) increase with the number of events.

pcanal commented 9 months ago

The problem is understood and I am investigating a solution. (issue stems from the fact that lookup of templated classes have a memory cost and that TClass::GetClass will do a lookup each time until it can find a dictionary for the class. In the case of std::tuple we do not provide a dictionary on purpose, hence the leak per se).

kutschke commented 9 months ago

The problem is understood and I am investigating a solution. (issue stems from the fact that lookup of templated classes have a memory cost and that TClass::GetClass will do a lookup each time until it can find a dictionary for the class. In the case of std::tuple we do not provide a dictionary on purpose, hence the leak per se).

Thanks Philippe. That's great news.

pcanal commented 9 months ago

This is now followed at https://github.com/root-project/root/issues/14199 and solved by https://github.com/root-project/root/pull/14200

kutschke commented 9 months ago

Thanks Philippe. We are looking forward to using this fix.

knoepfel commented 8 months ago

@kutschke, has Mu2e had the opportunity to look at the art 3.14.03 mu distribution?

https://scisoft.fnal.gov/scisoft/bundles/mu/v3_14_03/mu-v3_14_03.html

It includes a version of ROOT with the fixes Philippe implemented.

kutschke commented 8 months ago

Hi Kyle,

It passes all of our tests. In particularly we do not see a memory leak.

The PR has been in since before the break but we are waiting for the TDAQ people to return from holiday before we merge. We don’t want to paint them into a corner.

Rob

knoepfel commented 8 months ago

Thanks, Rob. We'll close the issue then.