cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0
1.08k stars 4.3k forks source link

Safe and efficient use of tensorflow in reco #35541

Open jpata opened 3 years ago

jpata commented 3 years ago

I'm checking the various producers to understand if the tensorflow::GraphDef and tensorflow::Session are used correctly, efficiently and coherently across reco modules:

Questions:

  1. Egamma PFID, DeepCore, DeepMET: would it be more efficient to put the Session as well in the cache object that is shared across threads? I believe it's related to https://github.com/cms-sw/cmssw/issues/32529
  2. DeepCore, DeepMET: do we have a memory leak when the EDProducer is deleted but the Session is not closed? Does it matter? Is std::atomic<tensorflow::GraphDef*> in the global cache object actually required?
jpata commented 3 years ago

assign reconstruction

cmsbuild commented 3 years ago

New categories assigned: reconstruction

@slava77,@jpata you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild commented 3 years ago

A new Issue was created by @jpata Joosep Pata.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

slava77 commented 3 years ago

there is also a variant with edm::global module as in #35237

jpata commented 3 years ago

@makortel @Dr15Jones

could you help us understand if std::atomic is actually required here: https://github.com/cms-sw/cmssw/blob/6d2f66057131baacc2fcbdd203588c41c885b42c/RecoMET/METPUSubtraction/plugins/DeepMETProducer.cc#L12 ?

makortel commented 3 years ago

I don't see any reason why that should be std::atomic. The graph_def is written to exactly once (initializeGlobalCache()), then read once per stream (in edm::stream module constructor, that are guaranteed to be called serially), and finally destructed once (globalEndJob()).

The variable could easily be std::unique_ptr<tensorflow::GraphDef>, and given that tensorflow::createSession() takes const GraphDef*, that could even be edm::propagate_const<std::unique_ptr<tensorflow::GraphDef>> (i.e. the DeepMETProducer would see "constant pointer to constant GraphDef").

Dr15Jones commented 3 years ago

If the GraphDef doesn't need to be changed after it is created, then just a std::unique_ptr<const tensorflow::GraphDef> would work.