cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0
1.07k stars 4.28k forks source link

Concurrent Lumis and DQM at HLT #28341

Open schneiml opened 4 years ago

schneiml commented 4 years ago

Der HLT and Core Experts,

From recent discussions I hear that HLT is planning to move to concurrent lumisections by Run3. This opens new questions regarding DQM@HLT:

Now, there are many ways to handle that in the future:

I'm looking forward to any additional opinions on this aspect; it might be fine to not do anything, but we should be aware of the implications.

[1] https://cmssdt.cern.ch/dxr/CMSSW/source/DQMServices/Components/plugins/DQMFileSaver.h?from=DQMFileSaver#15

[2] Note that for online DQM we have the same problem, twice: We have modules (e.g. specific edm::one users) that can't save per-lumi, but need their histograms shown live in online DQM, and we have modules that can save per-lumi, but online DQM traditionally can update much faster than per-lumisection. So the edm::Service or similar is unavoidable here, but also less of a problem, since online DQM can happily run single-threaded, no concurrent lumis.

cmsbuild commented 4 years ago

A new Issue was created by @schneiml Marcel Schneider.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

Dr15Jones commented 4 years ago

assign hlt, core

cmsbuild commented 4 years ago

New categories assigned: core,hlt

@Dr15Jones,@smuzaffar,@Martin-Grunewald,@fwyzard you have been requested to review this Pull request/Issue and eventually sign? Thanks

makortel commented 3 years ago

Is this issue still relevant? @cms-sw/dqm-l2

(I'd guess "yes")

jfernan2 commented 3 years ago

My understanding is that yes, it is still relevant. IMHO, the main issue is the need to keep histograms shown live in online DQM. A single thread at Online DQM is still used, but I am not sure of the plan for HLT. Thanks

fwyzard commented 3 years ago

I guess the two options are

What I recall is that the only way to get per-Run histograms from the HLT to the DQM was to "slice" them into per-LS updates; if that is not needed/supported any more, or if there is a better way to do it, we should definitely consider the alternative.

About the actual DQM modules running at HLT: I don't know what has change in DQM-land since 2019; maybe some of the original issues have already been addressed ?

Out of my head, the requirements from the HLT side are

Not sure what are the best means to achieve them.

jfernan2 commented 3 years ago

I apologize in advance if my answer does not reach the desired levels for this conversation, but unfortunately the real expert on this matter @schneiml is out of CMS since a long time as you probably know.

I am afraid that the original problems stated in this issue concerning DQM@HLT have not been addressed. According to the description by Marcel and the PRs that came afterwards, specially https://github.com/cms-sw/cmssw/pull/28622 where DQMGlobalEDAnalyzer based on edm::global::EDProducer was modified, it is used for DQM@HLT and a few random other things, and cannot save per-lumi histograms (this is a conflict with the fact that HLT typically saves only per lumi histograms)

What seems to have changed since this issue was raised, is the change of DQMEDAnalyzer, so that now it is based on edm::stream since https://github.com/cms-sw/cmssw/pull/28813

And this links with the last solution proposed by Marcel at the description of this issue, which may be the best option, migrate the DQMGlobalEDAnalyzer to DQMEDAnalyzer, at least for the HLT purpouses.

But I'd like to know first the opinion of the real experts who are still alive, @makortel @Dr15Jones Thanks

makortel commented 3 years ago

I would not trust much our memory, but what I recall and see in the code, migrating DQMGlobalEDAnalyzer to DQMEDAnalyzer could be the least-effort way forward (and would fulfill the requirements to support multiple threads, streams, and concurrent lumisections, and to have minimal copies of histograms in memory).

An alternative could be to add per LS histogram support to DQMGlobalEDAnalyzer. But given the way DQMStore currently appears to assume a module instance processing all events of one LS at a time for per-LS-histograms, such development would probably be quite involved (although I still believe it would be possible to craft a design that would allow that and to e.g. avoid locks in MonitorElement, but that probably goes beyond the current dicussion).

I assume the latter two points of @fwyzard (completeness of merging/accumulating from all HLT nodes, and older histograms not being overwritten by newer histograms) are more for the histogram merging infrastructure outside of CMSSW.