Dr15Jones opened this issue 5 years ago
A new Issue was created by @Dr15Jones Chris Jones.
@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign dqm, core
New categories assigned: dqm,core
@Dr15Jones,@smuzaffar,@jfernan2,@andrius-k,@schneiml,@kmaeshima you have been requested to review this Pull request/Issue and eventually sign? Thanks
ESIntegrityTask
in DQM/EcalPreshowerMonitorModule calls DQMStore::softReset
and DQMStore::disableSoftReset
depending on whether it wants some of the MonitorElements
to be lumi-based or not (which is chosen by the configuration).
In the regime of concurrent LuminosityBlocks, what behavior should this module have?
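To illustrate the pattern in question, here is a minimal sketch of how softReset/disableSoftReset select between a per-lumi and a cumulative view of a histogram. The DQMStore and MonitorElement types below are simplified stand-ins, not the real CMSSW classes, and the histogram name is invented.

```cpp
#include <map>
#include <string>

// Stand-in for a MonitorElement: tracks a running sum and an optional
// soft-reset baseline that is subtracted from the visible contents.
struct MonitorElement {
  double sum = 0;            // total accumulated contents
  double resetBaseline = 0;  // contents at the time of the last softReset
  bool softResetActive = false;
  void fill(double v) { sum += v; }
  double contents() const { return softResetActive ? sum - resetBaseline : sum; }
};

// Stand-in for the DQMStore soft-reset interface.
struct DQMStore {
  std::map<std::string, MonitorElement> mes;
  // After softReset, reads see only entries accumulated since this call.
  void softReset(const std::string& name) {
    auto& me = mes[name];
    me.softResetActive = true;
    me.resetBaseline = me.sum;
  }
  // disableSoftReset restores the full cumulative contents.
  void disableSoftReset(const std::string& name) {
    mes[name].softResetActive = false;
  }
};

// Configuration-driven choice: lumi-based MEs are soft-reset at each
// lumi boundary, run-based MEs accumulate across the whole run.
double demo(bool lumiBased) {
  DQMStore store;
  const std::string name = "FEDIntegrity";  // hypothetical ME name
  store.mes[name].fill(3);               // entries from lumi 1
  if (lumiBased) store.softReset(name);  // lumi boundary: start a fresh view
  store.mes[name].fill(4);               // entries from lumi 2
  return store.mes[name].contents();     // 4 if lumi-based, 7 otherwise
}
```

The concurrency problem is visible here: the soft-reset state is a single mutable baseline per histogram, so two lumis in flight at once would race on it.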
DQTask
in DQM/HcalCommon is a base class for many modules in DQM/HcalTasks. Some of these may create lumi-based histograms, while others do not. This may require DQTask
to become a template to handle any caching or lumi-based histograms.
As an experiment, I changed DQMEDAnalyzer
to just be one::DQMEDAnalyzer<>
and this broke 71 modules.
See #25114 and #25115
See #25124 , #25125 and #25128
See #25144 #25145 and #25146
See #25161
Hi Chris,
I am not that happy with your proposal since I had a somewhat different solution in mind. Though I only have an idea in mind and no working code for now, so I can't promise too much.
First, I'd like to switch DQMEDAnalyzer
back to edm::stream
. Since this migration was done in the past, it should be reasonably easy. But we have to do it soon, before too many modules rely on being edm::one
.
Then, use the fact that we have MonitorElement
as a wrapper around histograms, so we can add some smartness there. Even if we still allow access to the bare TH1 to some extent, we can e.g. decide to hand out different TH1
depending on the LS (ideally pointing into a LuminosityBlockCache
or similar, since I'd like the storage decentralized as well).
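A minimal sketch of that idea follows, with a stub histogram type in place of ROOT's TH1 and an invented setCurrentLumi hook standing in for whatever the framework would actually provide; none of these names are the real CMSSW API.

```cpp
#include <map>
#include <memory>

// Stand-in for a ROOT histogram.
struct TH1Stub {
  double entries = 0;
  void Fill(double) { ++entries; }
};

// Sketch of the idea: the MonitorElement does not own one fixed TH1; it
// resolves the backing object for the lumisection currently being
// processed, so work on different lumis fills different histograms.
class MonitorElement {
public:
  void setCurrentLumi(int ls) { currentLS_ = ls; }
  void fill(double x) { backing(currentLS_).Fill(x); }
  double entries(int ls) { return backing(ls).entries; }

private:
  TH1Stub& backing(int ls) {
    auto& h = perLumi_[ls];
    if (!h) h = std::make_unique<TH1Stub>();  // lazily allocate per lumi
    return *h;
  }
  int currentLS_ = 1;
  std::map<int, std::unique_ptr<TH1Stub>> perLumi_;  // decentralized storage
};
```

In the real system the per-lumi storage would live in something like a LuminosityBlockCache rather than inside the ME itself, but the indirection is the same.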
The reason why I am not a fan of the re-booking solution is that it requires fairly invasive changes to the subsystem code. I would prefer to be able to turn on a different granularity of saving by configuration, maybe even globally.
Finally, we need some awareness of the EventSetup
and Conditions in DQM, for multi-run harvesting and friends. The booking must be able to depend on the EventSetup
so we can handle geometry/topology dependent booking (upgrades etc.), but we should also make sure the booking cannot arbitrarily change every LS (so that we can merge over intervals longer than the run, if possible).
Of course, some technical cleanup in the modules does not hurt. But there I'd like to see a test coverage report to have an idea which modules are tested at all, used in Tier0/other workflows, running in online, etc.
@schneiml Thanks for your detailed response. A few things are unclear to me, however.
I assume you still want to avoid the copy of each histogram per stream which was done previously when using edm::stream
modules. So are you planning on sharing the same underlying histograms between different instances of the same edm::stream
module?
When processing concurrent LuminosityBlocks, how do you see handling the case where you have a Lumi based histogram, but one edm::stream
instance is still working on Lumi # 1 but another one has moved on to Lumi # 2? Clearly, they cannot be filling the same underlying ROOT histogram.
@Dr15Jones yes, I do plan to have only a single backing copy and locks (or a thread-safe TH1, if that ever happens). For the cross-lumi case, I'd like the ME (which will be duplicated for each of the streams) to point to different backing objects, in the spirit of LuminosityBlockCache
. Of course, that can be avoided for per-run histograms, but then the same thing applies for run boundaries.
So, it seems to me that fixing this once properly gives us a clean solution for a number of problems, and should still be possible with a trivial migration for "well-behaving" modules. Of course, we'll need to remove a lot of legacy features, but I hope they are not used in too many places.
But then I have not looked at the code for a while, maybe I got overly optimistic...
See #25188
This was already good to go this morning, so I decided to push it anyway to allow the system to do more thorough testing.
For the cross-lumi case, I'd like the ME (which will be duplicated for each of the streams) to point to different backing objects, in the spirit of LuminosityBlockCache. Of course, that can be avoided for per-run histograms, but then the same thing applies for run boundaries.
So you are thinking about the MEs (which are member data of a given DQM module) holding a 'pointer to a pointer to a pointer to a THthing'? The ME's 'pointer to ...' would then point to a memory location held by the edm::stream
instance (which is inheriting from stream::DQMEDAnalyzer). That memory location would then point to a piece of memory which holds the pointer of the THthing and the std::mutex
used to guard it. Then on every streamBeginLuminosityBlock the given stream::DQMEDAnalyzer
instance would change the pointers it holds to point to new 'pointer to THthing + std::mutex'
groups specific to that LuminosityBlock, which presumably it has gotten from the global DQMStore, so that all the different edm::stream
instances get the same THthing for the same LuminosityBlock.
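A rough sketch of this scheme, with stub types in place of the real ROOT histograms and DQMStore; the class and member names here are illustrative only.

```cpp
#include <map>
#include <memory>
#include <mutex>

// Stand-in for a ROOT histogram.
struct TH1Stub {
  double entries = 0;
  void Fill(double) { ++entries; }
};

// A backing slot: the histogram plus the mutex guarding it. All stream
// instances processing the same lumi share a single slot.
struct LumiSlot {
  TH1Stub hist;
  std::mutex mtx;
  void fill(double x) {
    std::lock_guard<std::mutex> lock(mtx);
    hist.Fill(x);
  }
};

// Hypothetical central store handing out the slot for a given lumi.
struct GlobalStore {
  std::mutex mtx;
  std::map<int, std::shared_ptr<LumiSlot>> slots;
  std::shared_ptr<LumiSlot> slotFor(int lumi) {
    std::lock_guard<std::mutex> lock(mtx);
    auto& s = slots[lumi];
    if (!s) s = std::make_shared<LumiSlot>();
    return s;
  }
};

// Per-stream ME: the pointer it holds is re-seated on each
// streamBeginLuminosityBlock, giving the double indirection described above.
struct StreamME {
  std::shared_ptr<LumiSlot> slot;
  void beginLumi(GlobalStore& store, int lumi) { slot = store.slotFor(lumi); }
  void fill(double x) { slot->fill(x); }
};
```

With this layout, one stream still filling Lumi 1 and another already on Lumi 2 write into distinct LumiSlot objects, while two streams on the same lumi contend only on that slot's mutex.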
As for your statement of run boundaries, you already have that covered since the bookHistograms
interface is already called anew for each Run, which means for edm::stream
you would already have the proper behavior without any changes.
holding a 'pointer to a pointer to a pointer to a THthing'?
As far as I see, the modules usually hold pointers (bare pointers, sadly...) to the MEs, so we already have the required double indirection.
Otherwise, your proposal sounds roughly like what I was thinking about. Where to actually store the THthings does not matter much, can be in the edm::stream
module or the DQMStore
, as long as we track ownership correctly. What does matter is that some central entity (DQMStore
) has pointers to all of the histograms, so we can still do "global" operations. Though, actually I don't currently see a need for any -- edm-based saving could be handled by Products, and the old DQMStore::save
(that we will have to provide for various special cases) should only work on the MEs of the current module. And snapshots every n lumis (which I'd like to see for various reasons) need to be handled locally as well, to avoid globally synchronizing on lumi boundaries.
For the synchronization for threaded access, I have not thought about that much for now. I guess what works for the ConcurrentME
should also work for this solution, and getting some form of fillAtomic
as a new API in ROOT seemed possible as well. The interesting question is what to do with the modules that do use the bare TH1
pointer: as long it does not get stored, we could maybe hand out a smart pointer that keeps a lock until it goes out of scope? Though then we get closer to edm::one
semantics if this feature is used. Disallowing TH1
access also seems possible, if we keep a "legacy" API around for harvesting.
You do not want to use a lock with a long scope. It is way too easy to get into deadlock situations. Instead you could use the same mechanism as DQMStore::bookTransaction
uses to get access to the IBooker
. So you would get rid of all the getTH*
functions and replace them with something like
template<typename F> void accessTH1F( F&& iFunc);
Where iFunc is required to take as argument a TH1F*
. Then inside accessTH1F
you take the lock, and when you return from it you release the lock.
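A minimal sketch of what such an accessTH1F API could look like, using a stub type in place of ROOT's TH1F; this is an illustration of the proposal, not the actual CMSSW interface.

```cpp
#include <mutex>
#include <utility>

// Stand-in for ROOT's TH1F.
struct TH1FStub {
  double entries = 0;
  void Fill(double) { ++entries; }
};

// Sketch of the proposed API: instead of getTH1F() returning a bare
// pointer, the caller passes a callable. The lock is held only for the
// duration of the call, so the bare pointer never escapes the locked scope.
class MonitorElement {
public:
  template <typename F>
  void accessTH1F(F&& iFunc) {
    std::lock_guard<std::mutex> guard(mtx_);
    std::forward<F>(iFunc)(&hist_);
  }

private:
  std::mutex mtx_;
  TH1FStub hist_;
};

// Usage: all direct histogram manipulation happens inside the lambda.
double demo() {
  MonitorElement me;
  me.accessTH1F([](TH1FStub* h) {
    h->Fill(1.0);
    h->Fill(2.0);
  });
  double n = 0;
  me.accessTH1F([&](TH1FStub* h) { n = h->entries; });
  return n;
}
```

The key property is that a caller cannot store the TH1F* past the call, so the lock's scope is bounded by construction rather than by convention.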
Yep, that should work, though it is a bit more intrusive. Let's hope there are not many cases that need to use this...
Is this issue still alive? My impression is it is NOT, given the changes made by Marcel on DQMStore in 11_1_X
Good question. I think too that most of the issue has been addressed.
There is still code using getTH1()
, which are also discussed in #29096.
There are still some DQMOneLumiEDAnalyzer
modules, which are also discussed in https://github.com/cms-sw/cmssw/issues/25090.
@Dr15Jones What do you think, are there aspects in this issue that are not being followed up (also) elsewhere?
The DQMEDAnalyzers were converted to edm::one
modules which use the options edm::one::WatchRuns
and edm::one::WatchLuminosityBlocks
. These options cause the framework to synchronize on those transition boundaries, which severely affects the threading efficiency of the jobs in which they are run. To make DQMEDAnalyzers thread efficient, I propose first dealing with the LuminosityBlock transitions and then moving on to dealing with the Run transitions. This could be done by
- changing DQMEDAnalyzer to just be one::DQMEDAnalyzer<>. This would make this class only work with Run based MonitorElements. Modules needing Lumi based MonitorElements would instead use one::DQMEDAnalyzer<one::DQMLuminosityBlockElements>.
- IBooker would change to have the option of setting the Lumi number when creating MonitorElements.
- one::DQMEDAnalyzer<one::DQMLuminosityBlockElements> would be changed to use edm::LuminosityBlockCache to hold onto the MonitorElements booked at begin LuminosityBlock time. This will probably entail changing one::DQMLuminosityBlockElements to take a template argument.
- MonitorElement::setLumiFlag would not be accessible to user code as that would be handled directly by the IBooker.

This design change is very similar to what was done with DQMGlobalEDAnalyzer
. The Core group would be willing to assist in doing this code conversion.
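For illustration, the proposed split between run-only and lumi-capable analyzers could be sketched roughly as follows. These are hypothetical stand-in types, not the actual CMSSW interfaces, and the booking method names are invented.

```cpp
#include <map>
#include <string>

// Stand-in for a MonitorElement.
struct MonitorElement {
  double entries = 0;
  void fill() { ++entries; }
};

// Default policy: only Run based MonitorElements are supported.
struct RunElementsOnly {
  std::map<std::string, MonitorElement> runMEs;
};

// Opt-in policy: additionally keeps a per-lumi cache of MEs booked at
// begin-LuminosityBlock time, in the spirit of edm::LuminosityBlockCache.
struct DQMLuminosityBlockElements {
  // lumi number -> MEs booked for that lumi; in the proposal the IBooker
  // would set the lumi number rather than user code calling setLumiFlag.
  std::map<int, std::map<std::string, MonitorElement>> lumiCache;
  MonitorElement& bookForLumi(int lumi, const std::string& name) {
    return lumiCache[lumi][name];
  }
};

// The analyzer's capabilities are chosen by the template argument:
// DQMEDAnalyzer<> works only with Run based MEs, while
// DQMEDAnalyzer<DQMLuminosityBlockElements> also handles per-lumi booking.
template <typename Elements = RunElementsOnly>
struct DQMEDAnalyzer : Elements {};
```

The point of the template split is that modules not opting into lumi elements carry no per-lumi machinery at all, so the framework need not serialize them on lumi boundaries.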