Open wddgit opened 8 months ago
cms-bot internal usage
A new Issue was created by @wddgit W. David Dagenhart.
@smuzaffar, @Dr15Jones, @makortel, @rappoccio, @sextonkennedy, @antoniovilela can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign l1
New categories assigned: l1
@epalencia,@aloeliger you have been requested to review this Pull request/Issue and eventually sign? Thanks
@cms-sw/l1-l2 Any thoughts?
@pallabidas could you tag Sascha's github handle if you know it? This should be brought to the calo trigger people.
However @makortel I think that it is possible (emphasis on think though, just based on the history and my personal unfamiliarity with this module) that the code being used here is no longer in use for the current trigger system or for phase 2 RCT/GCT, and isn't a part of any current L1 emulation. This may be legacy code so there shouldn't really be an issue other than any desired backwards compatibility.
@asavincms
I think this is old stage1 code that is not used anymore, I remember this discussion even for new - Layer1 code where it was fixed. Can I see the link to the code in question?
https://cmssdt.cern.ch/dxr/CMSSW/source/L1Trigger/RegionalCaloTrigger/plugins/L1RCTProducer.cc#101
This is the troublesome part of the code, where it might call updateFedVector every 100th lumi. That will interact badly with say one stream skipping a lumi when the other streams already processed all the events.
I'd be happy to delete the module entirely. It will still remain in the git history.
Please go ahead
Deleting the module is not as easy as I thought.
L1Trigger/RegionalCaloTrigger/python/rctDigis_cfi.py uses the type.
That get imported by these:
L1Trigger/Configuration/python/L1CaloEmulator_cff.py L1Trigger/Configuration/python/ValL1Emulator_cff.py L1Trigger/L1TCalorimeter/python/simDigis_cff.py
Those in turns get imported by these:
L1Trigger/Configuration/python/L1Emulator_cff.py L1Trigger/GlobalCaloTrigger/test/writeGctDigis_cfg.py
DQMOffline/L1Trigger/python/L1TriggerDqmOffline_cff.py L1Trigger/HardwareValidation/python/L1HardwareValidation_cff.py
DQM/L1TMonitor/python/L1TStage2Emulator_cff.py L1Trigger/Configuration/python/SimL1CaloEmulator_cff.py L1Trigger/Configuration/python/SimL1Emulator_cff.py
And quite a few things import those at high levels and appear to be actually in use. I didn't test, but it looks like this may be hard to actually delete and possible something really needs it.
Here is an alternate suggestion.
There is a configuration parameter named getFedsFromOmds
set here:
https://cmssdt.cern.ch/dxr/CMSSW/source/L1Trigger/RegionalCaloTrigger/plugins/L1RCTProducer.cc#42
It is never reset in the C++ code. In almost all places, it is always configured false (at least in the CMSSW repository). The one exception is
It that test configuration ever used? The code that is causing problems is only executed if that configuration parameter is set true.
https://cmssdt.cern.ch/dxr/CMSSW/source/L1Trigger/RegionalCaloTrigger/plugins/L1RCTProducer.cc#96
If the parameter is never true in practice, then I could delete only the code that is executed when it is true and leave the rest unchanged.
One other thing I noticed. The test configuration is a single stream configuration. If there isn't more than one stream, then there isn't a problem. It is ugly to leave a hole like this for someone to fall into, but if this is really the only place this is used it is OK as is.
ok for me
I submitted #43779 to deal with this. It will cause an exception with a descriptive message to thrown if anyone tries to run the module with the configuration option getFedsFromOmds set true. Someday in the future, if no problems are reported someone can clean this up and delete all the unnecessary code.
https://github.com/cms-sw/cmssw/pull/43779 was merged, so I guess we could close this issue?
L1RCTProducer behaves in an unusual way in beginLuminosityBlock. There are configuration options that allow it to call
updateFedVector
every nth lumi or on the nth lumi (defaults are every 100th lumi and the 10th). This is dependent on EventSetup data, probably there are better and more standard ways to do this.Our motivation in asking about this behavior is related to a proposed performance improvement to the Framework that will allow a stream to skip a lumi if all the events were already processed by other streams. See draft PR #43522. This will help significantly if one stream encounters a very slow event. That stream will not be able to finish the lumi it is processing or any subsequent lumis until that event is done. It is possible for the system to reach the maximum concurrent lumi limit and all streams be stuck until the extremely slow event completes. With the improvement, other streams will be able to finish subsequent lumis and close them and keep working productively.
For most modules, this improvement does not seem to cause any problems. We have been surveying the code looking for modules that might have a problem when a stream skips a lumi (skips the stream begin lumi and stream end lumi transitions). L1RCTProducer would have a problem if some streams skipped a lumi. For example, say on lumi 400
updateFedVector
was called in beginLuminosityBlock but some stream skipped lumi 400. Then that stream would get stale results out ofupdateFedVector
until lumi 500.One other comment. It is possible this unusual behavior is for performance, but it may be counterproductive. As written,
updateFedVector
gets called for every stream. I can imagine more standard approaches using the EventSetup where it would only be called once when the underlying EventSetup products change.Here are some questions. Why is the L1RCTProducer written in this way? Can it be changed so it will still work properly if stream are allowed to skip a lumi if all events were already processed by other streams?
FYI @makortel