Open silviodonato opened 2 years ago
assign hlt @fwyzard
A new Issue was created by @silviodonato Silvio Donato.
@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
(I cannot anymore assign the issue) @makortel @Martin-Grunewald @missirol
assign core
assign hlt
assign heterogeneous
New categories assigned: heterogeneous,core,hlt
@missirol,@fwyzard,@Dr15Jones,@smuzaffar,@makortel,@makortel,@Martin-Grunewald you have been requested to review this Pull request/Issue and eventually sign? Thanks
The issue is due to
# SwitchProducer wrapping the legacy pixel rechit producer or the transfer of the pixel rechits to the host and the conversion from SoA
process.hltSiPixelRecHits = SwitchProducerCUDA(
    # legacy producer
    cpu = cms.EDAlias(
        hltSiPixelRecHitSoA = cms.VPSet(
            cms.PSet(type = cms.string("SiPixelRecHitedmNewDetSetVector")),
            cms.PSet(type = cms.string("uintAsHostProduct"))
        )
    ),
    # conversion from SoA to legacy format
    cuda = ...
)
right ?
yes, I think so.
After adding "drop *_hltSiPixelRecHitSoA_*_*", the error disappeared.
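The effect of adding that drop statement after "keep *" can be illustrated with a toy model. This is only a sketch in plain Python, not the framework's implementation: it assumes that output commands are applied in order, that a later command overrides an earlier one, and that branch names follow the type_module_instance_process pattern. The function name select_branches and the branch list are hypothetical.

```python
from fnmatch import fnmatchcase


def select_branches(branches, commands):
    """Toy model of ordered keep/drop commands: the last matching command wins."""
    kept = set()
    for command in commands:
        action, pattern = command.split(None, 1)
        if action not in ("keep", "drop"):
            raise ValueError("unknown command: %r" % command)
        matching = {b for b in branches if fnmatchcase(b, pattern)}
        if action == "keep":
            kept |= matching
        else:
            kept -= matching
    # Preserve the original branch order in the result.
    return [b for b in branches if b in kept]


# Hypothetical branch names, for illustration only.
branches = [
    "SiPixelRecHitedmNewDetSetVector_hltSiPixelRecHits__HLTX",
    "SiPixelRecHitedmNewDetSetVector_hltSiPixelRecHitSoA__HLTX",
    "recoTracks_hltPixelTracks__HLTX",
]

selected = select_branches(
    branches,
    ["keep *", "drop *_hltSiPixelRecHitSoA_*_*"],
)
```

In this model only the hltSiPixelRecHitSoA branch is removed from the selection, so the aliased product is no longer selected twice.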
Looks like a problem of possibly more general interest, though.
If we put the SwitchProducer branches "inline", like
    product = SwitchProducer(
        cpu = cms.EDProducer("CPUProducer", ...),
        acc = cms.EDProducer("AccProducer", ...)
    )
then
  - we can use "keep *" to keep only the one that runs
  - we cannot use "keep *_product@cpu_*_*", "keep *_product@acc_*_*".
If we put the SwitchProducer branches "out of line" and use EDAliases, like
    productCpu = cms.EDProducer("CPUProducer", ...)
    productAcc = cms.EDProducer("AccProducer", ...)
    product = SwitchProducer(
        cpu = cms.EDAlias("productCpu"),
        acc = cms.EDAlias("productAcc")
    )
then
  - we cannot use "keep *" (this issue), and we probably do not want to
  - we can use "keep *_product_*_*" to keep only the one that runs
  - we can use "keep *_productCpu_*_*", "keep *_productAcc_*_*" to run and keep both
Thanks a lot @fwyzard for the clarification. I wanted to save the hltPixelTracks, so I used
hltGetConfiguration /users/sdonato/GPUtest/Tau/HLT/V3 --globaltag auto:run3_hlt --data --eras Run2_2018 --max-events 10 --input file:aaa.root --output minimal --customise HLTrigger/Configuration/customizeHLTforCMSSW.customiseFor2018Input,HLTrigger/Configuration/customizeHLTforPatatrack.customizeHLTforPatatrackTriplets --open
and I added
'keep *_hltPixelTracks_*_*',
to process.hltOutputMinimal.
The tracks are not visible (both running with GPU and with CPU)
Events->Scan("recoTracks_hltPixelTracks__HLTX.@obj.size()")
************************
* Row * recoTrack *
************************
* 0 * 0 *
* 1 * 0 *
* 2 * 0 *
* 3 * 0 *
* 4 * 0 *
* 5 * 0 *
* 6 * 0 *
* 7 * 0 *
* 8 * 0 *
* 9 * 0 *
************************
If I drop the Patatrack customization function, everything looks ok
root [2] Events->Scan("recoTracks_hltPixelTracks__HLTX.@obj.size()")
************************
* Row * recoTrack *
************************
* 0 * 2277 *
* 1 * 0 *
* 2 * 2716 *
* 3 * 2398 *
* 4 * 3702 *
* 5 * 1700 *
* 6 * 3209 *
* 7 * 6061 *
* 8 * 2488 *
* 9 * 2783 *
************************
That's weird !?
hltPixelTracks is a standard EDProducer, not a SwitchProducer or an EDAlias, and the product should be in the legacy format.
@makortel @Dr15Jones is there anything that would prevent storing the collection in the root file ? For example, the fact that its ancestors are transient ?
It seems that the problem is related not to SwitchProducer but to Task.
In /afs/cern.ch/work/s/sdonato/public/HLT_Task_noPixelTracks you can find two configurations: hltNew_dump_ok.py and hltNew_dump_ok.py.
The first configuration is ok, while the second one shows no pixel tracks.
The difference is just:
+process.HLTRecopixelvertexingTask = cms.Task(process.HLTRecoPixelTracksTask, process.hltPixelVertices, process.hltPixelVerticesSoA, process.hltTrimmedPixelVertices)
-process.HLTRecopixelvertexingSequence = cms.Sequence(process.hltPixelTracksFitter+process.hltPixelTracksFilter+process.HLTRecoPixelTracksSequence+process.hltPixelVerticesSoA+process.hltPixelVertices+process.hltTrimmedPixelVertices)
+process.HLTRecopixelvertexingSequence = cms.Sequence(process.hltPixelTracksFitter+process.hltPixelTracksFilter, process.HLTRecopixelvertexingTask)
From the log file I see these changes:
It sounds like the well-known problem of a module that is explicitly in both a Sequence and a Task, but I can't find any such module.
Ahh, I think I've understood it. The reason is that the module
    process.hltL2TauTagNNProducer = cms.EDProducer("L2TauNNProducer",
        [...]
        pataTracks = cms.InputTag("hltPixelTracksSoA"),
        pataVertices = cms.InputTag("hltPixelVerticesSoA"),
    )
directly takes hltPixelTracksSoA and hltPixelVerticesSoA; that is why the other modules are not scheduled!
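The scheduling effect described here can be sketched with a toy model in plain Python. It assumes that modules placed only in a Task run on demand, i.e. only when something already scheduled consumes their products. The function name modules_that_run and the consumes map are hypothetical; in particular, the dependencies listed for hltPixelTracks, hltPixelVertices and hltPixelVerticesSoA are assumptions made for illustration, not taken from the actual menu.

```python
def modules_that_run(path_modules, consumes):
    """Return the path modules plus every module reachable through 'consumes'."""
    to_visit = list(path_modules)
    scheduled = set()
    while to_visit:
        module = to_visit.pop()
        if module in scheduled:
            continue
        scheduled.add(module)
        # Modules in a Task are pulled in only because someone consumes them.
        to_visit.extend(consumes.get(module, []))
    return scheduled


# Assumed dependency map, for illustration only.
consumes = {
    "hltL2TauTagNNProducer": ["hltPixelTracksSoA", "hltPixelVerticesSoA"],
    "hltPixelVerticesSoA": ["hltPixelTracksSoA"],
    "hltPixelTracks": ["hltPixelTracksSoA"],
    "hltPixelVertices": ["hltPixelTracks", "hltPixelVerticesSoA"],
}

scheduled = modules_that_run(["hltL2TauTagNNProducer"], consumes)
```

Under these assumptions only the two SoA producers are pulled in by hltL2TauTagNNProducer; hltPixelTracks is never requested, so it does not run.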
Yes, sorry for the noise, my mistake. Let's keep focused on the issue summarized by Andrea ( https://github.com/cms-sw/cmssw/issues/37207#issuecomment-1064930592 )
@silviodonato what do you get when you do
Events->Scan("recoTracks_hltPixelTracks__HLTX.present")
That is what is actually used by the framework to determine if something was stored. Else we can't tell if the data product was stored or it was just empty.
Also, is the OutputModule on an EndPath or a FinalPath ? If on a FinalPath that means data products consumed by the OutputModule will not be prefetched (i.e. will not cause unscheduled execution).
I checked the two configurations in /afs/cern.ch/work/s/sdonato/public/HLT_Task_noPixelTracks and the OutputModule was on a FinalPath in both cases.
About "Let's keep focused on the issue summarized by Andrea ( #37207 (comment) )": could you elaborate on the problem? Is it just about having to adjust the keep/drop statements?
One problem is that this approach does not allow us to keep both "branches" (for example, to send them from the HLT jobs to the DQM jobs and perform an online validation):
    product = SwitchProducer(
        cpu = cms.EDProducer("CPUProducer", ...),
        acc = cms.EDProducer("AccProducer", ...)
    )
and I'm not sure if the other approach has any downsides or not:
    productCpu = cms.EDProducer("CPUProducer", ...)
    productAcc = cms.EDProducer("AccProducer", ...)
    product = SwitchProducer(
        cpu = cms.EDAlias("productCpu"),
        acc = cms.EDAlias("productAcc")
    )
In particular, I'm concerned about what would happen if we use productCpu or productAcc directly in the reconstruction, while also using (or keeping) the product alias.
The design goals included
  - consumers of SwitchProducer do not care which of the case-EDProducers get run
  - product@cpu etc being available for consumers in the same process is accidental
In the first case, even if the "inline products" of SwitchProducer were allowed to be persisted, they couldn't be stored at the same time as the SwitchProducer "output product", for the same "duplicate branch" reason as in the second case.
(for example, to send them from the HLT jobs to the DQM jobs and perform an online validation)
For this use case, would the aim be to
  - let SwitchProducer control the running of the case EDProducer, and store the cpu/acc versions just as they exist
  - cpu and acc versions on each Event
  - cpu and acc versions get run for the output
In particular, I'm concerned about what would happen if we use productCpu or productAcc directly in the reconstruction, while also using (or keeping) the product alias.
Currently with Tasks each consumer of productCpu/productAcc will cause the consumed EDProducer to be run, and each consumer of product will cause the chosen case of the two to be run. With ConditionalTask (#36938), only the consumers in the same Path as the ConditionalTask would cause the EDProducers to run.
The error is self-explanatory and related to the "keep *". Not sure what is the best way to avoid this. Perhaps adding a drop of all the EDAlias (or of the input used in the EDAlias).
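One way to picture the suggestion of dropping the inputs used in the EDAliases is a small helper that builds the drop statements from a list of module labels. This is a hypothetical sketch in plain Python; the helper name drop_commands_for_alias_inputs is invented for illustration, and the list of labels would have to be collected from the actual configuration by hand or by other means.

```python
def drop_commands_for_alias_inputs(alias_inputs):
    """Build one 'drop' statement per module label used as input of an EDAlias."""
    # Deduplicate and sort the labels so the output is stable.
    return ["drop *_%s_*_*" % label for label in sorted(set(alias_inputs))]


# Module label taken from the discussion above; the list itself is an assumption.
alias_inputs = ["hltSiPixelRecHitSoA"]

output_commands = ["keep *"] + drop_commands_for_alias_inputs(alias_inputs)
```

The resulting list starts with "keep *" and ends with the generated drop statements, matching the workaround reported earlier in the thread.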