cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0
1.09k stars 4.32k forks source link

[HLT/Patatrack] Duplicate Output Selection Two (or more) equivalent branches have been selected for output. #37207

Open silviodonato opened 2 years ago

silviodonato commented 2 years ago
cmsrel CMSSW_12_3_0_pre6
cd CMSSW_12_3_0_pre6/src/
cmsenv
hltGetConfiguration /dev/CMSSW_12_3_0/GRun    --globaltag auto:run3_hlt    --data    --unprescale    --eras Run2_2018    --process MYHLT    --full    --offline    --max-events 8    --output full    --customise HLTrigger/Configuration/customizeHLTforCMSSW.customiseFor2018Input,HLTrigger/Configuration/customizeHLTforPatatrack.customizeHLTforPatatrack    --input file:/eos/cms/store/data/Run2018D/EphemeralHLTPhysics2/RAW/v1/000/323/775/00000/17ADD12B-52E2-8C4C-B375-8AF943A24212.root      > hltData.py
CUDA_VISIBLE_DEVICES= cmsRun hltData.py

%MSG-i ThreadStreamSetup:  (NoModuleName) 11-Mar-2022 09:54:44 CET pre-events
setting # threads 4
setting # streams 4
%MSG
%MSG-i CUDAService:  (NoModuleName) 11-Mar-2022 09:54:47 CET pre-events
CUDAService disabled by configuration
%MSG
11-Mar-2022 09:54:56 CET  Initiating request to open file file:/eos/cms/store/data/Run2018D/EphemeralHLTPhysics2/RAW/v1/000/323/775/00000/17ADD12B-52E2-8C4C-B375-8AF943A24212.root
11-Mar-2022 09:54:57 CET  Successfully opened file file:/eos/cms/store/data/Run2018D/EphemeralHLTPhysics2/RAW/v1/000/323/775/00000/17ADD12B-52E2-8C4C-B375-8AF943A24212.root
2022-03-11 09:55:18.578515: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
11-Mar-2022 09:55:24 CET  Closed file file:/eos/cms/store/data/Run2018D/EphemeralHLTPhysics2/RAW/v1/000/323/775/00000/17ADD12B-52E2-8C4C-B375-8AF943A24212.root
----- Begin Fatal Exception 11-Mar-2022 09:55:24 CET-----------------------
An exception of category 'Configuration' occurred while
   [0] Constructing the EventProcessor
Exception Message:
Duplicate Output Selection Two (or more) equivalent branches have been selected for output.
#1: BranchKey(SiPixelRecHitedmNewDetSetVector, hltSiPixelRecHits, , MYHLT)
#2: BranchKey(SiPixelRecHitedmNewDetSetVector, hltSiPixelRecHitSoA, , MYHLT)
Please drop at least one of them.
----- End Fatal Exception -------------------------------------------------

The error is self-explaining and related to the keep *. Not sure what is the best way to avoid this. Perhaps adding ad drop of all the EDAlias (or of the input used in the EDAlias)

silviodonato commented 2 years ago

assign hlt @fwyzard

cmsbuild commented 2 years ago

A new Issue was created by @silviodonato Silvio Donato.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

silviodonato commented 2 years ago

(I cannot anymore assign the issue) @makortel @Martin-Grunewald @missirol

fwyzard commented 2 years ago

assign core

fwyzard commented 2 years ago

assign hlt

fwyzard commented 2 years ago

assign heterogeneous

cmsbuild commented 2 years ago

New categories assigned: heterogeneous,core,hlt

@missirol,@fwyzard,@Dr15Jones,@smuzaffar,@makortel,@makortel,@Martin-Grunewald you have been requested to review this Pull request/Issue and eventually sign? Thanks

fwyzard commented 2 years ago

The issue is due to

        # SwitchProducer wrapping the legacy pixel rechit producer or the transfer of the pixel rechits to the host and the conversion from SoA
        process.hltSiPixelRecHits = SwitchProducerCUDA(
            # legacy producer
            cpu = cms.EDAlias(
                hltSiPixelRecHitSoA = cms.VPSet(
                    cms.PSet(type = cms.string("SiPixelRecHitedmNewDetSetVector")),
                    cms.PSet(type = cms.string("uintAsHostProduct"))
                )
            ),
            # conversion from SoA to legacy format
            cuda = ...
        )

right ?

silviodonato commented 2 years ago

The issue is due to

        # SwitchProducer wrapping the legacy pixel rechit producer or the transfer of the pixel rechits to the host and the conversion from SoA
        process.hltSiPixelRecHits = SwitchProducerCUDA(
            # legacy producer
            cpu = cms.EDAlias(
                hltSiPixelRecHitSoA = cms.VPSet(
                    cms.PSet(type = cms.string("SiPixelRecHitedmNewDetSetVector")),
                    cms.PSet(type = cms.string("uintAsHostProduct"))
                )
            ),
            # conversion from SoA to legacy format
            cuda = ...
        )

right ?

yes, I think so. Adding drop *_hltSiPixelRecHitSoA_*_* the error disappeared

fwyzard commented 2 years ago

Looks like a problem of possibly more general interest, though.

If we put the SwitchProducer branches "inline", like

product = SwitchProducer(
    cpu = cms.EDProducer("CPUProducer", ...),
    acc = cms.EDProducer("AccProducer", ...)
)

then

If we put the SwitchProducer branches "out of line" and use EDAliases, like

productCpu = cms.EDProducer("CPUProducer", ...)

productAcc = cms.EDProducer("AccProducer", ...)

product = SwitchProducer(
    cpu = cms.EDAlias("productCpu"),
    acc = cms.EDAlias("productAcc")
)

then

silviodonato commented 2 years ago

Thanks a lot @fwyzard for the clarification. I wanted to save the hltPixelTracks, so I used

hltGetConfiguration /users/sdonato/GPUtest/Tau/HLT/V3 --globaltag auto:run3_hlt --data --eras Run2_2018 --max-events 10 --input file:aaa.root --output minimal --customise HLTrigger/Configuration/customizeHLTforCMSSW.customiseFor2018Input,HLTrigger/Configuration/customizeHLTforPatatrack.customizeHLTforPatatrackTriplets --open

and I added

        'keep *_hltPixelTracks_*_*',

to process.hltOutputMinimal.

The tracks are not visible (both running with GPU and with CPU)

Events->Scan("recoTracks_hltPixelTracks__HLTX.@obj.size()")
************************
*    Row   * recoTrack *
************************
*        0 *         0 *
*        1 *         0 *
*        2 *         0 *
*        3 *         0 *
*        4 *         0 *
*        5 *         0 *
*        6 *         0 *
*        7 *         0 *
*        8 *         0 *
*        9 *         0 *
************************

If I drop the Patatrack customization function, everything looks ok

root [2] Events->Scan("recoTracks_hltPixelTracks__HLTX.@obj.size()")
************************
*    Row   * recoTrack *
************************
*        0 *      2277 *
*        1 *         0 *
*        2 *      2716 *
*        3 *      2398 *
*        4 *      3702 *
*        5 *      1700 *
*        6 *      3209 *
*        7 *      6061 *
*        8 *      2488 *
*        9 *      2783 *
************************
fwyzard commented 2 years ago

That's weird !?

hltPixelTracks is a standard EDProducer, not a SwitchProducer or an EDAlias, and the product should be in the legacy format.

@makortel @Dr15Jones is there anythng that would prevent storing the collection in the root file ? For example, the fact that its ancestors are transient ?

silviodonato commented 2 years ago

It seems that the problem is not related to SwitchProducer but with Task. In /afs/cern.ch/work/s/sdonato/public/HLT_Task_noPixelTracks you can find two configuration: hltNew_dump_ok.py and hltNew_dump_ok.py.

The first configuration is ok, while the second one shows no pixel tracks.

The difference is just:

+process.HLTRecopixelvertexingTask = cms.Task(process.HLTRecoPixelTracksTask, process.hltPixelVertices, process.hltPixelVerticesSoA, process.hltTrimmedPixelVertices)
-process.HLTRecopixelvertexingSequence = cms.Sequence(process.hltPixelTracksFitter+process.hltPixelTracksFilter+process.HLTRecoPixelTracksSequence+process.hltPixelVerticesSoA+process.hltPixelVertices+process.hltTrimmedPixelVertices)
+process.HLTRecopixelvertexingSequence = cms.Sequence(process.hltPixelTracksFitter+process.hltPixelTracksFilter, process.HLTRecopixelvertexingTask)

From the log file I see these changes:

It sounds like the well-known problem of a module which is explicitly both in a Sequence and in a Task, but I don't find any of them

silviodonato commented 2 years ago

ahh, probably I've understood it. The reason is that the module

process.hltL2TauTagNNProducer = cms.EDProducer("L2TauNNProducer",
   [...]
    pataTracks = cms.InputTag("hltPixelTracksSoA"),
    pataVertices = cms.InputTag("hltPixelVerticesSoA"),
)

takes directly the hltPixelTracksSoA and hltPixelVerticesSoA, that is the reason why the other modules are not scheduled!

silviodonato commented 2 years ago

Yes, sorry for the noise, my mistake. Let's keep focused on the issue summarized by Andrea ( https://github.com/cms-sw/cmssw/issues/37207#issuecomment-1064930592 )

Dr15Jones commented 2 years ago

@silviodonato what do you get when you do

Events->Scan("recoTracks_hltPixelTracks__HLTX.present")

That is what is actually used by the framework to determine if something was stored. Else we can't tell if the data product was stored or it was just empty.

Dr15Jones commented 2 years ago

Also, is the OutputModule on an EndPath or a FinalPath? If on a FinalPath that means data products consumed by the OutputModule will not be prefetched (i.e. will not cause unscheduled execution).

makortel commented 2 years ago

Also, is the OutputModule on an EndPath or a FinalPath?

I checked the two configurations in /afs/cern.ch/work/s/sdonato/public/HLT_Task_noPixelTracks and the OutputModule was in FinalPath in both cases.

About

Let's keep focused on the issue summarized by Andrea ( #37207 (comment) )

could you elaborate the problem? Is it just about having to adjust the keep/drop statements?

fwyzard commented 2 years ago

One problem is that this approach does not allow us to keep both "branches" (for example, to send them from the HLT jobs to the DQM jobs and perform an online validation):

product = SwitchProducer(
    cpu = cms.EDProducer("CPUProducer", ...),
    acc = cms.EDProducer("AccProducer", ...)
)

and I'm not sure if the other approach has any downsides or not:

productCpu = cms.EDProducer("CPUProducer", ...)

productAcc = cms.EDProducer("AccProducer", ...)

product = SwitchProducer(
    cpu = cms.EDAlias("productCpu"),
    acc = cms.EDAlias("productAcc")
)

In particular, I'm concerned about what would happen if we use productCpu or productAcc directly in the reconstruction, while also using (or keeping) the product alias.

makortel commented 2 years ago

The design goals included

In the first case, even if the "inline products" of SwitchProducer were allowed to be persisted, they couldn't be stored at the same time with the SwitchProducer "output product" for the same "duplicate branch" reason as the with the second case.

(for example, to send them from the HLT jobs to the DQM jobs and perform an online validation)

For this use case, would the aim be to

In particular, I'm concerned about what would happen if we use productCpu or productAcc directly in the reconstruction, while also using (or keeping) the product alias.

Currently with Tasks each consumer of productCpu/productAcc will cause the consumed EDProducer to be run, and each consumer of product will cause the chosen case of the two to be run. With ConditionalTask (#36938), only the consumers in the same Path as the ConditionalTask would cause the EDProducers to run.

silviodonato commented 2 years ago

https://github.com/cms-sw/cmssw/pull/37359