reproducibility of GPU vs CPU results at HLT

fwyzard commented 3 years ago

Tests done in multiple recent releases have shown that the HLT results are not consistent when running on GPU vs on CPU.

Here are the instruction to reproduce the issue using

CMSSW_12_1_0_pre3
- for working with a more recent IB, it may be necessary to add #35497
the HLT menu included in that release, /dev/CMSSW_12_1_0/GRun/V1
a recent Run-3 relval sample of TTbar, e.g.
- from CMSSW_12_0_0_pre6: /RelValTTbar_14TeV/CMSSW_12_0_0_pre6-PU_120X_mcRun3_2021_realistic_v4_JIRA_129-v1/GEN-SIM-DIGI-RAW
- from CMSSW_12_1_0_pre3: /store/relval/CMSSW_12_1_0_pre3/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_121X_mcRun3_2021_realistic_v2-v1/10000/0eb14c4a-e363-424a-9c0c-2688c7d32c74.root
the "auto" Run-3 global tag from the release, auto:phase1_2021_realistic; running on a previous release using the same global tag as the sample itself (120X_mcRun3_2021_realistic_v4) shows a similar behaviour.

setup a CMSSW working area

cmsrel CMSSW_12_1_0_pre3
cd CMSSW_12_1_0_pre3/   
cmsenv
mkdir run
cd run

extract the HLT configuration for running on GPU using the Run3 era

hltGetConfiguration /dev/CMSSW_12_1_0/GRun/V1 \
    --eras Run3 \
    --globaltag auto:phase1_2021_realistic \
    --mc \
    --unprescale \
    --output minimal \
    --customise HLTrigger/Configuration/customizeHLTforPatatrack.customizeHLTforPatatrack \
    --input /store/relval/CMSSW_12_0_0_pre6/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_120X_mcRun3_2021_realistic_v4_JIRA_129-v1/00000/79c06ed5-929b-4a57-a4f2-1ae90e6b38c5.root \
    > hlt.py

run the HLT menu on a GPU-equipped machine

# run the HLT menu without any GPUs
CUDA_VISIBLE_DEVICES= cmsRun hlt.py
mv output.root output-cpu.root

# use available GPUs
cmsRun hlt.py
mv output.root output-gpu.root

compare the results

hltDiff -o output-cpu.root -n output-gpu.root
...
Found 100 matching events, out of which 100 have different HLT results

      Events    Accepted      Gained        Lost       Other  Trigger
         100           0           -           -          ~1  HLT_AK8PFHT750_TrimMass50_v12
         100           0           -           -          ~1  HLT_DoubleEle8_CaloIdM_TrackIdM_Mass8_DZ_PFHT350_v20
         100           0           -           -          ~1  HLT_DoubleEle8_CaloIdM_TrackIdM_Mass8_PFHT350_v20
         100           2          +2           -           -  HLT_DoubleMu4_Mass3p8_DZ_PFHT350_v8
         100          17           -           -          ~2  HLT_Ele15_WPLoose_Gsf_v3
         100          17           -           -          ~2  HLT_Ele17_WPLoose_Gsf_v3
         100          17           -           -          ~1  HLT_Ele20_WPLoose_Gsf_v6
         100          17           -           -          ~1  HLT_Ele20_eta2p1_WPLoose_Gsf_v6
         100          10          +1           -           -  HLT_HT450_Beamspot_v11
         100          42          +2           -           -  HLT_HT300_Beamspot_v11
         100           4           -          -1           -  HLT_IsoMu20_eta2p1_LooseChargedIsoPFTauHPS27_eta2p1_CrossL1_v4
         100           4           -          -1           -  HLT_IsoMu20_eta2p1_MediumChargedIsoPFTauHPS27_eta2p1_CrossL1_v1
         100           4           -          -1           -  HLT_IsoMu20_eta2p1_TightChargedIsoPFTauHPS27_eta2p1_CrossL1_v1
         100           4           -          -1           -  HLT_IsoMu20_eta2p1_LooseChargedIsoPFTauHPS27_eta2p1_TightID_CrossL1_v1
         100           4           -          -1           -  HLT_IsoMu20_eta2p1_MediumChargedIsoPFTauHPS27_eta2p1_TightID_CrossL1_v1
         100           4           -          -1           -  HLT_IsoMu20_eta2p1_TightChargedIsoPFTauHPS27_eta2p1_TightID_CrossL1_v1
         100          81          +1           -           -  HLT_DiPFJet15_FBEta3_NoCaloMatched_v17
         100          98           -          -1           -  HLT_DiPFJetAve40_v14
         100          63           -           -          ~1  HLT_DiPFJetAve80_v13
         100           8          +1           -           -  HLT_DiPFJetAve140_v13
         100           0           -           -          ~1  HLT_DiPFJetAve220_HFJEC_v16
         100           7           -           -          ~1  HLT_AK8PFJet200_v15
         100          73          +1           -           -  HLT_PFJet80_v20
         100          68           -          -4           -  HLT_PFJetFwd25_v3
         100          74          +5          -1           -  HLT_AK8PFJetFwd40_v15
         100          15           -          -1           -  HLT_AK8PFJetFwd60_v14
         100           4          +1           -           -  HLT_AK8PFJetFwd80_v14
         100          56           -          -1           -  HLT_PFHT250_v17
         100          16           -           -          ~1  HLT_PFHT370_v17
         100           8          +1           -           -  HLT_PFHT430_v17
         100           4           -           -          ~1  HLT_PFHT510_v17
         100           4           -           -          ~1  HLT_PFHT590_v17
         100           0          +1           -           -  HLT_PFHT890_v17
         100           0           -           -          ~4  HLT_PFHT500_PFMET100_PFMHT100_IDTight_v12
         100           0           -           -          ~2  HLT_PFHT500_PFMET110_PFMHT110_IDTight_v12
         100           0           -           -          ~3  HLT_PFHT700_PFMET95_PFMHT95_IDTight_v12
         100           6           -          -1           -  HLT_PFMET120_PFMHT120_IDTight_v20
         100           2          +1           -          ~1  HLT_PFMET140_PFMHT140_IDTight_v20
         100           5          +1           -           -  HLT_PFMET100_PFMHT100_IDTight_CaloBTagDeepCSV_3p1_v8
         100           5           -          -1           -  HLT_PFMET120_PFMHT120_IDTight_CaloBTagDeepCSV_3p1_v8
         100           1          +1           -          ~1  HLT_PFMET140_PFMHT140_IDTight_CaloBTagDeepCSV_3p1_v8
         100           6           -          -1          ~1  HLT_PFMET120_PFMHT120_IDTight_PFHT60_v9
         100           8           -           -          ~1  HLT_PFMETNoMu120_PFMHTNoMu120_IDTight_PFHT60_v9
         100           6           -           -          ~1  HLT_PFMETTypeOne120_PFMHT120_IDTight_PFHT60_v9
         100           4           -           -          ~1  HLT_PFMETTypeOne140_PFMHT140_IDTight_v11
         100           8           -           -          ~1  HLT_PFMETNoMu110_PFMHTNoMu110_IDTight_v20
         100          11          +1           -           -  HLT_CaloMHT90_v4
         100           2          +1           -           -  HLT_Mu12_DoublePFJets54MaxDeta1p6_DoubleCaloBTagDeepCSV_p71_v2
         100          73           -          -1           -  HLT_DoublePFJets40_CaloBTagDeepCSV_p71_v2
         100          16           -          -1           -  HLT_DoublePFJets100_CaloBTagDeepCSV_p71_v2
         100           2          +1           -           -  HLT_DoublePFJets200_CaloBTagDeepCSV_p71_v2
         100           7           -           -          ~1  HLT_BTagMu_AK4DiJet40_Mu5_v13
         100          16          +1           -           -  HLT_BTagMu_AK4DiJet40_Mu5_noalgo_v13
         100          12          +2           -           -  HLT_HT425_v9
         100           0           -           -          ~1  HLT_HT430_DisplacedDijet40_DisplacedTrack_v13
         100           0           -           -          ~1  HLT_HT430_DisplacedDijet60_DisplacedTrack_v13
         100           0           -           -          ~2  HLT_HT400_DisplacedDijet40_DisplacedTrack_v13
         100           0           -           -          ~1  HLT_DiJet110_35_Mjj650_PFMET110_v9
         100           0           -           -          ~1  HLT_DiJet110_35_Mjj650_PFMET120_v9
         100           0           -           -          ~1  HLT_DiJet110_35_Mjj650_PFMET130_v9
         100           0           -           -          ~1  HLT_TripleJet110_35_35_Mjj650_PFMET110_v9
         100           0           -           -          ~1  HLT_TripleJet110_35_35_Mjj650_PFMET120_v9
         100           0           -           -          ~1  HLT_TripleJet110_35_35_Mjj650_PFMET130_v9
         100           1           -           -          ~1  HLT_Ele15_IsoVVVL_PFHT450_CaloBTagDeepCSV_4p5_v8
         100           0           -           -          ~1  HLT_Ele15_IsoVVVL_PFHT450_PFMET50_v16
         100           1           -           -          ~1  HLT_Ele15_IsoVVVL_PFHT450_v16
         100           1           -           -          ~1  HLT_Ele15_IsoVVVL_PFHT600_v20
         100           1          +1           -           -  HLT_Mu4_TrkIsoVVL_DiPFJet90_40_DEta3p5_MJJ750_HTT300_PFMETNoMu60_v15
         100           1          +1           -           -  HLT_Mu8_TrkIsoVVL_DiPFJet40_DEta3p5_MJJ750_HTT300_PFMETNoMu60_v16
         100           1          +1           -           -  HLT_Mu10_TrkIsoVVL_DiPFJet40_DEta3p5_MJJ750_HTT350_PFMETNoMu60_v15
         100           1           -          -1           -  HLT_Mu15_IsoVVVL_PFHT450_PFMET50_v15
         100           3           -          -1          ~1  HLT_Mu3er1p5_PFJet100er2p5_PFMET70_PFMHT70_IDTight_v2
         100           3           -          -1           -  HLT_Mu3er1p5_PFJet100er2p5_PFMET80_PFMHT80_IDTight_v2
         100           3           -           -          ~1  HLT_Mu3er1p5_PFJet100er2p5_PFMETNoMu70_PFMHTNoMu70_IDTight_v2
         100          23           -           -          ~1  HLT_Ele8_CaloIdM_TrackIdM_PFJet30_v18
         100          18           -           -          ~1  HLT_Ele17_CaloIdM_TrackIdM_PFJet30_v16
         100           5          +1          -1          ~1  HLT_PFHT330PT30_QuadPFJet_75_60_45_40_TriplePFBTagDeepCSV_4p5_v3
         100          18          +2           -           -  HLT_PFHT330PT30_QuadPFJet_75_60_45_40_v9
         100           3           -           -          ~2  HLT_PFHT400_SixPFJet32_DoublePFBTagDeepCSV_2p94_v8
         100           4           -           -          ~2  HLT_PFHT400_SixPFJet32_v8
         100           5           -           -          ~3  HLT_PFHT400_FivePFJet_100_100_60_30_30_v8
         100           3           -           -          ~3  HLT_PFHT400_FivePFJet_100_100_60_30_30_DoublePFBTagDeepCSV_4p5_v8
         100           3           -           -          ~1  HLT_PFHT400_FivePFJet_120_120_60_30_30_DoublePFBTagDeepCSV_4p5_v8
         100          18          +4           -           -  HLT_PFHT350_v19
         100          20          +1           -           -  HLT_PFHT350MinPFJet15_v9
         100           2           -           -          ~1  HLT_DiSC30_18_EIso_AND_HE_Mass70_v13
         100          84          +1          -1           -  HLT_AK4CaloJet80_v10
         100          60          +1           -           -  HLT_AK4CaloJet100_v10
         100          73          +1           -           -  HLT_AK4PFJet80_v19
         100          49          +1           -           -  HLT_AK4PFJet100_v19
         100          33          +1           -           -  HLT_AK4PFJet120_v18
         100          89          +2           -           -  MC_PFBTagDeepCSV_v10
         100          18           -           -          ~1  MC_Ele5_WPTight_Gsf_v8
         100           0           -           -          ~1  HLT_MediumChargedIsoPFTau50_Trk30_eta2p1_1pr_MET130_v8
         100           1           -           -          ~1  HLT_MediumChargedIsoPFTau180HighPtRelaxedIso_Trk50_eta2p1_1pr_v11
         100           1           -           -          ~1  HLT_MediumChargedIsoPFTau180HighPtRelaxedIso_Trk50_eta2p1_v12
         100           0          +1           -           -  HLT_Rsq0p35_v15
         100           0          +1           -           -  HLT_Rsq0p40_v15
         100           1          +1           -          ~1  HLT_RsqMR300_Rsq0p09_MR200_v15
         100           0          +1           -          ~1  HLT_RsqMR320_Rsq0p09_MR200_v15
         100           2           -           -          ~1  HLT_RsqMR300_Rsq0p09_MR200_4jet_v15
         100           1           -           -          ~1  HLT_RsqMR320_Rsq0p09_MR200_4jet_v15
         100           1           -           -          ~1  HLT_DoubleMediumChargedIsoPFTau35_Trk1_eta2p1_Reg_v12
         100           1           -           -          ~1  HLT_DoubleMediumChargedIsoPFTau35_Trk1_TightID_eta2p1_Reg_v12
         100           1           -           -          ~1  HLT_DoubleTightChargedIsoPFTau35_Trk1_eta2p1_Reg_v12
         100           1           -           -          ~1  HLT_DoubleTightChargedIsoPFTau35_Trk1_TightID_eta2p1_Reg_v12
         100           1           -           -          ~1  HLT_DoubleTightChargedIsoPFTauHPS35_Trk1_eta2p1_Reg_v1
         100           1           -           -          ~1  HLT_DoubleMediumChargedIsoPFTauHPS35_Trk1_TightID_eta2p1_Reg_v1
         100           1           -           -          ~1  HLT_DoubleMediumChargedIsoPFTauHPS35_Trk1_eta2p1_Reg_v4
         100           1           -           -          ~1  HLT_DoubleTightChargedIsoPFTauHPS35_Trk1_TightID_eta2p1_Reg_v1
         100           0           -           -          ~1  HLT_VBF_DoubleLooseChargedIsoPFTauHPS20_Trk1_eta2p1_v1
         100           0           -           -          ~1  HLT_VBF_DoubleMediumChargedIsoPFTauHPS20_Trk1_eta2p1_v1
         100           0           -           -          ~1  HLT_VBF_DoubleTightChargedIsoPFTauHPS20_Trk1_eta2p1_v1
         100           9          +1           -           -  HLT_PFMETNoMu100_PFMHTNoMu100_IDTight_PFHT60_v9
         100           8           -          -1           -  HLT_PFMETTypeOne100_PFMHT100_IDTight_PFHT60_v9
         100           3           -           -          ~2  HLT_QuadPFJet98_83_71_15_DoublePFBTagDeepCSV_1p3_7p7_VBF1_v8
         100           3           -           -          ~2  HLT_QuadPFJet103_88_75_15_DoublePFBTagDeepCSV_1p3_7p7_VBF1_v8
         100           3           -           -          ~3  HLT_QuadPFJet111_90_80_15_DoublePFBTagDeepCSV_1p3_7p7_VBF1_v8
         100           0           -           -          ~2  HLT_QuadPFJet98_83_71_15_PFBTagDeepCSV_1p3_VBF2_v8
         100           0           -           -          ~2  HLT_QuadPFJet103_88_75_15_PFBTagDeepCSV_1p3_VBF2_v8
         100           0           -           -          ~2  HLT_QuadPFJet105_88_76_15_PFBTagDeepCSV_1p3_VBF2_v8
         100           0           -           -          ~3  HLT_QuadPFJet111_90_80_15_PFBTagDeepCSV_1p3_VBF2_v8
         100          19           -          -1          ~2  HLT_QuadPFJet98_83_71_15_v5
         100          11          +2           -          ~1  HLT_QuadPFJet103_88_75_15_v5
         100          11          +2           -          ~1  HLT_QuadPFJet105_88_76_15_v5
         100           8          +2          -1          ~1  HLT_QuadPFJet111_90_80_15_v5
         100           3           -           -          ~2  HLT_QuadPFJet105_88_76_15_DoublePFBTagDeepCSV_1p3_7p7_VBF1_v8
         100          29           -          -1           -  HLT_TrkMu6NoFiltersNoVtx_v1
         100         100           -        -100           -  Status_OnCPU
         100           0        +100           -           -  Status_OnGPU

To disentangle the various effects, one can use different customisations on top of the HLT menu, running each resulting configuration with a GPU and without a GPU (that is, fully on the CPU). Replace the customisation at the bottom of the hlt.py file

#User-defined customization functions
from HLTrigger.Configuration.customizeHLTforPatatrack import customizeHLTforPatatrack
process = customizeHLTforPatatrack(process)

with a more fine-grained one, described below.

legacy configuration

Run the HLT menu unchanged, adding only the Status_OnGPU and Status_OnCPU paths, without actually offloading any reconstruction to GPU:

#User-defined customization functions
from HLTrigger.Configuration.customizeHLTforPatatrack import *
process = customiseCommon(process)

ECAL-only changes

To check the impact of running the ECAL reconstruction on GPU vs CPU, apply only the ECAL changes:

#User-defined customization functions
from HLTrigger.Configuration.customizeHLTforPatatrack import *
process = customiseCommon(process)
process = customiseEcalLocalReconstruction(process)

HCAL-only changes

To check the impact of running the HCAL reconstruction on GPU vs CPU, apply only the HCAL changes:

#User-defined customization functions
from HLTrigger.Configuration.customizeHLTforPatatrack import *
process = customiseCommon(process)
process = customiseHcalLocalReconstruction(process)

Pixel local reconstruction changes

To check the impact of running the Pixel local reconstruction on GPU vs CPU, apply only the Pixel changes:

#User-defined customization functions
from HLTrigger.Configuration.customizeHLTforPatatrack import *
process = customiseCommon(process)
process = customisePixelLocalReconstruction(process)

Pixel track reconstruction changes

To check the impact of running the Pixel local reconstruction on GPU vs CPU, apply only the Pixel and Tracking changes. Clearly, for this comparison to be meaningful, the previous one needs to be understood first.

#User-defined customization functions
from HLTrigger.Configuration.customizeHLTforPatatrack import *
process = customiseCommon(process)
process = customisePixelLocalReconstruction(process)
process = customisePixelTrackReconstruction(process)

The ECAL-only comparison did not reveal significant differences.

The HCAL-only comparison showed significant differences in a few % of the events (order of 10% of the accepted events).

The Pixel local reconstruction comparison showed significant differences in a few % of the events (order of 10% of the accepted events), while affecting less paths than the HCAL one.

I think that looking at the Pixel track comparison makes sense only after fixing the local reconstruction one.

Updates

for running with recent IBs, please use https://github.com/cms-sw/cmssw/pull/35497 .
the 12.1.0-pre3 relvals can also be used, for example /store/relval/CMSSW_12_1_0_pre3/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_121X_mcRun3_2021_realistic_v2-v1/10000/0eb14c4a-e363-424a-9c0c-2688c7d32c74.root .

VinInn commented 2 years ago

Can you please post the receipt for reproducing it? Is it possible to identify culprit events?

silviodonato commented 2 years ago

Can you please post the receipt for reproducing it? Is it possible to identify culprit events?

Ok, I asked Ganesh to send me the ROOT files with the trigger results and then I will make a skim of the culprit events.

Btw. A fix of #35668 would be very useful to understand if the differences comes from the pixel local reco or from the pixel tracking.

silviodonato commented 2 years ago

I investigated a bit why HLT_DoubleMediumDeepTauIsoPFTauHPS35_L2NN_eta2p1_v1 had such large differences and I noticed that they come from the L2NN cut (ie. the preliminary tau-tagging done using the pixel tracks).

I made an HLT path cutting only the L2NN and then I see even larger GPU-GPU fluctuations (~30%). You can easily reproduce this by using

$ hltGetConfiguration /users/sdonato/GPUtest/Tau/HLT/V3 --globaltag auto:run3_hlt --data --eras Run2_2018 --max-events -1 --input file:/eos/cms/store/data/Run2018D/EphemeralHLTPhysics2/RAW/v1/000/323/775/00000/17ADD12B-52E2-8C4C-B375-8AF943A24212.root --output minimal --customise HLTrigger/Configuration/customizeHLTforPatatrack.customizeHLTforPatatrackTriplets,HLTrigger/Configuration/customizeHLTforCMSSW.customiseFor2018Input > hlt.py
[... I've increased the number of threads...]
$ CUDA_DEVICES=0 cmsRun hlt.py >& log &
$ mv output.root output_2.root
$ CUDA_DEVICES=0 cmsRun hlt.py >& log &
$ hltDiff -o output.root -n output_2.root

Found 3151 matching events, out of which 57 have different HLT results

      Events    Accepted      Gained        Lost       Other  Trigger
        3151         171         +31         -26           -  HLT_OnlyL2NN_v1

Using the -v 1 option you can see which events have changed. After 4 attempts, these events changed (first ten events):

(run 323775, lumi 138 of /eos/cms/store/data/Run2018D/EphemeralHLTPhysics2/RAW/v1/000/323/775/00000/17ADD12B-52E2-8C4C-B375-8AF943A24212.root)

silviodonato commented 2 years ago

I made a quick check with a Run3 RelVal

hltGetConfiguration /users/sdonato/GPUtest/Tau/HLT/V3   --globaltag 123X_mcRun3_2021_realistic_v6    --data   --eras Run3  --max-events -1     --input file:/eos/cms/store/relval/CMSSW_12_3_0_pre5/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/123X_mcRun3_2021_realistic_v6-v1/10000/83efc2d4-c2e1-4aa9-af7d-832ff76e29dd.root --output minimal --customise HLTrigger/Configuration/customizeHLTforPatatrack.customizeHLTforPatatrackTriplets   > hltRelVals.py

getting

[sdonato@cms-hlt-gpu src]$ hltDiff -o RelVal_1/output.root -n RelVal_2/output.root  -v2
Processed events: 0 out of 900 (0%)
Processed events: 90 out of 900 (10%)
Processed events: 180 out of 900 (20%)
Processed events: 270 out of 900 (30%)
Processed events: 360 out of 900 (40%)
run 1, lumi 87, event 8634: old result is 'accepted', new result is 'accepted'
    Path HLT_OnlyL2NN_v1:
        old state is 'rejected' by module 19 'hltL2DoubleTauTagNNFilter' [L2TauTagFilter],
        new state is 'accepted'
    Filter hltL2DoubleTauTagNNFilter:
        old trigger candidates:
            filter id: 0, object id: 0, pT: 80.5, eta: -0.087, phi: 0.435, mass: 0
        new trigger candidates:
            filter id: 0, object id: 0, pT: 80.5, eta: -0.087, phi: 0.435, mass: 0
            filter id: 1, object id: 0, pT: 39.5, eta: 0.87, phi: -0.280186, mass: 0

Processed events: 450 out of 900 (50%)
Processed events: 540 out of 900 (60%)
Processed events: 630 out of 900 (70%)
Processed events: 720 out of 900 (80%)
run 1, lumi 88, event 8728: old result is 'accepted', new result is 'accepted'
    Path HLT_OnlyL2NN_v1:
        old state is 'accepted',
        new state is 'rejected' by module 19 'hltL2DoubleTauTagNNFilter' [L2TauTagFilter]
    Filter hltL2DoubleTauTagNNFilter:
        old trigger candidates:
            filter id: 0, object id: 0, pT: 255.5, eta: -0.261, phi: -0.715185, mass: 0
            filter id: 1, object id: 0, pT: 89.5, eta: 0.261, phi: -1.06319, mass: 0
        new trigger candidates:
            filter id: 0, object id: 0, pT: 255.5, eta: -0.261, phi: -0.715185, mass: 0

Processed events: 810 out of 900 (90%)
Found 900 matching events, out of which 2 have different HLT results

      Events    Accepted      Gained        Lost       Other  Trigger
         900          35          +1          -1           -  HLT_OnlyL2NN_v1

VinInn commented 2 years ago

I investigated a bit why HLT_DoubleMediumDeepTauIsoPFTauHPS35_L2NN_eta2p1_v1 had such large differences and I noticed that they come from the L2NN cut (ie. the preliminary tau-tagging done using the pixel tracks).

I made an HLT path cutting only the L2NN and then I see even larger GPU-GPU fluctuations (~30%).

I suppose this NN was trained on some old version of Puxel Tracks (quadruplets) Maybe would be worth retraining,,, (and apply some selection to the input)

v.

fwyzard commented 2 years ago

Is the NN sensitive to the order of the tracks ?

valeriadamante commented 2 years ago

Hello,

The training has been done on triplets, not on quadruplets, and the NN should not be sensitive to the order of the tracks: the patatrack-related inputs are sum of kinematic observables normalised to the Pt sum, and the total number of tracks in the specific cell as specified here in Slide 7.

Valeria

silviodonato commented 2 years ago

I asked Valeria to comment here about the tau L2NN. I think that the training was done with Triplets. No idea about the order of tracks.

Meanwhile I tried to store the objects using keep * (in CMSSW_12_3_X_2022-03-03-1100), and I see no Vertex/Track inside (even if the edmEventSize says that they are stored)

root [10] Events->Scan("ushorts_hltPixelTracks__AAA.@obj.size():recoTracks_hltPixelTracks__AAA.obj.pt():recoTracks_hltPixelTracks__AAA.@obj.size():floats_hltL2TauTagNNProducer_SingleTau_AAA.@obj.size():floats_hltL2TauTagNNProducer_SingleTau_AAA.obj.","floats_hltL2TauTagNNProducer_SingleTau_AAA.@obj.size()>0")
***********************************************************************************
*    Row   * Instance * ushorts_h * recoTrack * recoTrack * floats_hl * floats_hl *
***********************************************************************************
*       79 *        0 *         0 *           *         0 *         1 * 0.0425582 *
*       81 *        0 *         0 *           *         0 *         1 * 0.1661431 *
*       83 *        0 *         0 *           *         0 *         1 * 0.0424410 *
*       88 *        0 *         0 *           *         0 *         2 * 0.2396616 *
*       88 *        1 *         0 *           *         0 *         2 * 0.0305398 *

I tried again running with no filters (--open) and using directly --output full,

 hltGetConfiguration /users/sdonato/GPUtest/Tau/HLT/V3 --globaltag 123X_mcRun3_2021_realistic_v6 --data --eras Run3 --max-events -1 --input file:/eos/cms/store/relval/CMSSW_12_3_0_pre5/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/123X_mcRun3_2021_realistic_v6-v1/10000/83efc2d4-c2e1-4aa9-af7d-832ff76e29dd.root --output full --customise HLTrigger/Configuration/customizeHLTforPatatrack.customizeHLTforPatatrackTriplets --open --process MYHLT > hlt.py

and I got

root [2] Events->Scan("recoTracks_hltPixelTracks__AAA.obj.pt()")
In file included from CUDADataFormatsHcalRecHitSoA_xr dictionary payload:59:
In file included from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/Product.h:6:
In file included from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:7:
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/SharedStreamPtr.h:7:10: remark: could not acquire lock file for module 'cuda': failed to create unique file /cvmfs/cms-ib.cern.ch/nweek-02722/slc7_amd64_gcc10/lcg/root/6.24.07-f52350f4e0b802edeb9a2551a7d00b92/lib/cuda.pcm.lock-dc5b9d8f: Read-only file system [-Rmodule-build]
#include <cuda_runtime.h>
         ^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/SharedStreamPtr.h:7:10: remark: building module 'cuda' as '/cvmfs/cms-ib.cern.ch/nweek-02722/slc7_amd64_gcc10/lcg/root/6.24.07-f52350f4e0b802edeb9a2551a7d00b92/lib/cuda.pcm' [-Rmodule-build]
error: unable to open output file '/cvmfs/cms-ib.cern.ch/nweek-02722/slc7_amd64_gcc10/lcg/root/6.24.07-f52350f4e0b802edeb9a2551a7d00b92/lib/cuda.pcm': 'Read-only file system'
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/SharedStreamPtr.h:7:10: remark: finished building module 'cuda' [-Rmodule-build]
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/SharedStreamPtr.h:7:10: fatal error: could not build module 'cuda'
#include <cuda_runtime.h>
 ~~~~~~~~^
Error in <TInterpreter::AutoParse>: Error parsing payload code for class hcal::RecHitCollection<calo::common::VecStoragePolicy<calo::common::CUDAHostAllocatorAlias> > with content:

#line 1 "CUDADataFormatsHcalRecHitSoA_xr dictionary payload"

#ifndef CMS_DICT_IMPL
  #define CMS_DICT_IMPL 1
#endif
#ifndef _REENTRANT
  #define _REENTRANT 1
#endif
#ifndef GNUSOURCE
  #define GNUSOURCE 1
#endif
#ifndef __STRICT_ANSI__
  #define __STRICT_ANSI__ 1
#endif
#ifndef GNU_GCC
  #define GNU_GCC 1
#endif
#ifndef _GNU_SOURCE
  #define _GNU_SOURCE 1
#endif
#ifndef EIGEN_DONT_PARALLELIZE
  #define EIGEN_DONT_PARALLELIZE 1
#endif
#ifndef TBB_USE_GLIBCXX_VERSION
  #define TBB_USE_GLIBCXX_VERSION 100300
#endif
#ifndef TBB_SUPPRESS_DEPRECATED_MESSAGES
  #define TBB_SUPPRESS_DEPRECATED_MESSAGES 1
#endif
#ifndef TBB_PREVIEW_RESUMABLE_TASKS
  #define TBB_PREVIEW_RESUMABLE_TASKS 1
#endif
#ifndef BOOST_SPIRIT_THREADSAFE
  #define BOOST_SPIRIT_THREADSAFE 1
#endif
#ifndef PHOENIX_THREADSAFE
  #define PHOENIX_THREADSAFE 1
#endif
#ifndef BOOST_MATH_DISABLE_STD_FPCLASSIFY
  #define BOOST_MATH_DISABLE_STD_FPCLASSIFY 1
#endif
#ifndef BOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX
  #define BOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX 1
#endif
#ifndef CMSSW_GIT_HASH
  #define CMSSW_GIT_HASH "CMSSW_12_3_X_2022-03-02-2300"
#endif
#ifndef PROJECT_NAME
  #define PROJECT_NAME "CMSSW"
#endif
#ifndef PROJECT_VERSION
  #define PROJECT_VERSION "CMSSW_12_3_X_2022-03-02-2300"
#endif
#ifndef CMSSW_REFLEX_DICT
  #define CMSSW_REFLEX_DICT 1
#endif

#define _BACKWARD_BACKWARD_WARNING_H
// Inline headers
#include "CUDADataFormats/Common/interface/Product.h"
#include "CUDADataFormats/HcalRecHitSoA/interface/RecHitCollection.h"
#include "DataFormats/Common/interface/Wrapper.h"

#undef  _BACKWARD_BACKWARD_WARNING_H

In file included from CUDADataFormatsHcalRecHitSoA_xr dictionary payload:59:
In file included from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/Product.h:6:
In file included from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:7:
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/SharedStreamPtr.h:14:67: error: use of undeclared identifier 'cudaStream_t'
    using SharedStreamPtr = std::shared_ptr<std::remove_pointer_t<cudaStream_t>>;
                                                                  ^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/SharedStreamPtr.h:14:81: error: expected a type
    using SharedStreamPtr = std::shared_ptr<std::remove_pointer_t<cudaStream_t>>;
                                                                                ^
In file included from CUDADataFormatsHcalRecHitSoA_xr dictionary payload:59:
In file included from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/Product.h:6:
In file included from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:8:
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/SharedEventPtr.h:14:66: error: use of undeclared identifier 'cudaEvent_t'
    using SharedEventPtr = std::shared_ptr<std::remove_pointer_t<cudaEvent_t>>;
                                                                 ^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/SharedEventPtr.h:14:79: error: expected a type
    using SharedEventPtr = std::shared_ptr<std::remove_pointer_t<cudaEvent_t>>;
                                                                              ^
In file included from CUDADataFormatsHcalRecHitSoA_xr dictionary payload:59:
In file included from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/Product.h:6:
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:49:7: error: unknown type name 'cudaStream_t'
      cudaStream_t stream() const { return stream_.get(); }
      ^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:55:7: error: unknown type name 'cudaEvent_t'
      cudaEvent_t event() const { return event_.get(); }
      ^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:58:40: error: unknown type name 'SharedStreamPtr'
      explicit ProductBase(int device, SharedStreamPtr stream, SharedEventPtr event)
                                       ^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:58:64: error: unknown type name 'SharedEventPtr'
      explicit ProductBase(int device, SharedStreamPtr stream, SharedEventPtr event)
                                                               ^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:66:13: error: unknown type name 'SharedStreamPtr'
      const SharedStreamPtr& streamPtr() const { return stream_; }
            ^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:78:7: error: unknown type name 'SharedStreamPtr'
      SharedStreamPtr stream_;  //!
      ^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:80:7: error: unknown type name 'SharedEventPtr'
      SharedEventPtr event_;  //!
      ^
In file included from CUDADataFormatsHcalRecHitSoA_xr dictionary payload:59:
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/Product.h:48:36: error: unknown type name 'SharedStreamPtr'
      explicit Product(int device, SharedStreamPtr stream, SharedEventPtr event, T data)
                                   ^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/Product.h:48:60: error: unknown type name 'SharedEventPtr'
      explicit Product(int device, SharedStreamPtr stream, SharedEventPtr event, T data)
                                                           ^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/Product.h:52:36: error: unknown type name 'SharedStreamPtr'
      explicit Product(int device, SharedStreamPtr stream, SharedEventPtr event, Args&&... args)
                                   ^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/Product.h:52:60: error: unknown type name 'SharedEventPtr'
      explicit Product(int device, SharedStreamPtr stream, SharedEventPtr event, Args&&... args)
                                                           ^
In file included from CUDADataFormatsHcalRecHitSoA_xr dictionary payload:60:
In file included from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/HcalRecHitSoA/interface/RecHitCollection.h:6:
In file included from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/CaloCommon/interface/Common.h:6:
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/HostAllocator.h:15:17: error: unknown type name 'cudaError_t'
      bad_alloc(cudaError_t error) noexcept : error_(error) {}
                ^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/HostAllocator.h:20:7: error: unknown type name 'cudaError_t'
      cudaError_t error_;
      ^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/HostAllocator.h:23:48: error: use of undeclared identifier 'cudaHostAllocDefault'
    template <typename T, unsigned int FLAGS = cudaHostAllocDefault>
                                               ^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/HostAllocator.h:36:9: error: unknown type name 'cudaError_t'
        cudaError_t status = cudaMallocHost(&ptr, n * sizeof(T), FLAGS);
        ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
Error in <TInterpreter::AutoParse>: Error parsing payload code for class edm::Wrapper<hcal::RecHitCollection<calo::common::VecStoragePolicy<calo::common::CUDAHostAllocatorAlias> > > with content:

#line 1 "CUDADataFormatsHcalRecHitSoA_xr dictionary payload"

#ifndef CMS_DICT_IMPL
  #define CMS_DICT_IMPL 1
#endif
#ifndef _REENTRANT
  #define _REENTRANT 1
#endif
#ifndef GNUSOURCE
  #define GNUSOURCE 1
#endif
#ifndef __STRICT_ANSI__
  #define __STRICT_ANSI__ 1
#endif
#ifndef GNU_GCC
  #define GNU_GCC 1
#endif
#ifndef _GNU_SOURCE
  #define _GNU_SOURCE 1
#endif
#ifndef EIGEN_DONT_PARALLELIZE
  #define EIGEN_DONT_PARALLELIZE 1
#endif
#ifndef TBB_USE_GLIBCXX_VERSION
  #define TBB_USE_GLIBCXX_VERSION 100300
#endif
#ifndef TBB_SUPPRESS_DEPRECATED_MESSAGES
  #define TBB_SUPPRESS_DEPRECATED_MESSAGES 1
#endif
#ifndef TBB_PREVIEW_RESUMABLE_TASKS
  #define TBB_PREVIEW_RESUMABLE_TASKS 1
#endif
#ifndef BOOST_SPIRIT_THREADSAFE
  #define BOOST_SPIRIT_THREADSAFE 1
#endif
#ifndef PHOENIX_THREADSAFE
  #define PHOENIX_THREADSAFE 1
#endif
#ifndef BOOST_MATH_DISABLE_STD_FPCLASSIFY
  #define BOOST_MATH_DISABLE_STD_FPCLASSIFY 1
#endif
#ifndef BOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX
  #define BOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX 1
#endif
#ifndef CMSSW_GIT_HASH
  #define CMSSW_GIT_HASH "CMSSW_12_3_X_2022-03-02-2300"
#endif
#ifndef PROJECT_NAME
  #define PROJECT_NAME "CMSSW"
#endif
#ifndef PROJECT_VERSION
  #define PROJECT_VERSION "CMSSW_12_3_X_2022-03-02-2300"
#endif
#ifndef CMSSW_REFLEX_DICT
  #define CMSSW_REFLEX_DICT 1
#endif

#define _BACKWARD_BACKWARD_WARNING_H
// Inline headers
#include "CUDADataFormats/Common/interface/Product.h"
#include "CUDADataFormats/HcalRecHitSoA/interface/RecHitCollection.h"
#include "DataFormats/Common/interface/Wrapper.h"

#undef  _BACKWARD_BACKWARD_WARNING_H

Error in <TInterpreter::AutoParse>: Error parsing payload code for class hcal::RecHitCollection with content:

#line 1 "CUDADataFormatsHcalRecHitSoA_xr dictionary payload"

#ifndef CMS_DICT_IMPL
  #define CMS_DICT_IMPL 1
#endif
#ifndef _REENTRANT
  #define _REENTRANT 1
#endif
#ifndef GNUSOURCE
  #define GNUSOURCE 1
#endif
#ifndef __STRICT_ANSI__
  #define __STRICT_ANSI__ 1
#endif
#ifndef GNU_GCC
  #define GNU_GCC 1
#endif
#ifndef _GNU_SOURCE
  #define _GNU_SOURCE 1
#endif
#ifndef EIGEN_DONT_PARALLELIZE
  #define EIGEN_DONT_PARALLELIZE 1
#endif
#ifndef TBB_USE_GLIBCXX_VERSION
  #define TBB_USE_GLIBCXX_VERSION 100300
#endif
#ifndef TBB_SUPPRESS_DEPRECATED_MESSAGES
  #define TBB_SUPPRESS_DEPRECATED_MESSAGES 1
#endif
#ifndef TBB_PREVIEW_RESUMABLE_TASKS
  #define TBB_PREVIEW_RESUMABLE_TASKS 1
#endif
#ifndef BOOST_SPIRIT_THREADSAFE
  #define BOOST_SPIRIT_THREADSAFE 1
#endif
#ifndef PHOENIX_THREADSAFE
  #define PHOENIX_THREADSAFE 1
#endif
#ifndef BOOST_MATH_DISABLE_STD_FPCLASSIFY
  #define BOOST_MATH_DISABLE_STD_FPCLASSIFY 1
#endif
#ifndef BOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX
  #define BOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX 1
#endif
#ifndef CMSSW_GIT_HASH
  #define CMSSW_GIT_HASH "CMSSW_12_3_X_2022-03-02-2300"
#endif
#ifndef PROJECT_NAME
  #define PROJECT_NAME "CMSSW"
#endif
#ifndef PROJECT_VERSION
  #define PROJECT_VERSION "CMSSW_12_3_X_2022-03-02-2300"
#endif
#ifndef CMSSW_REFLEX_DICT
  #define CMSSW_REFLEX_DICT 1
#endif

#define _BACKWARD_BACKWARD_WARNING_H
// Inline headers
#include "CUDADataFormats/Common/interface/Product.h"
#include "CUDADataFormats/HcalRecHitSoA/interface/RecHitCollection.h"
#include "DataFormats/Common/interface/Wrapper.h"

#undef  _BACKWARD_BACKWARD_WARNING_H

Error in <TClass::LoadClassInfo>: no interpreter information for class edm::Wrapper<hcal::RecHitCollection<calo::common::VecStoragePolicy<calo::common::CUDAHostAllocatorAlias> > > is available even though it has a TClass initialization routine.
Error in <TTreeFormula::Compile>:  Bad numerical expression : "recoTracks_hltPixelTracks__AAA.obj.pt()"
************************
*    Row   * recoTrack *
************************
*        0 *           *
*        1 *           *
*        2 *           *
*        3 *           *
*        4 *           *
*        5 *           *
*        6 *           *
*        7 *           *
*        8 *           *
*        9 *           *
************************

VinInn commented 2 years ago

On 4 Mar, 2022, at 12:43 PM, valeriadamante @.***> wrote:

Hello,

The training has been done on triplets, not on quadruplets, and the NN should not be sensitive to the order of the tracks: the patatrack-related inputs are sum of kinematic observables normalised to the Pt sum, and the total number of tracks in the specific cell as specified here in Slide 7.

I think it would be useful to look into the events identified by Silvio and try to indentify what makes the NN to produce a different output for the same event when run multiple times

v.

makortel commented 2 years ago

root [2] Events->Scan("recoTracks_hltPixelTracks__AAA.obj.pt()")

Reading std::vector<Track> leading to

In file included from CUDADataFormatsHcalRecHitSoA_xr dictionary payload:59:
....
Error in <TClass::LoadClassInfo>: no interpreter information for class edm::Wrapper<hcal::RecHitCollection<calo::common::VecStoragePolicy<calo::common::CUDAHostAllocatorAlias> > > is available even though it has a TClass initialization routine.

is strange. There should be no dependence on CUDADataFormats from DataFormats.

valeriadamante commented 2 years ago

Hi all,

Running multiple times on the same event, I found differences in:

total number of tracks
vertex sorted index
track quality

So what I did is:

1) Running from CMSSW_12_3_X_2022-03-03-1100, with a configuration obtained by: hltGetConfiguration /users/vdamante/GPUTest/HLT/V4 --globaltag auto:run3_hlt --data --eras Run2_2018 --max-events -1 --input file:/eos/cms/store/data/Run2018D/EphemeralHLTPhysics2/RAW/v1/000/323/775/00000/17ADD12B-52E2-8C4C-B375-8AF943A24212.root --no-output --process MYHLT > hlt_Valeria2.py

2) In the obtained file customizeHLTforPatatrackTriplets and a customisation function to get an ntuple with CNN outputs (and some other minor adjustments - adding of endjob_step and patAlgosToolsTask) have been applied. And I selected the event (this is one of the problematic ones) via process.source.eventsToProcess = cms.untracked.VEventRange(['323775:138:179372613']).

3) on lxplus-gpu.cern.ch, I ran 6 separated times and I saved patatrack observables with: cmsRun hlt_Valeria2.py

4) I printed out all patatrack and pata-vertices related observables that fulfill the following requirements:

patatrack with nHits>0, quality >= pixelTrack::Quality::Loose
no requirements on vertices

5) The obtained files are attached (named tracks_14_i, vertices_14_i with i=16,18,24 and tracks_16_i, vertices_16_i with i=35,38,40 ) are attached here. If you compare them (I did with a very basic python script!) you can see that there are differences in:

vertex sorted index
vertex associated to track index (var 'idv')
track quality
total number of tracks Note: the differences are evaluated when pt, eta, phi are compatible (requiring pts to be equal and deltaR2 between patatracks with equal pt to be <0.00001 )

For L2NNTag the most relevant change is the number of patatracks associated to vertices, which in the problematic event fluctuates in many cells (from 0 to 2, and hence also the total pT of patatracks associated to vertices from 0 to a value !=0 ). These differences might cause changes in outputs.

tracks_14_16.txt tracks_14_18.txt tracks_14_24.txt tracks_16_35.txt tracks_16_38.txt tracks_16_40.txt vertices_14_16.txt vertices_14_18.txt vertices_14_24.txt vertices_16_35.txt vertices_16_38.txt vertices_16_40.txt

VinInn commented 2 years ago

In the .txt file above I did not find any discrepancy in the track associated to vertices. They look identical to me (to the last digit).

silviodonato commented 2 years ago

In the .txt file above I did not find any discrepancy in the track associated to vertices. They look identical to me (to the last digit).

The number of tracks is different. In tracks_14_24.txt, there are 1113 tracks, while in tracks_14_16.txt 1109 tracks

652 0.545105    0.0965592   -2.13828    1   0.041834    3   6   -1
580 0.535333    0.0992664   -2.1305 1   2.38    4   6   -1
67  0.61013 2.64948 -2.10212    1   4.45669 4   6   -1
1267    5.55364 -2.56354    -1.97965    -1  9.96772 4   3   -1
861 5.91727 -3.13949    -1.66239    1   32.1911 3   3   -1

these tracks appear only in tracks_14_24.txt

valeriadamante commented 2 years ago

I take an example from 16_35 and 16_38 comparison: in 16_35 the number of this track is 419 and in 16_38 the index is 660 (I know the order is not important but I report here to allow you to find the information)

file 16_35 pt=1.27935, phi=2.59874, eta=-2.09418, charge=1, chi2=3.15117, nHits=4, quality=3, idv=-1

file 16_38 pt= 1.27935, phi= 2.59874, eta= -2.09418, charge=1, chi2=3.15117, nHits=4, quality=6, idv=-1

VinInn commented 2 years ago

Sorry. I understood you were referring to discrepancies in tracks associated to vertices.

indeed this quite isolated quadruplets with quality either loose or HP is strange. Fro a cursory look I have not found any other occurence.

trk 1267 seems a marginal quadruplet (nothing obvious around): true that is only found in one out of 6 reco-job, (found just one more quad present in 4 out of 6. all others are identical)

4 2.64948 0.61013 -1 -2.10212 1 4.45669 4 6 tracks_16_40.txt:125
4 2.64948 0.61013 -1 -2.10212 1 4.45669 4 6 tracks_16_35.txt:15
4 2.64948 0.61013 -1 -2.10212 1 4.45669 4 6 tracks_14_24.txt:67
4 2.64948 0.61013 -1 -2.10212 1 4.45669 4 6 tracks_14_18.txt:72

for trk 558 it seems that it's swapped for a close by triplet (same hits most probably). This is not really expected when running on the same hardware.

0.0992664 0.535333 -1 -2.1305 1 2.38 4 6 tracks_16_40.txt:561
0.0992664 0.535333 -1 -2.1305 1 2.38 4 6 tracks_14_18.txt:305
0.0992664 0.535333 -1 -2.1305 1 2.38 4 6 tracks_14_16.txt:580
0.0965592 0.545105 -1 -2.13828 1 0.041834 3 6 tracks_16_38.txt:668
0.0965592 0.545105 -1 -2.13828 1 0.041834 3 6 tracks_16_35.txt:683
0.0965592 0.545105 -1 -2.13828 1 0.041834 3 6 tracks_14_24.txt:652

the others are low quality triplets included mostly for seeding. It is known that the current algorithm cannot reproduce all of them.

file parsed with eihter

grep ',' track* | tr ',' ' ' | awk '{print $3, $2, $9, $4,$5,$6,$7,$8, $1}' | sort -g -r | less

or

grep ',' track* | tr ',' ' ' | awk '{print $7, $3, $2, $9, $4,$5,$6,$7,$8, $1}' | sort -g -r | less

kandrosov commented 2 years ago

the others are low quality triplets included mostly for seeding. It is known that the current algorithm cannot reproduce all of them.

hm... this could be the reason for L2TauTagNN irreproducibility: currently all tracks that pass Loose quality WP and have > 0 hits are considered as inputs: TrackGood. @VinInn what (minimal) selection should be used for TrackGood to ensure reproducibility of the inputs with the current track building algorithm?

VinInn commented 2 years ago

hm... this could be the reason for L2TauTagNN irreproducibility: currently all tracks that pass Loose quality WP and have > 0 hits are considered as inputs: TrackGood. @VinInn what (minimal) selection should be used for TrackGood to ensure reproducibility of the inputs with the current track building algorithm? Full reproducibility is never guaranteed with current algorithm. I suggest to use a selection similar to PF in scouting as they made very detailed study

(same for track vertex association. currenly you are "counting" only the tracks used to identify and fit the vertices: high pt, high purity quadruplets)

v.

silviodonato commented 2 years ago

Just to quantify the GPU vs GPU fluctuation: out of 1110 tracks we get differences in 27 tracks (tracks_14_16.txt vs tracks_14_18.txt)

Specifically (using the grep command above: phi, pt, idv, eta, charge, chi2, nHits, quality):

1 additional track

                      > 2.64948 0.61013 -1 -2.10212 1 4.45669 4 6

1 track with slightly different pt, phi, eta, and chi2

-2.53876 1.58212 -1 -1.71285 1 5.96387 4 6    | -2.53913 1.59199 -1 -1.72069 1 2.98019 4 6

1 track with a different quality

2.59874 1.27935 -1 -2.09418 1 3.15117 4 6     | 2.59874 1.27935 -1 -2.09418 1 3.15117 4 3

silviodonato commented 2 years ago

@valeriadamante do you know the meaning of idv=9997 ?

silviodonato commented 2 years ago

@valeriadamante what is the difference between vtx_idx and sortind in the vertex.txt file?

Could you confirm that in the L2NN you don't use sortind, vtx_idx, idv at all?

VinInn commented 2 years ago

You may wish to "print" and "use" in the CNN nLayers as well. It may be that some of those 4-hit tracks are just triplets (3-layers) with two hits in the same layer.

Something else you may wish to consider are the significance of the pt and impact-param (or even the chord) as it may help to put less weight on tracks with large errors.

VinInn commented 2 years ago

idv=9997 (9998-1) means that the track was used in the vertex finder but ended not "associated" to any vertex (search for 9998 in "RecoPixelVertexing/PixelVertexFinding")

silviodonato commented 2 years ago

@VinInn @fwyzard I noticed that the CPU vs GPU differences appear only after 300 - 500 events. Usually there are no difference in the first ~300 events. The "difference rate" reach a "plateau" after ~1000 events. Did you expect this?

valeriadamante commented 2 years ago

@valeriadamante what is the difference between vtx_idx and sortind in the vertex.txt file?

I checked, and the sortind column should be ignored. Indeed, vtx_idx is vertex_soa.sortInd[j] (where j runs over vertex_soa size) and sortInd is vertex_SOA.sortInd[vtx_idx]. So please ignore this column in the comparison.

Could you confirm that in the L2NN you don't use sortind, vtx_idx, idv at all?

Yes, in L2NN I only use the number of vertices that pass a minimal selection (described here )

VinInn commented 2 years ago

On 9 Mar, 2022, at 9:37 PM, Silvio Donato @.***> wrote:

@VinInn @fwyzard I noticed that the CPU vs GPU differences appear only after 300 - 500 events. Usually there are no difference in the first ~300 events. The "difference rate" reach a "plateau" after ~1000 events. Did you expect this?

No. is this in any workflow or some in particular?

silviodonato commented 2 years ago

No, I just run the L2NN in a different amount of events. You can find all the files in /afs/cern.ch/work/s/sdonato/public/GPU_fluctuation_study:

CPU and CPU_2: done using CPU only
1 and 2: done using GPU (only for pixel local reco an tracks). I ran 2 times the same configuration
skim1k and skim1k_2: done using GPU (only for pixel local reco an tracks)
skim2k and skim2k_2: done using GPU (only for pixel local reco an tracks)

Using hltDiff -n CPU_1/output.root -o skip_2k/output.root -v1 | grep HLT_ you can easily see that there are no difference in the first 300-500 events.

I made a more quantitative comparison: number of cumulated differences vs the number of processed events:

chart (2)

silviodonato commented 2 years ago

If you count the differences in the reversed order (ie. starting from the last processed event), you get this plot chart (5)

It is clear that the "difference rate" is rather constant, but for some reason in there are no (or few) differences in the first hundreds of events

silviodonato commented 2 years ago

These are the exact numbers of the first events with differences:

Diff (number)	1	2	skim1k	skim1k_2	skim2k	skim2k_2
1	382	377	310	324	503	638
2	609	550	486	457	639	652
3	652	581	577	485	652	676
4	674	652	585	486	669	684
5	719	673	620	529	670	751

VinInn commented 2 years ago

is not that the first few hundreds events are sort of low multiplicity?

silviodonato commented 2 years ago

is not that the first few hundreds events are sort of low multiplicity?

no, because I tested this on different events. I mean the first 300 events of skim2k correspond to the events between 2000 - 2300 of 1, and from the plot you can see that we see differences in skim2k between 2000 - 2300

VinInn commented 2 years ago

ok. I can only imagine that at start up things goes more in sync so all jobs run code more or less in the same order (even if in parallel). Then they loose sync, the occupancy of the GPU varies and code (blocks, waves) are run in a different order in different jobs.

VinInn commented 2 years ago

We may try "cooperative groups" (once they are validated to be used in CMSSW) to see if they are more stable (as they allocate all gpu threads at once and allow global synchronization among them all).

silviodonato commented 2 years ago

The file /afs/cern.ch/work/s/sdonato/public/GPU_fluctuation_study/output_HLT_GPU_CPU_GPU2.root is the output of a test where I ran on /eos/cms/store/data/Run2018D/EphemeralHLTPhysics2/RAW/v1/000/323/775/00000/17ADD12B-52E2-8C4C-B375-8AF943A24212.root (3151 events) the following steps:

using GPU (process HLTGPU) and selecting the events passing HLT_OnlyL2NN_v1, ie. two tau with hltL2TauTagNNProducer, DoubleTau > 0.4327 (176 events)
using CPU (process HLTCPU) and inverting the tau selection (19 events)
using again GPU (process HLTGPU2), no further selection (19 events)

Checking the GPU2 value you can see that very often HLTGPU2 has a result very close to HLTCPU rather then HLTGPU2 (as discussed above):

[sdonato@lxplus764 src]$ root -l /afs/cern.ch/work/s/sdonato/public/GPU_fluctuation_study/output_HLT_GPU_CPU_GPU2.root
root [0] 
Attaching file /afs/cern.ch/work/s/sdonato/public/GPU_fluctuation_study/output_HLT_GPU_CPU_GPU2.root as _file0...
(TFile *) 0x47dedc0
root [1] Events->Scan("EventAuxiliary.event():floats_hltL2TauTagNNProducer_DoubleTau_HLTGPU.obj:floats_hltL2TauTagNNProducer_DoubleTau_HLTCPU.obj:floats_hltL2TauTagNNProducer_DoubleTau_HLTGPU2.obj")
***********************************************************************
*    Row   * Instance * EventAuxi * floats_hl * floats_hl * floats_hl *
***********************************************************************
*        0 *        0 * 179817566 * 0.8923431 * 0.7772132 * 0.7772129 *
*        0 *        1 * 179817566 * 0.5937884 * 0.2117204 * 0.2117119 *
*        1 *        0 * 179298167 * 0.5923708 * 0.0160080 * 0.0160080 *
*        1 *        1 * 179298167 * 0.8136105 * 0.4281417 * 0.4281420 *
*        2 *        0 * 179791629 * 0.9079257 * 0.2867714 * 0.5548577 *
*        2 *        1 * 179791629 * 0.6653458 * 0.6105449 * 0.6105442 *
*        3 *        0 * 179864959 * 0.8531022 * 0.0435173 * 0.0435174 *

I used basically keep * so all the HLT objects (tracks, hits) are included, and you can even re-ran on the RAW event.

PS. The HLT command was

hltGetConfiguration /users/sdonato/GPUtest/Tau/HLT/V6 --globaltag auto:run3_hlt --data --eras Run2_2018 --max-events -1 --input file:aaa.root --output full --customise HLTrigger/Configuration/customizeHLTforCMSSW.customiseFor2018Input,HLTrigger/Configuration/customizeHLTforPatatrack.customizeHLTforPatatrackTriplets

silviodonato commented 2 years ago

I made an easy script to compare some variables https://github.com/silviodonato/usercode/blob/master/compareGPUvsCPU.py This is one random event with CPU/GPU differences:

['phi', 'eta', 'dz', 'dxy', 'pt', 'chi2', 'charge', 'missingInnerHits']
i distance diff CPU GPU
249      484.69      [2.09, -0.88, 9.23, -0.04, 3.41, 23.06, -1, 0]      [2.08, -0.88, 9.25, -0.0, 2.77, 1.06, -1, 0]    [0.01, 0.0, -0.02, -0.04, 0.64, 22.01, 0, 0]
333      51.02   [2.7, -1.27, 0.17, 0.01, 2634.47, 10.18, 1, 0]      [2.7, -1.27, 0.17, 0.01, 2627.33, 10.18, 1, 0]      [0.0, -0.0, -0.0, 0.0, 7.14, -0.0, 0, 0]
478      0.45    [-3.09, -2.0, -1.11, 0.08, 1.42, 5.51, -1, 0]   [-3.09, -2.0, -1.11, 0.08, 1.43, 4.84, -1, 0]   [-0.0, 0.0, -0.0, 0.0, -0.01, 0.67, 0, 0]
606      0.04    [-2.23, 0.95, -5.31, 0.22, 278.08, 7.04, 1, 0]      [-2.23, 0.95, -5.31, 0.22, 277.88, 7.04, 1, 0]      [-0.0, -0.0, 0.0, 0.0, 0.2, 0.0, 0, 0]
1017     149.91      [0.51, 2.54, -1.01, -0.09, 1.57, 40.44, -1, 0]      [0.51, 2.54, -0.99, -0.09, 1.57, 52.68, -1, 0]      [0.0, 0.0, -0.03, -0.0, 0.0, -12.24, 0, 0]
1032     0.01    [0.87, 2.23, 8.58, -0.13, 114.02, 47.08, -1, 0]     [0.87, 2.23, 8.58, -0.13, 113.91, 47.08, -1, 0]     [0.0, 0.0, 0.0, -0.0, 0.12, -0.0, 0, 0]
1081     32056115.03     [2.73, 1.76, 6.68, 0.17, 1.59, 2.08, -1, 0]     [-2000, -2000, -2000, -2000, -2000, -2000, -2000, -2000]    [2002.73, 2001.76, 2006.68, 2000.17, 2001.59, 2002.08, 1999, 2000]
1180     8184.6      [0.91, -2.62, 1.91, -0.05, 2.08, 107.62, -1, 0]     [0.9, -2.61, 1.88, -0.02, 1.44, 17.15, -1, 0]   [0.01, -0.0, 0.03, -0.03, 0.64, 90.47, 0, 0]
1254     86.52   [2.81, -2.05, -1.18, 0.04, 0.81, 12.9, 1, 0]    [2.81, -2.05, -1.18, 0.03, 0.8, 3.6, 1, 0]      [-0.0, 0.0, -0.0, 0.0, 0.01, 9.3, 0, 0]
1262     32024193.35     [3.13, -1.62, -3.54, 0.17, 0.95, 7.95, -1, 0]   [-2000, -2000, -2000, -2000, -2000, -2000, -2000, -2000]    [2003.13, 1998.38, 1996.46, 2000.17, 2000.95, 2007.95, 1999, 2000]
1305     783.71      [-2.29, -1.76, -3.8, 0.24, 0.63, 0.3, -1, 0]    [-2.29, -1.76, -3.8, 0.24, 0.63, 28.3, -1, 0]   [-0.0, 0.0, -0.0, 0.0, -0.0, -27.99, 0, 0]
1402     48.09   [-0.22, -2.18, 1.89, -0.05, 2539.61, 18.99, 1, 0]   [-0.22, -2.18, 1.89, -0.05, 2546.54, 18.99, 1, 0]   [0.0, 0.0, -0.0, -0.0, -6.93, 0.0, 0, 0]
1471     38.81   [2.03, 1.3, -2.18, 0.17, 1.35, 27.83, 1, 0]     [2.03, 1.31, -2.18, 0.18, 1.37, 21.6, 1, 0]     [0.0, -0.0, -0.0, -0.0, -0.02, 6.23, 0, 0]
1525     1.91    [1.98, 2.12, -2.84, -0.16, 0.62, 2.37, -1, 0]   [1.98, 2.12, -2.84, -0.16, 0.62, 3.75, -1, 0]   [-0.0, 0.0, -0.0, 0.0, -0.0, -1.38, 0, 0]
1549     0.02    [0.46, 2.15, -2.56, -0.11, 0.84, 0.58, -1, 0]   [0.45, 2.15, -2.57, -0.09, 0.81, 0.72, -1, 0]   [0.0, -0.0, 0.01, -0.02, 0.02, -0.13, 0, 0]
1563     0.22    [1.95, -2.02, -2.84, -0.15, 0.63, 1.48, -1, 0]      [1.95, -2.02, -2.83, -0.16, 0.63, 1.01, -1, 0]      [-0.0, 0.0, -0.01, 0.01, -0.0, 0.47, 0, 0]
1945     1406.1      [-0.33, 2.34, -5.6, 0.06, 1.54, 48.7, 1, 0]     [0.01, 1.89, -1.89, -0.01, 1.11, 11.39, 1, 0]   [-0.35, 0.45, -3.71, 0.07, 0.43, 37.31, 0, 0]
1973     3.53    [-3.08, -1.75, -10.01, 0.03, 1.51, 12.12, -1, 0]    [-3.09, -1.79, -8.41, 0.05, 1.46, 13.11, -1, 0]     [0.0, 0.04, -1.6, -0.02, 0.05, -0.98, 0, 0]

Two considerations:

it looks like that difference are more often in the last tracks (whatever it means)
an important effect to be checked are the tracks with a huge pt and an even larger pt error. These tracks might have large significant fluctuation in pt, and this might affect the reconstruction (even if reco should be robust against this effect)

silviodonato commented 2 years ago

And this is the comparison of the pixel cluster: https://github.com/silviodonato/usercode/blob/master/compareGPUvsCPU_pixelHits.py

Considering the first event, there are a lot of clusters (~30) with a cluster charge difference equal to 1:

['x', 'y', 'charge', 'colSpan', 'size', 'sizeX', 'sizeY', 'minPixelCol', 'maxPixelCol', 'minPixelRow', 'maxPixelRow', 'overflow', 'overflowCol', 'overflowRow', 'colSpan', 'rowSpan']

detId: 303050780     cluster: 14     diff: 1.0
cpu: [63.56, 205.13, 99173, 9, 11, 2, 10, 199, 208, 63, 64, 0, 0, 0, 9, 1]
gpu: [63.56, 205.13, 99174, 9, 11, 2, 10, 199, 208, 63, 64, 0, 0, 0, 9, 1]
dif: [0.0, -0.0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

There are 3 clusters with a large difference in the cluster charge. In the second case there is a +140%, while the coordinate x,y are exactly the same. In all three cases there is a change in the size, while sizeX and sizeY are unchanged (I guess this means that they are missing one pixel)

['x', 'y', 'charge', 'colSpan', 'size', 'sizeX', 'sizeY', 'minPixelCol', 'maxPixelCol', 'minPixelRow', 'maxPixelRow', 'overflow', 'overflowCol', 'overflowRow', 'colSpan', 'rowSpan']

detId: 303054852     cluster: 35     diff: 142277185.01
cpu: [54.42, 288.56, 98098, 10, 11, 2, 11, 284, 294, 53, 54, 0, 0, 0, 10, 1]
gpu: [54.43, 288.44, 110026, 10, 12, 2, 11, 284, 294, 53, 54, 0, 0, 0, 10, 1]
dif: [-0.01, 0.11, -11928, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

detId: 303054852     cluster: 39     diff: 10093330.0
cpu: [56.5, 279.5, 2194, 0, 1, 1, 1, 279, 279, 56, 56, 0, 0, 0, 0, 0]
gpu: [56.5, 279.5, 5371, 0, 2, 1, 1, 279, 279, 56, 56, 0, 0, 0, 0, 0]
dif: [0.0, 0.0, -3177, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

detId: 303067152     cluster: 36     diff: 22543505.0
cpu: [95.5, 367.5, 7109, 0, 1, 1, 1, 367, 367, 95, 95, 0, 0, 0, 0, 0]
gpu: [95.5, 367.5, 11857, 0, 2, 1, 1, 367, 367, 95, 95, 0, 0, 0, 0, 0]
dif: [0.0, 0.0, -4748, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

silviodonato commented 2 years ago

Checking the single pixels associated to the clusters, I noticed that the problematic clusters have often two pixel with the same coordinates. Typically one pixel as the same value of the CPU cluster, while the other pixel is random. @tsusa

detId: 303054852     cluster: 35     diff: 142277185.01
cpu: [54.42, 288.56, 98098, 10, 11, 2, 11, 284, 294, 53, 54, 0, 0, 0, 10, 1]
gpu: [54.43, 288.44, 110026, 10, 12, 2, 11, 284, 294, 53, 54, 0, 0, 0, 10, 1]
dif: [-0.01, 0.11, -11928, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
detId: 303054852     cluster: 39     diff: 10093330.0
cpu: [56.5, 279.5, 2194, 0, 1, 1, 1, 279, 279, 56, 56, 0, 0, 0, 0, 0]
gpu: [56.5, 279.5, 5371, 0, 2, 1, 1, 279, 279, 56, 56, 0, 0, 0, 0, 0]
dif: [0.0, 0.0, -3177, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
detId: 303067152     cluster: 36     diff: 22543505.0
cpu: [95.5, 367.5, 7109, 0, 1, 1, 1, 367, 367, 95, 95, 0, 0, 0, 0, 0]
gpu: [95.5, 367.5, 11857, 0, 2, 1, 1, 367, 367, 95, 95, 0, 0, 0, 0, 0]
dif: [0.0, 0.0, -4748, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
clusterDebug= {303054852: [35, 39], 303067152: [36]}

DetId: 303054852  clNum: 35
      x  y  adc
diff:  [0, 0, 0]  cpu:  [53, 294, 5829]  gpu:  [53, 294, 5829]
diff:  [0, 0, 0]  cpu:  [53, 293, 1747]  gpu:  [53, 293, 1747]
diff:  [0, 0, 0]  cpu:  [53, 292, 100]  gpu:  [53, 292, 100]
diff:  [0, 0, 0]  cpu:  [54, 291, 1446]  gpu:  [54, 291, 1446]
diff:  [0, 0, 0]  cpu:  [54, 290, 18456]  gpu:  [54, 290, 18456]
diff:  [0, 0, 0]  cpu:  [54, 289, 4970]  gpu:  [54, 289, 4970]
diff:  [0, 0, 0]  cpu:  [54, 288, 18046]  gpu:  [54, 288, 18046]
diff:  [0, 0, 11669]  cpu:  [54, 287, 23597]  gpu:  [54, 287, 11928]
diff:  [0, 0, 0]  cpu:  [54, 286, 10775]  gpu:  [54, 286, 10775]
diff:  [0, 0, 0]  cpu:  [54, 285, 13032]  gpu:  [54, 285, 13032]
diff:  [0, 0, 0]  cpu:  [54, 284, 100]  gpu:  [54, 284, 100]
diff:  [-2054, -2287, -25597]  cpu:  [-2000, -2000, -2000]  gpu:  [54, 287, 23597]

DetId: 303054852  clNum: 39
      x  y  adc
diff:  [0, 0, 0]  cpu:  [56, 279, 2194]  gpu:  [56, 279, 2194]
diff:  [-2056, -2279, -5177]  cpu:  [-2000, -2000, -2000]  gpu:  [56, 279, 3177]

DetId: 303067152  clNum: 36
      x  y  adc
diff:  [0, 0, 0]  cpu:  [95, 367, 7109]  gpu:  [95, 367, 7109]
diff:  [-2095, -2367, -6748]  cpu:  [-2000, -2000, -2000]  gpu:  [95, 367, 4748]

silviodonato commented 2 years ago

This is the number of clusters containing duplicated pixels in each event:

event = 179817566  duplicates = 1
event = 179298167  duplicates = 3
event = 179791629  duplicates = 1
event = 179864959  duplicates = 1
event = 179874601  duplicates = 0
event = 179479449  duplicates = 2
event = 179064864  duplicates = 5
event = 179118965  duplicates = 1
event = 178787468  duplicates = 3
event = 180607223  duplicates = 3
event = 180699610  duplicates = 0
event = 181039937  duplicates = 1
event = 181330428  duplicates = 0
event = 181312720  duplicates = 1
event = 181451699  duplicates = 10
event = 180470519  duplicates = 2
event = 181735245  duplicates = 1
event = 181859892  duplicates = 4
event = 181517358  duplicates = 3

all these duplicates comes from the GPU. There are 0 duplicates in the CPU reconstruction. https://github.com/silviodonato/usercode/blob/master/compareGPUvsCPU_pixelHits_findDuplicates.py

I cannot say this is the cause of the GPU fluctuations, but I think this is a bug in the pixel local reco.

VinInn commented 2 years ago

Checking the single pixels associated to the clusters, I noticed that the problematic clusters have often two pixel with the same coordinates. Typically one pixel as the same value of the CPU cluster, while the other pixel is random. @tsusa

This is a known issue for real data. Apparently time to time some pixels from the previous crossing are still around. So it may happen to find twice the same pixels in the raw data. On GPU they stay separate, on CPU their charge is summed. On monte carlo of course this never happen.

fwyzard commented 2 years ago

I assume we don't have any way of figuring out which ones are from the current event and which ones are from the previous one ?

silviodonato commented 2 years ago

In these case the duplicated pixels are not summed on CPU:

DetId: 303054852 clNum: 39 x y adc diff: [0, 0, 0] cpu: [56, 279, 2194] gpu: [56, 279, 2194] diff: [-2056, -2279, -5177] cpu: [-2000, -2000, -2000] gpu: [56, 279, 3177]

DetId: 303067152 clNum: 36 x y adc diff: [0, 0, 0] cpu: [95, 367, 7109] gpu: [95, 367, 7109] diff: [-2095, -2367, -6748] cpu: [-2000, -2000, -2000] gpu: [95, 367, 4748]

(-2000 means missing pixel)

In DetId: 303054852 clNum: 39,

cpu made a cluster using a single pixel in 56, 279 with charge 2194
gpu made a cluster using two pixels, both of them in 56, 279 with charge 2194 and 3177

In DetId: 303067152 clNum: 36,

cpu made a cluster using a single pixel in 95, 367 with charge 7109
gpu made a cluster using two pixels, both of them in 95, 367 with charge 7109 and 4748

If the CPU sums up the duplicated pixels and the GPU keep them separated, the total cluster charge should not change. On contrary, I do see a different cluster charge for

detId: 303054852 cluster: 39 (cpu: 2194 vs gpu: 5371)
detId: 303067152 cluster: 36 (cpu: 7109 vs gpu: 11857)

VinInn commented 2 years ago

sorry I got confused by the code in the second copy_to_buffer (where pixels are added)

the line to fill the buffer later used by make_clusters is https://cmssdt.cern.ch/dxr/CMSSW/source/RecoLocalTracker/SiPixelClusterizer/plugins/PixelThresholdClusterizer.cc#294 and indeed it is set not add. So last pixel in raw-data wins.

silviodonato commented 2 years ago

Below some more numbers about duplicated pixels. I've found also events with three duplicated pixels

### Three pixels with the same (x,y) in  (150, 208) 303087620 179064864 . The third pixel is:
detId = 303087620. x, y = 150, 208. charge1 = 100. charge2 = 11354. charge3 = 13019 
detId = 303087620. x, y = 150, 208. charge1 = 100. charge2 = 11354. chargeCPU = 13019

and events with duplicated pixels in GPU with no corresponding pixel on CPU

detId = 303042568. x, y = 95, 291. charge1 = 23386. charge2 = 100. No corresponding CPU cluster found.

(probably because the CPU pick 100 and then the charge is to small to make a cluster. There is only one case where the smaller value of the two charge is 758 instead of 100). This seems also to explain the reason why we see a different number of clusters.


Event = 179817566
### Different number of pixel cluster in detId=303075348 32 vs 33
detId = 303075348. x, y = 143, 156. charge1 = 28972. charge2 = 100. No corresponding CPU cluster found. 
Summary: event = 179817566  duplicates = 1

Event = 179298167
detId = 303054852. x, y = 54, 287. charge1 = 11928. charge2 = 23597. chargeCPU = 23597 
detId = 303054852. x, y = 56, 279. charge1 = 2194. charge2 = 3177. chargeCPU = 2194 
detId = 303067152. x, y = 95, 367. charge1 = 7109. charge2 = 4748. chargeCPU = 7109 
Summary: event = 179298167  duplicates = 3

Event = 179791629
detId = 353130500. x, y = 115, 252. charge1 = 10537. charge2 = 132. chargeCPU = 132 
Summary: event = 179791629  duplicates = 1

Event = 179864959
### Different number of pixel cluster in detId=303042568 25 vs 26
detId = 303042568. x, y = 152, 208. charge1 = 6319. charge2 = 100. No corresponding CPU cluster found. 
Summary: event = 179864959  duplicates = 1

Event = 179874601
Summary: event = 179874601  duplicates = 0

Event = 179479449
detId = 304156696. x, y = 133, 304. charge1 = 11391. charge2 = 29045. chargeCPU = 11391 
detId = 304156696. x, y = 134, 305. charge1 = 100. charge2 = 100. chargeCPU = 100 
Summary: event = 179479449  duplicates = 2

Event = 179064864
detId = 303071256. x, y = 101, 40. charge1 = 19369. charge2 = 17041. chargeCPU = 17041 
### Three pixels with the same (x,y) in  (150, 208) 303087620 179064864 . The third pixel is:
detId = 303087620. x, y = 150, 208. charge1 = 100. charge2 = 11354. charge3 = 13019 
detId = 303087620. x, y = 150, 208. charge1 = 100. charge2 = 11354. chargeCPU = 13019 
### Three pixels with the same (x,y) in  (114, 272) 304156696 179064864 . The third pixel is:
detId = 304156696. x, y = 114, 272. charge1 = 10836. charge2 = 693. charge3 = 5517 
detId = 304156696. x, y = 114, 272. charge1 = 10836. charge2 = 693. chargeCPU = 10836 
Summary: event = 179064864  duplicates = 3

Event = 179118965
### Different number of pixel cluster in detId=353130500 4 vs 5
detId = 353130500. x, y = 114, 258. charge1 = 6724. charge2 = 758. No corresponding CPU cluster found. 
Summary: event = 179118965  duplicates = 1

Event = 178787468
### Different number of pixel cluster in detId=303042568 65 vs 66
detId = 303042568. x, y = 88, 291. charge1 = 100. charge2 = 11937. chargeCPU = 100 
detId = 303042568. x, y = 159, 208. charge1 = 100. charge2 = 8680. chargeCPU = 8680 
detId = 303042568. x, y = 95, 291. charge1 = 23386. charge2 = 100. No corresponding CPU cluster found. 
Summary: event = 178787468  duplicates = 3

Event = 180607223
detId = 303063072. x, y = 155, 77. charge1 = 8872. charge2 = 5421. chargeCPU = 8872 
### Different number of pixel cluster in detId=303075348 53 vs 54
detId = 303075348. x, y = 159, 204. charge1 = 28289. charge2 = 100. No corresponding CPU cluster found. 
detId = 303087628. x, y = 151, 276. charge1 = 11057. charge2 = 1501. chargeCPU = 1501 
Summary: event = 180607223  duplicates = 3

Event = 180699610
Summary: event = 180699610  duplicates = 0

Event = 181039937
### Different number of pixel cluster in detId=353077252 5 vs 4
detId = 353130500. x, y = 127, 384. charge1 = 30295. charge2 = 11334. chargeCPU = 30295 
Summary: event = 181039937  duplicates = 1

Event = 181330428
Summary: event = 181330428  duplicates = 0

Event = 181312720
detId = 303075360. x, y = 0, 27. charge1 = 31417. charge2 = 100. chargeCPU = 31417 
Summary: event = 181312720  duplicates = 1

Event = 181451699
### Different number of pixel cluster in detId=303042568 68 vs 70
### Three pixels with the same (x,y) in  (159, 260) 303042568 181451699 . The third pixel is:
detId = 303042568. x, y = 159, 260. charge1 = 100. charge2 = 100. charge3 = 100 
### Three pixels with the same (x,y) in  (159, 260) 303042568 181451699 . The third pixel is:
detId = 303042568. x, y = 159, 260. charge1 = 100. charge2 = 100. charge3 = 7472 
detId = 303042568. x, y = 116, 262. charge1 = 17266. charge2 = 6187. chargeCPU = 6187 
detId = 303042568. x, y = 159, 268. charge1 = 11393. charge2 = 100. chargeCPU = 11393 
detId = 303042568. x, y = 159, 292. charge1 = 22019. charge2 = 100. No corresponding CPU cluster found. 
detId = 303042568. x, y = 159, 260. charge1 = 100. charge2 = 100. No corresponding CPU cluster found. 
### Three pixels with the same (x,y) in  (159, 216) 303087620 181451699 . The third pixel is:
detId = 303087620. x, y = 159, 216. charge1 = 100. charge2 = 7297. charge3 = 21456 
detId = 303087620. x, y = 158, 216. charge1 = 8340. charge2 = 9980. chargeCPU = 8340 
detId = 303087620. x, y = 157, 216. charge1 = 100. charge2 = 100. chargeCPU = 100 
detId = 303087620. x, y = 159, 216. charge1 = 100. charge2 = 7297. chargeCPU = 21456 
Summary: event = 181451699  duplicates = 7

Event = 180470519
detId = 303042568. x, y = 103, 276. charge1 = 100. charge2 = 4378. chargeCPU = 4378 
detId = 303087620. x, y = 156, 208. charge1 = 20890. charge2 = 11203. chargeCPU = 20890 
Summary: event = 180470519  duplicates = 2

Event = 181735245
detId = 303067152. x, y = 135, 404. charge1 = 35575. charge2 = 5473. chargeCPU = 35575 
Summary: event = 181735245  duplicates = 1

Event = 181859892
### Different number of pixel cluster in detId=304156696 16 vs 17
detId = 304156696. x, y = 84, 344. charge1 = 5096. charge2 = 100. No corresponding CPU cluster found. 
detId = 344823812. x, y = 49, 85. charge1 = 5765. charge2 = 4385. chargeCPU = 5765 
detId = 344823812. x, y = 68, 83. charge1 = 3022. charge2 = 4296. chargeCPU = 4296 
### Different number of pixel cluster in detId=353130500 16 vs 17
detId = 353130500. x, y = 44, 269. charge1 = 100. charge2 = 7159. No corresponding CPU cluster found. 
Summary: event = 181859892  duplicates = 4

Event = 181517358
detId = 303075360. x, y = 0, 35. charge1 = 18866. charge2 = 100. chargeCPU = 18866 
detId = 303087628. x, y = 101, 276. charge1 = 9862. charge2 = 3492. chargeCPU = 3492 
detId = 304156696. x, y = 133, 278. charge1 = 2531. charge2 = 29626. chargeCPU = 29626 
Summary: event = 181517358  duplicates = 3

silviodonato commented 2 years ago

sorry I got confused by the code in the second copy_to_buffer (where pixels are added)

the line to fill the buffer later used by make_clusters is https://cmssdt.cern.ch/dxr/CMSSW/source/RecoLocalTracker/SiPixelClusterizer/plugins/PixelThresholdClusterizer.cc#294 and indeed it is set not add. So last pixel in raw-data wins.

Thanks, is it possible to apply the same rule to both CPU and GPU?

VinInn commented 2 years ago

On 13 Mar, 2022, at 1:53 PM, Silvio Donato @.***> wrote:

Thanks, is it possible to apply the same rule to both CPU and GPU? on GPU is difficult to select the last occurence. For the CPU I leave Pixel-DPG to comment. In any case I'm not sure if anybody know what is more correct.

In my opinion this is the less relevant of the difference: at the level of the anavoidable differences arising in FP operations

I think we need to decide if we accept different results or not from different architecture in general. Small mods in algo here and there are not a solution.

v.

silviodonato commented 2 years ago

I agree that probably this is not the main cause of the CPU/GPU differences that we observe, but anyway it is a clear different behavior between the two algorithms that need be uniformed either in the CPU or in the GPU code.

I think we need to decide if we accept different results or not from different architecture in general.

I understand that it is not feasible to have no differences, but we should really try to understand and to reduce that as much as possible. I think it is very difficult to accept differences above >5% (in HLT_DoubleMediumDeepTauIsoPFTauHPS35_L2NN_eta2p1_v1 they are above 20%).

Small mods in algo here and there are not a solution.

If I understand correctly, once we fix this issue of the repeated pixels, we can finally exclude that the differences comes from the pixel local reco.

silviodonato commented 2 years ago

Btw. the other (minor) difference is a difference of a 1 in the cluster charge. I've seen that it comes directly from the pixel.adc which sometimes differs of 1 between CPU and GPU. Is it a known problem? (floating point error?)

VinInn commented 2 years ago

You can try to change set in add in the line I quoted above (on CPU).

VinInn commented 2 years ago

About the difference in adc values:

this is the CPU code for calibration https://cmssdt.cern.ch/dxr/CMSSW/source/CalibTracker/SiPixelESProducers/src/SiPixelGainCalibrationService.cc#31 and this one for GPU https://cmssdt.cern.ch/dxr/CMSSW/source/RecoLocalTracker/SiPixelClusterizer/plugins/gpuCalibPixel.h#66 there is clearly opportunity for fma (not available in standard CMSSW build for CPU)

One can force the same fma on both (but with standard cmssw that will expensive on CPU as it will not use the hardware one).

VinInn commented 2 years ago

I am sure that DeepTau can be made more reproducible if one applies pixel-track selections closer to those used for instance in PF for scouting.

silviodonato commented 2 years ago

You can try to change set in add in the line I quoted above (on CPU).

Yes, it works. Now the total cluster charge matches within 10-20 adc, apart from a few events where we reconstruct a different number of clusters.

(and of course the "size" variable of the cluster is still different)

cms-sw / cmssw