Open fwyzard opened 3 years ago
Can you please post the recipe for reproducing it? Is it possible to identify culprit events?
Ok, I asked Ganesh to send me the ROOT files with the trigger results and then I will make a skim of the culprit events.
BTW, a fix of #35668 would be very useful to understand whether the differences come from the pixel local reco or from the pixel tracking.
I investigated a bit why HLT_DoubleMediumDeepTauIsoPFTauHPS35_L2NN_eta2p1_v1 had such large differences and I noticed that they come from the L2NN cut (i.e. the preliminary tau tagging done using the pixel tracks). I made an HLT path applying only the L2NN cut, and then I see even larger GPU-GPU fluctuations (~30%). You can easily reproduce this by using
$ hltGetConfiguration /users/sdonato/GPUtest/Tau/HLT/V3 --globaltag auto:run3_hlt --data --eras Run2_2018 --max-events -1 --input file:/eos/cms/store/data/Run2018D/EphemeralHLTPhysics2/RAW/v1/000/323/775/00000/17ADD12B-52E2-8C4C-B375-8AF943A24212.root --output minimal --customise HLTrigger/Configuration/customizeHLTforPatatrack.customizeHLTforPatatrackTriplets,HLTrigger/Configuration/customizeHLTforCMSSW.customiseFor2018Input > hlt.py
[... I've increased the number of threads...]
$ CUDA_VISIBLE_DEVICES=0 cmsRun hlt.py >& log
$ mv output.root output_2.root
$ CUDA_VISIBLE_DEVICES=0 cmsRun hlt.py >& log
$ hltDiff -o output.root -n output_2.root
Found 3151 matching events, out of which 57 have different HLT results
Events Accepted Gained Lost Other Trigger
3151 171 +31 -26 - HLT_OnlyL2NN_v1
Using the -v 1 option you can see which events have changed.
After 4 attempts, these events changed (first ten events):
1 vs 2
179372613
179860017
179012871
179758644
179935322
179565779
179798462
179380337
179390087
179137434
1 vs 3
179372613
179012871
179565779
179798462
179380337
179390087
179137434
179429943
179294636
179298167
178989134
1 vs 4
179372613
179012871
179565779
179798462
179380337
179390087
179137434
179429943
179176748
178989134
(run 323775, lumi 138 of /eos/cms/store/data/Run2018D/EphemeralHLTPhysics2/RAW/v1/000/323/775/00000/17ADD12B-52E2-8C4C-B375-8AF943A24212.root)
I made a quick check with a Run3 RelVal
hltGetConfiguration /users/sdonato/GPUtest/Tau/HLT/V3 --globaltag 123X_mcRun3_2021_realistic_v6 --data --eras Run3 --max-events -1 --input file:/eos/cms/store/relval/CMSSW_12_3_0_pre5/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/123X_mcRun3_2021_realistic_v6-v1/10000/83efc2d4-c2e1-4aa9-af7d-832ff76e29dd.root --output minimal --customise HLTrigger/Configuration/customizeHLTforPatatrack.customizeHLTforPatatrackTriplets > hltRelVals.py
getting
[sdonato@cms-hlt-gpu src]$ hltDiff -o RelVal_1/output.root -n RelVal_2/output.root -v2
Processed events: 0 out of 900 (0%)
Processed events: 90 out of 900 (10%)
Processed events: 180 out of 900 (20%)
Processed events: 270 out of 900 (30%)
Processed events: 360 out of 900 (40%)
run 1, lumi 87, event 8634: old result is 'accepted', new result is 'accepted'
Path HLT_OnlyL2NN_v1:
old state is 'rejected' by module 19 'hltL2DoubleTauTagNNFilter' [L2TauTagFilter],
new state is 'accepted'
Filter hltL2DoubleTauTagNNFilter:
old trigger candidates:
filter id: 0, object id: 0, pT: 80.5, eta: -0.087, phi: 0.435, mass: 0
new trigger candidates:
filter id: 0, object id: 0, pT: 80.5, eta: -0.087, phi: 0.435, mass: 0
filter id: 1, object id: 0, pT: 39.5, eta: 0.87, phi: -0.280186, mass: 0
Processed events: 450 out of 900 (50%)
Processed events: 540 out of 900 (60%)
Processed events: 630 out of 900 (70%)
Processed events: 720 out of 900 (80%)
run 1, lumi 88, event 8728: old result is 'accepted', new result is 'accepted'
Path HLT_OnlyL2NN_v1:
old state is 'accepted',
new state is 'rejected' by module 19 'hltL2DoubleTauTagNNFilter' [L2TauTagFilter]
Filter hltL2DoubleTauTagNNFilter:
old trigger candidates:
filter id: 0, object id: 0, pT: 255.5, eta: -0.261, phi: -0.715185, mass: 0
filter id: 1, object id: 0, pT: 89.5, eta: 0.261, phi: -1.06319, mass: 0
new trigger candidates:
filter id: 0, object id: 0, pT: 255.5, eta: -0.261, phi: -0.715185, mass: 0
Processed events: 810 out of 900 (90%)
Found 900 matching events, out of which 2 have different HLT results
Events Accepted Gained Lost Other Trigger
900 35 +1 -1 - HLT_OnlyL2NN_v1
> I investigated a bit why HLT_DoubleMediumDeepTauIsoPFTauHPS35_L2NN_eta2p1_v1 had such large differences and I noticed that they come from the L2NN cut (i.e. the preliminary tau tagging done using the pixel tracks). I made an HLT path applying only the L2NN cut, and then I see even larger GPU-GPU fluctuations (~30%).

I suppose this NN was trained on some old version of the pixel tracks (quadruplets). Maybe it would be worth retraining... (and applying some selection to the input)
v.
Is the NN sensitive to the order of the tracks?
Hello,
The training has been done on triplets, not on quadruplets, and the NN should not be sensitive to the order of the tracks: the Patatrack-related inputs are sums of kinematic observables normalised to the pT sum, plus the total number of tracks in the specific cell, as specified here in Slide 7.
Valeria
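For illustration, a minimal sketch (in plain Python, with a hypothetical grid and observables, not the actual L2 NN binning) of per-cell inputs of the kind described above: sums of kinematic observables normalised to the pT sum, plus the track count per cell. Inputs built this way are indeed insensitive to the track order.

```python
import math

# Hypothetical sketch of order-insensitive per-cell inputs: sums of kinematic
# observables normalised to the pT sum, plus the number of tracks per cell.
# The grid size and the chosen observable are illustrative.
def cell_inputs(tracks, cell_eta=0.5, cell_phi=math.pi / 8):
    """tracks: list of (pt, eta, phi) pixel tracks."""
    cells = {}
    for pt, eta, phi in tracks:
        key = (int(eta // cell_eta), int(phi // cell_phi))
        c = cells.setdefault(key, {"sum_pt": 0.0, "sum_pt_eta": 0.0, "n_tracks": 0})
        c["sum_pt"] += pt
        c["sum_pt_eta"] += pt * eta
        c["n_tracks"] += 1
    for c in cells.values():
        c["mean_eta"] = c["sum_pt_eta"] / c["sum_pt"]  # normalised to the pT sum
    return cells

print(cell_inputs([(1.3, -2.09, 2.60), (0.55, -2.14, 0.10)]))
```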
I asked Valeria to comment here about the tau L2NN. I think that the training was done with Triplets. No idea about the order of tracks.
Meanwhile I tried to store the objects using keep * (in CMSSW_12_3_X_2022-03-03-1100), and I see no Vertex/Track inside (even though edmEventSize says that they are stored)
root [10] Events->Scan("ushorts_hltPixelTracks__AAA.@obj.size():recoTracks_hltPixelTracks__AAA.obj.pt():recoTracks_hltPixelTracks__AAA.@obj.size():floats_hltL2TauTagNNProducer_SingleTau_AAA.@obj.size():floats_hltL2TauTagNNProducer_SingleTau_AAA.obj.","floats_hltL2TauTagNNProducer_SingleTau_AAA.@obj.size()>0")
***********************************************************************************
* Row * Instance * ushorts_h * recoTrack * recoTrack * floats_hl * floats_hl *
***********************************************************************************
* 79 * 0 * 0 * * 0 * 1 * 0.0425582 *
* 81 * 0 * 0 * * 0 * 1 * 0.1661431 *
* 83 * 0 * 0 * * 0 * 1 * 0.0424410 *
* 88 * 0 * 0 * * 0 * 2 * 0.2396616 *
* 88 * 1 * 0 * * 0 * 2 * 0.0305398 *
I tried again running with no filters (--open) and using directly --output full,
hltGetConfiguration /users/sdonato/GPUtest/Tau/HLT/V3 --globaltag 123X_mcRun3_2021_realistic_v6 --data --eras Run3 --max-events -1 --input file:/eos/cms/store/relval/CMSSW_12_3_0_pre5/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/123X_mcRun3_2021_realistic_v6-v1/10000/83efc2d4-c2e1-4aa9-af7d-832ff76e29dd.root --output full --customise HLTrigger/Configuration/customizeHLTforPatatrack.customizeHLTforPatatrackTriplets --open --process MYHLT > hlt.py
and I got
root [2] Events->Scan("recoTracks_hltPixelTracks__AAA.obj.pt()")
In file included from CUDADataFormatsHcalRecHitSoA_xr dictionary payload:59:
In file included from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/Product.h:6:
In file included from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:7:
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/SharedStreamPtr.h:7:10: remark: could not acquire lock file for module 'cuda': failed to create unique file /cvmfs/cms-ib.cern.ch/nweek-02722/slc7_amd64_gcc10/lcg/root/6.24.07-f52350f4e0b802edeb9a2551a7d00b92/lib/cuda.pcm.lock-dc5b9d8f: Read-only file system [-Rmodule-build]
#include <cuda_runtime.h>
^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/SharedStreamPtr.h:7:10: remark: building module 'cuda' as '/cvmfs/cms-ib.cern.ch/nweek-02722/slc7_amd64_gcc10/lcg/root/6.24.07-f52350f4e0b802edeb9a2551a7d00b92/lib/cuda.pcm' [-Rmodule-build]
error: unable to open output file '/cvmfs/cms-ib.cern.ch/nweek-02722/slc7_amd64_gcc10/lcg/root/6.24.07-f52350f4e0b802edeb9a2551a7d00b92/lib/cuda.pcm': 'Read-only file system'
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/SharedStreamPtr.h:7:10: remark: finished building module 'cuda' [-Rmodule-build]
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/SharedStreamPtr.h:7:10: fatal error: could not build module 'cuda'
#include <cuda_runtime.h>
~~~~~~~~^
Error in <TInterpreter::AutoParse>: Error parsing payload code for class hcal::RecHitCollection<calo::common::VecStoragePolicy<calo::common::CUDAHostAllocatorAlias> > with content:
#line 1 "CUDADataFormatsHcalRecHitSoA_xr dictionary payload"
#ifndef CMS_DICT_IMPL
#define CMS_DICT_IMPL 1
#endif
#ifndef _REENTRANT
#define _REENTRANT 1
#endif
#ifndef GNUSOURCE
#define GNUSOURCE 1
#endif
#ifndef __STRICT_ANSI__
#define __STRICT_ANSI__ 1
#endif
#ifndef GNU_GCC
#define GNU_GCC 1
#endif
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#ifndef EIGEN_DONT_PARALLELIZE
#define EIGEN_DONT_PARALLELIZE 1
#endif
#ifndef TBB_USE_GLIBCXX_VERSION
#define TBB_USE_GLIBCXX_VERSION 100300
#endif
#ifndef TBB_SUPPRESS_DEPRECATED_MESSAGES
#define TBB_SUPPRESS_DEPRECATED_MESSAGES 1
#endif
#ifndef TBB_PREVIEW_RESUMABLE_TASKS
#define TBB_PREVIEW_RESUMABLE_TASKS 1
#endif
#ifndef BOOST_SPIRIT_THREADSAFE
#define BOOST_SPIRIT_THREADSAFE 1
#endif
#ifndef PHOENIX_THREADSAFE
#define PHOENIX_THREADSAFE 1
#endif
#ifndef BOOST_MATH_DISABLE_STD_FPCLASSIFY
#define BOOST_MATH_DISABLE_STD_FPCLASSIFY 1
#endif
#ifndef BOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX
#define BOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX 1
#endif
#ifndef CMSSW_GIT_HASH
#define CMSSW_GIT_HASH "CMSSW_12_3_X_2022-03-02-2300"
#endif
#ifndef PROJECT_NAME
#define PROJECT_NAME "CMSSW"
#endif
#ifndef PROJECT_VERSION
#define PROJECT_VERSION "CMSSW_12_3_X_2022-03-02-2300"
#endif
#ifndef CMSSW_REFLEX_DICT
#define CMSSW_REFLEX_DICT 1
#endif
#define _BACKWARD_BACKWARD_WARNING_H
// Inline headers
#include "CUDADataFormats/Common/interface/Product.h"
#include "CUDADataFormats/HcalRecHitSoA/interface/RecHitCollection.h"
#include "DataFormats/Common/interface/Wrapper.h"
#undef _BACKWARD_BACKWARD_WARNING_H
In file included from CUDADataFormatsHcalRecHitSoA_xr dictionary payload:59:
In file included from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/Product.h:6:
In file included from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:7:
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/SharedStreamPtr.h:14:67: error: use of undeclared identifier 'cudaStream_t'
using SharedStreamPtr = std::shared_ptr<std::remove_pointer_t<cudaStream_t>>;
^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/SharedStreamPtr.h:14:81: error: expected a type
using SharedStreamPtr = std::shared_ptr<std::remove_pointer_t<cudaStream_t>>;
^
In file included from CUDADataFormatsHcalRecHitSoA_xr dictionary payload:59:
In file included from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/Product.h:6:
In file included from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:8:
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/SharedEventPtr.h:14:66: error: use of undeclared identifier 'cudaEvent_t'
using SharedEventPtr = std::shared_ptr<std::remove_pointer_t<cudaEvent_t>>;
^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/SharedEventPtr.h:14:79: error: expected a type
using SharedEventPtr = std::shared_ptr<std::remove_pointer_t<cudaEvent_t>>;
^
In file included from CUDADataFormatsHcalRecHitSoA_xr dictionary payload:59:
In file included from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/Product.h:6:
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:49:7: error: unknown type name 'cudaStream_t'
cudaStream_t stream() const { return stream_.get(); }
^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:55:7: error: unknown type name 'cudaEvent_t'
cudaEvent_t event() const { return event_.get(); }
^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:58:40: error: unknown type name 'SharedStreamPtr'
explicit ProductBase(int device, SharedStreamPtr stream, SharedEventPtr event)
^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:58:64: error: unknown type name 'SharedEventPtr'
explicit ProductBase(int device, SharedStreamPtr stream, SharedEventPtr event)
^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:66:13: error: unknown type name 'SharedStreamPtr'
const SharedStreamPtr& streamPtr() const { return stream_; }
^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:78:7: error: unknown type name 'SharedStreamPtr'
SharedStreamPtr stream_; //!
^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/ProductBase.h:80:7: error: unknown type name 'SharedEventPtr'
SharedEventPtr event_; //!
^
In file included from CUDADataFormatsHcalRecHitSoA_xr dictionary payload:59:
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/Product.h:48:36: error: unknown type name 'SharedStreamPtr'
explicit Product(int device, SharedStreamPtr stream, SharedEventPtr event, T data)
^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/Product.h:48:60: error: unknown type name 'SharedEventPtr'
explicit Product(int device, SharedStreamPtr stream, SharedEventPtr event, T data)
^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/Product.h:52:36: error: unknown type name 'SharedStreamPtr'
explicit Product(int device, SharedStreamPtr stream, SharedEventPtr event, Args&&... args)
^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/Common/interface/Product.h:52:60: error: unknown type name 'SharedEventPtr'
explicit Product(int device, SharedStreamPtr stream, SharedEventPtr event, Args&&... args)
^
In file included from CUDADataFormatsHcalRecHitSoA_xr dictionary payload:60:
In file included from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/HcalRecHitSoA/interface/RecHitCollection.h:6:
In file included from /cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/CUDADataFormats/CaloCommon/interface/Common.h:6:
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/HostAllocator.h:15:17: error: unknown type name 'cudaError_t'
bad_alloc(cudaError_t error) noexcept : error_(error) {}
^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/HostAllocator.h:20:7: error: unknown type name 'cudaError_t'
cudaError_t error_;
^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/HostAllocator.h:23:48: error: use of undeclared identifier 'cudaHostAllocDefault'
template <typename T, unsigned int FLAGS = cudaHostAllocDefault>
^
/cvmfs/cms-ib.cern.ch/week0/slc7_amd64_gcc10/cms/cmssw-patch/CMSSW_12_3_X_2022-03-03-1100/src/HeterogeneousCore/CUDAUtilities/interface/HostAllocator.h:36:9: error: unknown type name 'cudaError_t'
cudaError_t status = cudaMallocHost(&ptr, n * sizeof(T), FLAGS);
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
Error in <TInterpreter::AutoParse>: Error parsing payload code for class edm::Wrapper<hcal::RecHitCollection<calo::common::VecStoragePolicy<calo::common::CUDAHostAllocatorAlias> > > with content:
#line 1 "CUDADataFormatsHcalRecHitSoA_xr dictionary payload"
#ifndef CMS_DICT_IMPL
#define CMS_DICT_IMPL 1
#endif
#ifndef _REENTRANT
#define _REENTRANT 1
#endif
#ifndef GNUSOURCE
#define GNUSOURCE 1
#endif
#ifndef __STRICT_ANSI__
#define __STRICT_ANSI__ 1
#endif
#ifndef GNU_GCC
#define GNU_GCC 1
#endif
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#ifndef EIGEN_DONT_PARALLELIZE
#define EIGEN_DONT_PARALLELIZE 1
#endif
#ifndef TBB_USE_GLIBCXX_VERSION
#define TBB_USE_GLIBCXX_VERSION 100300
#endif
#ifndef TBB_SUPPRESS_DEPRECATED_MESSAGES
#define TBB_SUPPRESS_DEPRECATED_MESSAGES 1
#endif
#ifndef TBB_PREVIEW_RESUMABLE_TASKS
#define TBB_PREVIEW_RESUMABLE_TASKS 1
#endif
#ifndef BOOST_SPIRIT_THREADSAFE
#define BOOST_SPIRIT_THREADSAFE 1
#endif
#ifndef PHOENIX_THREADSAFE
#define PHOENIX_THREADSAFE 1
#endif
#ifndef BOOST_MATH_DISABLE_STD_FPCLASSIFY
#define BOOST_MATH_DISABLE_STD_FPCLASSIFY 1
#endif
#ifndef BOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX
#define BOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX 1
#endif
#ifndef CMSSW_GIT_HASH
#define CMSSW_GIT_HASH "CMSSW_12_3_X_2022-03-02-2300"
#endif
#ifndef PROJECT_NAME
#define PROJECT_NAME "CMSSW"
#endif
#ifndef PROJECT_VERSION
#define PROJECT_VERSION "CMSSW_12_3_X_2022-03-02-2300"
#endif
#ifndef CMSSW_REFLEX_DICT
#define CMSSW_REFLEX_DICT 1
#endif
#define _BACKWARD_BACKWARD_WARNING_H
// Inline headers
#include "CUDADataFormats/Common/interface/Product.h"
#include "CUDADataFormats/HcalRecHitSoA/interface/RecHitCollection.h"
#include "DataFormats/Common/interface/Wrapper.h"
#undef _BACKWARD_BACKWARD_WARNING_H
Error in <TInterpreter::AutoParse>: Error parsing payload code for class hcal::RecHitCollection with content:
#line 1 "CUDADataFormatsHcalRecHitSoA_xr dictionary payload"
#ifndef CMS_DICT_IMPL
#define CMS_DICT_IMPL 1
#endif
#ifndef _REENTRANT
#define _REENTRANT 1
#endif
#ifndef GNUSOURCE
#define GNUSOURCE 1
#endif
#ifndef __STRICT_ANSI__
#define __STRICT_ANSI__ 1
#endif
#ifndef GNU_GCC
#define GNU_GCC 1
#endif
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#ifndef EIGEN_DONT_PARALLELIZE
#define EIGEN_DONT_PARALLELIZE 1
#endif
#ifndef TBB_USE_GLIBCXX_VERSION
#define TBB_USE_GLIBCXX_VERSION 100300
#endif
#ifndef TBB_SUPPRESS_DEPRECATED_MESSAGES
#define TBB_SUPPRESS_DEPRECATED_MESSAGES 1
#endif
#ifndef TBB_PREVIEW_RESUMABLE_TASKS
#define TBB_PREVIEW_RESUMABLE_TASKS 1
#endif
#ifndef BOOST_SPIRIT_THREADSAFE
#define BOOST_SPIRIT_THREADSAFE 1
#endif
#ifndef PHOENIX_THREADSAFE
#define PHOENIX_THREADSAFE 1
#endif
#ifndef BOOST_MATH_DISABLE_STD_FPCLASSIFY
#define BOOST_MATH_DISABLE_STD_FPCLASSIFY 1
#endif
#ifndef BOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX
#define BOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX 1
#endif
#ifndef CMSSW_GIT_HASH
#define CMSSW_GIT_HASH "CMSSW_12_3_X_2022-03-02-2300"
#endif
#ifndef PROJECT_NAME
#define PROJECT_NAME "CMSSW"
#endif
#ifndef PROJECT_VERSION
#define PROJECT_VERSION "CMSSW_12_3_X_2022-03-02-2300"
#endif
#ifndef CMSSW_REFLEX_DICT
#define CMSSW_REFLEX_DICT 1
#endif
#define _BACKWARD_BACKWARD_WARNING_H
// Inline headers
#include "CUDADataFormats/Common/interface/Product.h"
#include "CUDADataFormats/HcalRecHitSoA/interface/RecHitCollection.h"
#include "DataFormats/Common/interface/Wrapper.h"
#undef _BACKWARD_BACKWARD_WARNING_H
Error in <TClass::LoadClassInfo>: no interpreter information for class edm::Wrapper<hcal::RecHitCollection<calo::common::VecStoragePolicy<calo::common::CUDAHostAllocatorAlias> > > is available even though it has a TClass initialization routine.
Error in <TTreeFormula::Compile>: Bad numerical expression : "recoTracks_hltPixelTracks__AAA.obj.pt()"
************************
* Row * recoTrack *
************************
* 0 * *
* 1 * *
* 2 * *
* 3 * *
* 4 * *
* 5 * *
* 6 * *
* 7 * *
* 8 * *
* 9 * *
************************
On 4 Mar, 2022, at 12:43 PM, valeriadamante @.***> wrote:
> Hello,
> The training has been done on triplets, not on quadruplets, and the NN should not be sensitive to the order of the tracks: the Patatrack-related inputs are sums of kinematic observables normalised to the pT sum, plus the total number of tracks in the specific cell, as specified here in Slide 7.

I think it would be useful to look into the events identified by Silvio and try to identify what makes the NN produce a different output for the same event when run multiple times.
v.
root [2] Events->Scan("recoTracks_hltPixelTracks__AAA.obj.pt()")
Reading std::vector<Track>
leading to
In file included from CUDADataFormatsHcalRecHitSoA_xr dictionary payload:59: .... Error in <TClass::LoadClassInfo>: no interpreter information for class edm::Wrapper<hcal::RecHitCollection<calo::common::VecStoragePolicy<calo::common::CUDAHostAllocatorAlias> > > is available even though it has a TClass initialization routine.
is strange. There should be no dependence on CUDADataFormats from DataFormats.
Hi all,
Running multiple times on the same event, I found differences in:
So what I did is:
1) Running from CMSSW_12_3_X_2022-03-03-1100, with a configuration obtained by: hltGetConfiguration /users/vdamante/GPUTest/HLT/V4 --globaltag auto:run3_hlt --data --eras Run2_2018 --max-events -1 --input file:/eos/cms/store/data/Run2018D/EphemeralHLTPhysics2/RAW/v1/000/323/775/00000/17ADD12B-52E2-8C4C-B375-8AF943A24212.root --no-output --process MYHLT > hlt_Valeria2.py
2) In the obtained file, customizeHLTforPatatrackTriplets and a customisation function to get an ntuple with the CNN outputs (plus some other minor adjustments: adding the endjob_step and patAlgosToolsTask) have been applied. I selected one of the problematic events via process.source.eventsToProcess = cms.untracked.VEventRange(['323775:138:179372613']).
3) on lxplus-gpu.cern.ch, I ran 6 separate times and saved the Patatrack observables with: cmsRun hlt_Valeria2.py
4) I printed out all patatrack and pata-vertices related observables that fulfill the following requirements:
5) The obtained files (named tracks_14_i, vertices_14_i with i=16,18,24 and tracks_16_i, vertices_16_i with i=35,38,40) are attached here. If you compare them (I did with a very basic python script, sketched below!) you can see that there are differences in:
For L2NNTag the most relevant change is the number of Patatracks associated to vertices, which in the problematic event fluctuates in many cells (from 0 to 2, and hence also the total pT of the Patatracks associated to vertices fluctuates, from 0 to a value != 0). These differences might cause changes in the outputs.
tracks_14_16.txt tracks_14_18.txt tracks_14_24.txt tracks_16_35.txt tracks_16_38.txt tracks_16_40.txt vertices_14_16.txt vertices_14_18.txt vertices_14_24.txt vertices_16_35.txt vertices_16_38.txt vertices_16_40.txt
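A hypothetical sketch of the kind of "very basic python script" comparison mentioned above, assuming the attached .txt files contain one comma-separated track per line (the exact file format is an assumption):

```python
# Order-independent comparison of two track dumps; format assumed to be one
# comma-separated track per line, as in the attached tracks_*.txt files.
def load_tracks(fname):
    with open(fname) as f:
        return [tuple(field.strip() for field in line.split(","))
                for line in f if "," in line]

a = load_tracks("tracks_14_16.txt")
b = load_tracks("tracks_14_24.txt")
only_a = [t for t in a if t not in set(b)]  # tracks present only in the first file
only_b = [t for t in b if t not in set(a)]
print(len(only_a), "only in tracks_14_16.txt,", len(only_b), "only in tracks_14_24.txt")
```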
In the .txt files above I did not find any discrepancy in the tracks associated to vertices. They look identical to me (to the last digit).
> In the .txt files above I did not find any discrepancy in the tracks associated to vertices. They look identical to me (to the last digit).
The number of tracks is different: in tracks_14_24.txt there are 1113 tracks, while in tracks_14_16.txt there are 1109.
652 0.545105 0.0965592 -2.13828 1 0.041834 3 6 -1
580 0.535333 0.0992664 -2.1305 1 2.38 4 6 -1
67 0.61013 2.64948 -2.10212 1 4.45669 4 6 -1
1267 5.55364 -2.56354 -1.97965 -1 9.96772 4 3 -1
861 5.91727 -3.13949 -1.66239 1 32.1911 3 3 -1
these tracks appear only in tracks_14_24.txt
I take an example from the 16_35 vs 16_38 comparison: in 16_35 the index of this track is 419 and in 16_38 the index is 660 (I know the order is not important, but I report it here to let you find the information):
file 16_35 pt=1.27935, phi=2.59874, eta=-2.09418, charge=1, chi2=3.15117, nHits=4, quality=3, idv=-1
file 16_38 pt= 1.27935, phi= 2.59874, eta= -2.09418, charge=1, chi2=3.15117, nHits=4, quality=6, idv=-1
Sorry. I understood you were referring to discrepancies in tracks associated to vertices.
Indeed, this quite isolated quadruplet with quality either loose or HP is strange. From a cursory look I have not found any other occurrence.
trk 1267 seems a marginal quadruplet (nothing obvious around): true that it is only found in one out of 6 reco jobs (I found just one more quad present in 4 out of 6; all the others are identical).
4 2.64948 0.61013 -1 -2.10212 1 4.45669 4 6 tracks_16_40.txt:125
4 2.64948 0.61013 -1 -2.10212 1 4.45669 4 6 tracks_16_35.txt:15
4 2.64948 0.61013 -1 -2.10212 1 4.45669 4 6 tracks_14_24.txt:67
4 2.64948 0.61013 -1 -2.10212 1 4.45669 4 6 tracks_14_18.txt:72
For trk 558 it seems that it's swapped with a close-by triplet (most probably sharing the same hits). This is not really expected when running on the same hardware.
0.0992664 0.535333 -1 -2.1305 1 2.38 4 6 tracks_16_40.txt:561
0.0992664 0.535333 -1 -2.1305 1 2.38 4 6 tracks_14_18.txt:305
0.0992664 0.535333 -1 -2.1305 1 2.38 4 6 tracks_14_16.txt:580
0.0965592 0.545105 -1 -2.13828 1 0.041834 3 6 tracks_16_38.txt:668
0.0965592 0.545105 -1 -2.13828 1 0.041834 3 6 tracks_16_35.txt:683
0.0965592 0.545105 -1 -2.13828 1 0.041834 3 6 tracks_14_24.txt:652
the others are low quality triplets included mostly for seeding. It is known that the current algorithm cannot reproduce all of them.
files parsed with either
grep ',' track* | tr ',' ' ' | awk '{print $3, $2, $9, $4,$5,$6,$7,$8, $1}' | sort -g -r | less
or
grep ',' track* | tr ',' ' ' | awk '{print $7, $3, $2, $9, $4,$5,$6,$7,$8, $1}' | sort -g -r | less
> the others are low quality triplets included mostly for seeding. It is known that the current algorithm cannot reproduce all of them.
hm... this could be the reason for the L2TauTagNN irreproducibility: currently all tracks that pass the Loose quality WP and have > 0 hits are considered as inputs: TrackGood.
@VinInn what (minimal) selection should be used for TrackGood to ensure reproducibility of the inputs with the current track building algorithm?
> hm... this could be the reason for the L2TauTagNN irreproducibility: currently all tracks that pass the Loose quality WP and have > 0 hits are considered as inputs: TrackGood. @VinInn what (minimal) selection should be used for TrackGood to ensure reproducibility of the inputs with the current track building algorithm?

Full reproducibility is never guaranteed with the current algorithm. I suggest using a selection similar to the one PF uses in scouting, as they made a very detailed study.
(same for the track-vertex association: currently you are "counting" only the tracks used to identify and fit the vertices: high-pT, high-purity quadruplets)
v.
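For concreteness, a hypothetical sketch of a tighter "TrackGood" selection along these lines; the thresholds are illustrative, not the actual PF-scouting values, and the quality encoding (6 vs 3 in the dumps above) is an assumption:

```python
from collections import namedtuple

Track = namedtuple("Track", "pt chi2 nhits quality")

# Illustrative tighter selection; cut values and the quality encoding are
# assumptions, not the PF-scouting numbers referenced above.
def track_good(trk, min_quality=6, min_hits=4, min_pt=0.9, max_chi2=20.0):
    return (trk.quality >= min_quality   # require a purer WP than Loose
            and trk.nhits >= min_hits    # drop marginal triplets kept for seeding
            and trk.pt > min_pt          # drop very soft, poorly measured tracks
            and trk.chi2 < max_chi2)     # drop badly fitted candidates

print(track_good(Track(pt=1.27935, chi2=3.15117, nhits=4, quality=6)))  # True
```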
Just to quantify the GPU vs GPU fluctuation: out of 1110 tracks we get differences in 27 tracks (tracks_14_16.txt vs tracks_14_18.txt). Specifically (using the grep command above; columns: phi, pt, idv, eta, charge, chi2, nHits, quality):
> 2.64948 0.61013 -1 -2.10212 1 4.45669 4 6
-2.53876 1.58212 -1 -1.71285 1 5.96387 4 6 | -2.53913 1.59199 -1 -1.72069 1 2.98019 4 6
2.59874 1.27935 -1 -2.09418 1 3.15117 4 6 | 2.59874 1.27935 -1 -2.09418 1 3.15117 4 3
@valeriadamante do you know the meaning of idv=9997?
@valeriadamante what is the difference between vtx_idx and sortind in the vertex.txt file?
Could you confirm that in the L2NN you don't use sortind, vtx_idx, idv at all?
You may wish to "print" and "use" in the CNN nLayers
as well.
It may be that some of those 4-hit tracks are just triplets (3-layers) with two hits in the same layer.
Something else you may wish to consider are the significance of the pt and impact-param (or even the chord) as it may help to put less weight on tracks with large errors.
idv=9997 (9998-1) means that the track was used in the vertex finder but ended up not "associated" to any vertex (search for 9998 in "RecoPixelVertexing/PixelVertexFinding")
@VinInn @fwyzard I noticed that the CPU vs GPU differences appear only after 300-500 events. Usually there are no differences in the first ~300 events. The "difference rate" reaches a "plateau" after ~1000 events. Did you expect this?
> @valeriadamante what is the difference between vtx_idx and sortind in the vertex.txt file?

I checked, and the sortind column should be ignored. Indeed, vtx_idx is vertex_soa.sortInd[j] (where j runs over the vertex_soa size) and sortInd is vertex_SOA.sortInd[vtx_idx]. So please ignore this column in the comparison.
> Could you confirm that in the L2NN you don't use sortind, vtx_idx, idv at all?

Yes, in the L2NN I only use the number of vertices that pass a minimal selection (described here).
On 9 Mar, 2022, at 9:37 PM, Silvio Donato @.***> wrote:
> @VinInn @fwyzard I noticed that the CPU vs GPU differences appear only after 300-500 events. Usually there are no differences in the first ~300 events. The "difference rate" reaches a "plateau" after ~1000 events. Did you expect this?

No. Is this in any workflow or some in particular?
No, I just ran the L2NN on a different number of events.
You can find all the files in /afs/cern.ch/work/s/sdonato/public/GPU_fluctuation_study:

- CPU and CPU_2: done using CPU only
- 1 and 2: done using GPU (only for pixel local reco and tracks); I ran the same configuration twice
- skim1k and skim1k_2: done using GPU (only for pixel local reco and tracks)
- skim2k and skim2k_2: done using GPU (only for pixel local reco and tracks)

Using hltDiff -n CPU_1/output.root -o skip_2k/output.root -v1 | grep HLT_ you can easily see that there are no differences in the first 300-500 events.
I made a more quantitative comparison: the number of cumulated differences vs the number of processed events. If you count the differences in reversed order (i.e. starting from the last processed event), you get this plot. It is clear that the "difference rate" is rather constant, but for some reason there are no (or few) differences in the first hundreds of events.
These are the exact numbers of the first events with differences (a counting sketch follows the table):
Diff (number) | 1 | 2 | skim1k | skim1k_2 | skim2k | skim2k_2 |
---|---|---|---|---|---|---|
1 | 382 | 377 | 310 | 324 | 503 | 638 |
2 | 609 | 550 | 486 | 457 | 639 | 652 |
3 | 652 | 581 | 577 | 485 | 652 | 676 |
4 | 674 | 652 | 585 | 486 | 669 | 684 |
5 | 719 | 673 | 620 | 529 | 670 | 751 |
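For reference, a minimal sketch of how such cumulative-difference curves can be built, assuming one boolean per processed event extracted from the hltDiff -v output (the flags below are illustrative):

```python
# Cumulative count of events with different HLT results vs processed events.
def cumulative_differences(differs):
    total, curve = 0, []
    for d in differs:
        total += bool(d)
        curve.append(total)
    return curve

# illustrative: first difference at event 382, as in column "1" of the table
flags = [False] * 381 + [True] + [False] * 118
print(cumulative_differences(flags)[-1])  # 1
```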
Is it not that the first few hundred events are sort of low multiplicity?

> Is it not that the first few hundred events are sort of low multiplicity?

No, because I tested this on different events.
I mean, the first 300 events of skim2k correspond to the events between 2000 and 2300 of 1, and from the plot you can see that we see differences in skim2k between 2000 and 2300.
OK. I can only imagine that at startup things go more in sync, so all jobs run code more or less in the same order (even if in parallel). Then they lose sync, the occupancy of the GPU varies, and code (blocks, waves) is run in a different order in different jobs.
We may try "cooperative groups" (once they are validated to be used in CMSSW) to see if they are more stable (as they allocate all GPU threads at once and allow global synchronization among them all).
The file /afs/cern.ch/work/s/sdonato/public/GPU_fluctuation_study/output_HLT_GPU_CPU_GPU2.root is the output of a test where I ran the following steps on /eos/cms/store/data/Run2018D/EphemeralHLTPhysics2/RAW/v1/000/323/775/00000/17ADD12B-52E2-8C4C-B375-8AF943A24212.root (3151 events):

- HLTGPU: running on GPU and selecting the events passing HLT_OnlyL2NN_v1, i.e. two taus with hltL2TauTagNNProducer DoubleTau > 0.4327 (176 events)
- HLTCPU: running on CPU and inverting the tau selection (19 events)
- HLTGPU2: running on GPU again, with no further selection (19 events)

Checking the GPU2 values, you can see that very often HLTGPU2 has a result very close to HLTCPU rather than to HLTGPU (as discussed above):
[sdonato@lxplus764 src]$ root -l /afs/cern.ch/work/s/sdonato/public/GPU_fluctuation_study/output_HLT_GPU_CPU_GPU2.root
root [0]
Attaching file /afs/cern.ch/work/s/sdonato/public/GPU_fluctuation_study/output_HLT_GPU_CPU_GPU2.root as _file0...
(TFile *) 0x47dedc0
root [1] Events->Scan("EventAuxiliary.event():floats_hltL2TauTagNNProducer_DoubleTau_HLTGPU.obj:floats_hltL2TauTagNNProducer_DoubleTau_HLTCPU.obj:floats_hltL2TauTagNNProducer_DoubleTau_HLTGPU2.obj")
***********************************************************************
* Row * Instance * EventAuxi * floats_hl * floats_hl * floats_hl *
***********************************************************************
* 0 * 0 * 179817566 * 0.8923431 * 0.7772132 * 0.7772129 *
* 0 * 1 * 179817566 * 0.5937884 * 0.2117204 * 0.2117119 *
* 1 * 0 * 179298167 * 0.5923708 * 0.0160080 * 0.0160080 *
* 1 * 1 * 179298167 * 0.8136105 * 0.4281417 * 0.4281420 *
* 2 * 0 * 179791629 * 0.9079257 * 0.2867714 * 0.5548577 *
* 2 * 1 * 179791629 * 0.6653458 * 0.6105449 * 0.6105442 *
* 3 * 0 * 179864959 * 0.8531022 * 0.0435173 * 0.0435174 *
I used basically keep *, so all the HLT objects (tracks, hits) are included, and you can even re-run on the RAW event.
PS. The HLT command was
hltGetConfiguration /users/sdonato/GPUtest/Tau/HLT/V6 --globaltag auto:run3_hlt --data --eras Run2_2018 --max-events -1 --input file:aaa.root --output full --customise HLTrigger/Configuration/customizeHLTforCMSSW.customiseFor2018Input,HLTrigger/Configuration/customizeHLTforPatatrack.customizeHLTforPatatrackTriplets
I made a simple script to compare some variables: https://github.com/silviodonato/usercode/blob/master/compareGPUvsCPU.py This is one random event with CPU/GPU differences:
['phi', 'eta', 'dz', 'dxy', 'pt', 'chi2', 'charge', 'missingInnerHits']
i distance CPU GPU diff
249 484.69 [2.09, -0.88, 9.23, -0.04, 3.41, 23.06, -1, 0] [2.08, -0.88, 9.25, -0.0, 2.77, 1.06, -1, 0] [0.01, 0.0, -0.02, -0.04, 0.64, 22.01, 0, 0]
333 51.02 [2.7, -1.27, 0.17, 0.01, 2634.47, 10.18, 1, 0] [2.7, -1.27, 0.17, 0.01, 2627.33, 10.18, 1, 0] [0.0, -0.0, -0.0, 0.0, 7.14, -0.0, 0, 0]
478 0.45 [-3.09, -2.0, -1.11, 0.08, 1.42, 5.51, -1, 0] [-3.09, -2.0, -1.11, 0.08, 1.43, 4.84, -1, 0] [-0.0, 0.0, -0.0, 0.0, -0.01, 0.67, 0, 0]
606 0.04 [-2.23, 0.95, -5.31, 0.22, 278.08, 7.04, 1, 0] [-2.23, 0.95, -5.31, 0.22, 277.88, 7.04, 1, 0] [-0.0, -0.0, 0.0, 0.0, 0.2, 0.0, 0, 0]
1017 149.91 [0.51, 2.54, -1.01, -0.09, 1.57, 40.44, -1, 0] [0.51, 2.54, -0.99, -0.09, 1.57, 52.68, -1, 0] [0.0, 0.0, -0.03, -0.0, 0.0, -12.24, 0, 0]
1032 0.01 [0.87, 2.23, 8.58, -0.13, 114.02, 47.08, -1, 0] [0.87, 2.23, 8.58, -0.13, 113.91, 47.08, -1, 0] [0.0, 0.0, 0.0, -0.0, 0.12, -0.0, 0, 0]
1081 32056115.03 [2.73, 1.76, 6.68, 0.17, 1.59, 2.08, -1, 0] [-2000, -2000, -2000, -2000, -2000, -2000, -2000, -2000] [2002.73, 2001.76, 2006.68, 2000.17, 2001.59, 2002.08, 1999, 2000]
1180 8184.6 [0.91, -2.62, 1.91, -0.05, 2.08, 107.62, -1, 0] [0.9, -2.61, 1.88, -0.02, 1.44, 17.15, -1, 0] [0.01, -0.0, 0.03, -0.03, 0.64, 90.47, 0, 0]
1254 86.52 [2.81, -2.05, -1.18, 0.04, 0.81, 12.9, 1, 0] [2.81, -2.05, -1.18, 0.03, 0.8, 3.6, 1, 0] [-0.0, 0.0, -0.0, 0.0, 0.01, 9.3, 0, 0]
1262 32024193.35 [3.13, -1.62, -3.54, 0.17, 0.95, 7.95, -1, 0] [-2000, -2000, -2000, -2000, -2000, -2000, -2000, -2000] [2003.13, 1998.38, 1996.46, 2000.17, 2000.95, 2007.95, 1999, 2000]
1305 783.71 [-2.29, -1.76, -3.8, 0.24, 0.63, 0.3, -1, 0] [-2.29, -1.76, -3.8, 0.24, 0.63, 28.3, -1, 0] [-0.0, 0.0, -0.0, 0.0, -0.0, -27.99, 0, 0]
1402 48.09 [-0.22, -2.18, 1.89, -0.05, 2539.61, 18.99, 1, 0] [-0.22, -2.18, 1.89, -0.05, 2546.54, 18.99, 1, 0] [0.0, 0.0, -0.0, -0.0, -6.93, 0.0, 0, 0]
1471 38.81 [2.03, 1.3, -2.18, 0.17, 1.35, 27.83, 1, 0] [2.03, 1.31, -2.18, 0.18, 1.37, 21.6, 1, 0] [0.0, -0.0, -0.0, -0.0, -0.02, 6.23, 0, 0]
1525 1.91 [1.98, 2.12, -2.84, -0.16, 0.62, 2.37, -1, 0] [1.98, 2.12, -2.84, -0.16, 0.62, 3.75, -1, 0] [-0.0, 0.0, -0.0, 0.0, -0.0, -1.38, 0, 0]
1549 0.02 [0.46, 2.15, -2.56, -0.11, 0.84, 0.58, -1, 0] [0.45, 2.15, -2.57, -0.09, 0.81, 0.72, -1, 0] [0.0, -0.0, 0.01, -0.02, 0.02, -0.13, 0, 0]
1563 0.22 [1.95, -2.02, -2.84, -0.15, 0.63, 1.48, -1, 0] [1.95, -2.02, -2.83, -0.16, 0.63, 1.01, -1, 0] [-0.0, 0.0, -0.01, 0.01, -0.0, 0.47, 0, 0]
1945 1406.1 [-0.33, 2.34, -5.6, 0.06, 1.54, 48.7, 1, 0] [0.01, 1.89, -1.89, -0.01, 1.11, 11.39, 1, 0] [-0.35, 0.45, -3.71, 0.07, 0.43, 37.31, 0, 0]
1973 3.53 [-3.08, -1.75, -10.01, 0.03, 1.51, 12.12, -1, 0] [-3.09, -1.79, -8.41, 0.05, 1.46, 13.11, -1, 0] [0.0, 0.04, -1.6, -0.02, 0.05, -0.98, 0, 0]
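As an aside on reading the table: the distance column is consistent with a sum of squared per-variable differences (the -2000 sentinel rows give roughly 8 x 2000^2 ~ 3.2e7). A hypothetical sketch:

```python
# Distance used to match CPU and GPU tracks, consistent with the table above:
# sum of squared differences over the listed variables.
def distance(cpu, gpu):
    return sum((c - g) ** 2 for c, g in zip(cpu, gpu))

# track 249 from the table; the values shown are rounded to 2 decimals, so this
# gives ~484.4 vs the 484.69 in the table, which used full precision.
cpu = [2.09, -0.88, 9.23, -0.04, 3.41, 23.06, -1, 0]
gpu = [2.08, -0.88, 9.25, -0.00, 2.77, 1.06, -1, 0]
print(round(distance(cpu, gpu), 2))
```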
Two considerations:
And this is the comparison of the pixel cluster: https://github.com/silviodonato/usercode/blob/master/compareGPUvsCPU_pixelHits.py
Considering the first event, there are a lot of clusters (~30) with a cluster charge difference equal to 1:
['x', 'y', 'charge', 'colSpan', 'size', 'sizeX', 'sizeY', 'minPixelCol', 'maxPixelCol', 'minPixelRow', 'maxPixelRow', 'overflow', 'overflowCol', 'overflowRow', 'colSpan', 'rowSpan']
detId: 303050780 cluster: 14 diff: 1.0
cpu: [63.56, 205.13, 99173, 9, 11, 2, 10, 199, 208, 63, 64, 0, 0, 0, 9, 1]
gpu: [63.56, 205.13, 99174, 9, 11, 2, 10, 199, 208, 63, 64, 0, 0, 0, 9, 1]
dif: [0.0, -0.0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
There are 3 clusters with a large difference in the cluster charge. In the second case there is a +140% difference, while the x,y coordinates are exactly the same. In all three cases there is a change in the size, while sizeX and sizeY are unchanged (I guess this means that they are missing one pixel; a sketch of these shape variables follows the listing):
['x', 'y', 'charge', 'colSpan', 'size', 'sizeX', 'sizeY', 'minPixelCol', 'maxPixelCol', 'minPixelRow', 'maxPixelRow', 'overflow', 'overflowCol', 'overflowRow', 'colSpan', 'rowSpan']
detId: 303054852 cluster: 35 diff: 142277185.01
cpu: [54.42, 288.56, 98098, 10, 11, 2, 11, 284, 294, 53, 54, 0, 0, 0, 10, 1]
gpu: [54.43, 288.44, 110026, 10, 12, 2, 11, 284, 294, 53, 54, 0, 0, 0, 10, 1]
dif: [-0.01, 0.11, -11928, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
detId: 303054852 cluster: 39 diff: 10093330.0
cpu: [56.5, 279.5, 2194, 0, 1, 1, 1, 279, 279, 56, 56, 0, 0, 0, 0, 0]
gpu: [56.5, 279.5, 5371, 0, 2, 1, 1, 279, 279, 56, 56, 0, 0, 0, 0, 0]
dif: [0.0, 0.0, -3177, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
detId: 303067152 cluster: 36 diff: 22543505.0
cpu: [95.5, 367.5, 7109, 0, 1, 1, 1, 367, 367, 95, 95, 0, 0, 0, 0, 0]
gpu: [95.5, 367.5, 11857, 0, 2, 1, 1, 367, 367, 95, 95, 0, 0, 0, 0, 0]
dif: [0.0, 0.0, -4748, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
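A minimal sketch of these cluster shape variables, using the cluster 39 values from the listing above (the per-pixel split 2194 + 3177 = 5371 is shown later in the thread): size counts pixels, while sizeX/sizeY only span rows/columns, so a pixel duplicated at the same coordinates changes size and charge without changing sizeX or sizeY.

```python
# size counts pixels; sizeX/sizeY span the hit rows/columns, so a pixel
# duplicated at the same (row, col) changes size but not sizeX or sizeY.
def cluster_shape(pixels):
    """pixels: list of (row, col, adc) belonging to one cluster."""
    rows = [p[0] for p in pixels]
    cols = [p[1] for p in pixels]
    return {"size": len(pixels),
            "sizeX": max(rows) - min(rows) + 1,
            "sizeY": max(cols) - min(cols) + 1,
            "charge": sum(p[2] for p in pixels)}

print(cluster_shape([(56, 279, 2194)]))                   # CPU-like: size 1, charge 2194
print(cluster_shape([(56, 279, 2194), (56, 279, 3177)]))  # GPU-like: size 2, charge 5371
```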
Checking the single pixels associated to the clusters, I noticed that the problematic clusters often have two pixels with the same coordinates. Typically one pixel has the same value as the CPU cluster, while the other pixel is random. @tsusa
detId: 303054852 cluster: 35 diff: 142277185.01
cpu: [54.42, 288.56, 98098, 10, 11, 2, 11, 284, 294, 53, 54, 0, 0, 0, 10, 1]
gpu: [54.43, 288.44, 110026, 10, 12, 2, 11, 284, 294, 53, 54, 0, 0, 0, 10, 1]
dif: [-0.01, 0.11, -11928, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
detId: 303054852 cluster: 39 diff: 10093330.0
cpu: [56.5, 279.5, 2194, 0, 1, 1, 1, 279, 279, 56, 56, 0, 0, 0, 0, 0]
gpu: [56.5, 279.5, 5371, 0, 2, 1, 1, 279, 279, 56, 56, 0, 0, 0, 0, 0]
dif: [0.0, 0.0, -3177, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
detId: 303067152 cluster: 36 diff: 22543505.0
cpu: [95.5, 367.5, 7109, 0, 1, 1, 1, 367, 367, 95, 95, 0, 0, 0, 0, 0]
gpu: [95.5, 367.5, 11857, 0, 2, 1, 1, 367, 367, 95, 95, 0, 0, 0, 0, 0]
dif: [0.0, 0.0, -4748, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
clusterDebug= {303054852: [35, 39], 303067152: [36]}
DetId: 303054852 clNum: 35
x y adc
diff: [0, 0, 0] cpu: [53, 294, 5829] gpu: [53, 294, 5829]
diff: [0, 0, 0] cpu: [53, 293, 1747] gpu: [53, 293, 1747]
diff: [0, 0, 0] cpu: [53, 292, 100] gpu: [53, 292, 100]
diff: [0, 0, 0] cpu: [54, 291, 1446] gpu: [54, 291, 1446]
diff: [0, 0, 0] cpu: [54, 290, 18456] gpu: [54, 290, 18456]
diff: [0, 0, 0] cpu: [54, 289, 4970] gpu: [54, 289, 4970]
diff: [0, 0, 0] cpu: [54, 288, 18046] gpu: [54, 288, 18046]
diff: [0, 0, 11669] cpu: [54, 287, 23597] gpu: [54, 287, 11928]
diff: [0, 0, 0] cpu: [54, 286, 10775] gpu: [54, 286, 10775]
diff: [0, 0, 0] cpu: [54, 285, 13032] gpu: [54, 285, 13032]
diff: [0, 0, 0] cpu: [54, 284, 100] gpu: [54, 284, 100]
diff: [-2054, -2287, -25597] cpu: [-2000, -2000, -2000] gpu: [54, 287, 23597]
DetId: 303054852 clNum: 39
x y adc
diff: [0, 0, 0] cpu: [56, 279, 2194] gpu: [56, 279, 2194]
diff: [-2056, -2279, -5177] cpu: [-2000, -2000, -2000] gpu: [56, 279, 3177]
DetId: 303067152 clNum: 36
x y adc
diff: [0, 0, 0] cpu: [95, 367, 7109] gpu: [95, 367, 7109]
diff: [-2095, -2367, -6748] cpu: [-2000, -2000, -2000] gpu: [95, 367, 4748]
This is the number of clusters containing duplicated pixels in each event:
event = 179817566 duplicates = 1
event = 179298167 duplicates = 3
event = 179791629 duplicates = 1
event = 179864959 duplicates = 1
event = 179874601 duplicates = 0
event = 179479449 duplicates = 2
event = 179064864 duplicates = 5
event = 179118965 duplicates = 1
event = 178787468 duplicates = 3
event = 180607223 duplicates = 3
event = 180699610 duplicates = 0
event = 181039937 duplicates = 1
event = 181330428 duplicates = 0
event = 181312720 duplicates = 1
event = 181451699 duplicates = 10
event = 180470519 duplicates = 2
event = 181735245 duplicates = 1
event = 181859892 duplicates = 4
event = 181517358 duplicates = 3
All these duplicates come from the GPU; there are 0 duplicates in the CPU reconstruction. https://github.com/silviodonato/usercode/blob/master/compareGPUvsCPU_pixelHits_findDuplicates.py
I cannot say this is the cause of the GPU fluctuations, but I think this is a bug in the pixel local reco.
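A hypothetical sketch of the duplicate search done by the linked script (the input format is an assumption):

```python
from collections import Counter

# Report pixels sharing the same (detId, x, y): multiplicity > 1 means the
# same coordinates appear more than once in the digi collection.
def find_duplicates(pixels):
    """pixels: list of (detId, x, y, adc) tuples."""
    counts = Counter((det, x, y) for det, x, y, _ in pixels)
    return {key: n for key, n in counts.items() if n > 1}

pixels = [(303054852, 56, 279, 2194), (303054852, 56, 279, 3177)]
print(find_duplicates(pixels))  # {(303054852, 56, 279): 2}
```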
> Checking the single pixels associated to the clusters, I noticed that the problematic clusters often have two pixels with the same coordinates. Typically one pixel has the same value as the CPU cluster, while the other pixel is random. @tsusa
This is a known issue for real data. Apparently, from time to time some pixels from the previous crossing are still around, so the same pixel may appear twice in the raw data. On GPU they stay separate; on CPU their charge is summed. On Monte Carlo, of course, this never happens.
I assume we don't have any way of figuring out which ones are from the current event and which ones are from the previous one?
In these cases the duplicated pixels are not summed on CPU:

DetId: 303054852 clNum: 39 (x, y, adc):
diff: [0, 0, 0] cpu: [56, 279, 2194] gpu: [56, 279, 2194]
diff: [-2056, -2279, -5177] cpu: [-2000, -2000, -2000] gpu: [56, 279, 3177]

DetId: 303067152 clNum: 36 (x, y, adc):
diff: [0, 0, 0] cpu: [95, 367, 7109] gpu: [95, 367, 7109]
diff: [-2095, -2367, -6748] cpu: [-2000, -2000, -2000] gpu: [95, 367, 4748]

(-2000 means missing pixel)

In DetId: 303054852 clNum: 39, the CPU keeps only the pixel with adc 2194, while the GPU also has the duplicate with adc 3177. In DetId: 303067152 clNum: 36, the CPU keeps only the pixel with adc 7109, while the GPU also has the duplicate with adc 4748.
If the CPU sums up the duplicated pixels and the GPU keeps them separated, the total cluster charge should not change. On the contrary, I do see a different cluster charge for:

- detId: 303054852 cluster: 39 (cpu: 2194 vs gpu: 5371)
- detId: 303067152 cluster: 36 (cpu: 7109 vs gpu: 11857)

Sorry, I got confused by the code in the second copy_to_buffer (where pixels are added). The line that fills the buffer later used by make_clusters is https://cmssdt.cern.ch/dxr/CMSSW/source/RecoLocalTracker/SiPixelClusterizer/plugins/PixelThresholdClusterizer.cc#294 and indeed it is a set, not an add. So the last pixel in the raw data wins.
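A small sketch of the two buffer-filling policies being discussed (the order of the duplicated pixels within the raw data is illustrative):

```python
# "set": a later duplicated pixel overwrites the earlier one (last one wins),
# as at the PixelThresholdClusterizer line quoted above; "add" sums them.
def fill_buffer(pixels, policy="set"):
    buffer = {}
    for x, y, adc in pixels:
        if policy == "add":
            buffer[(x, y)] = buffer.get((x, y), 0) + adc
        else:  # "set": overwrite any earlier pixel at the same coordinates
            buffer[(x, y)] = adc
    return buffer

dup = [(54, 287, 23597), (54, 287, 11928)]  # duplicated pixel from the event above
print(fill_buffer(dup, "set"))  # {(54, 287): 11928} -- only the last one survives
print(fill_buffer(dup, "add"))  # {(54, 287): 35525} -- charges summed
```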
Below are some more numbers about the duplicated pixels. I've also found events with three duplicated pixels:
### Three pixels with the same (x,y) in (150, 208) 303087620 179064864 . The third pixel is:
detId = 303087620. x, y = 150, 208. charge1 = 100. charge2 = 11354. charge3 = 13019
detId = 303087620. x, y = 150, 208. charge1 = 100. charge2 = 11354. chargeCPU = 13019
and events with duplicated pixels on the GPU with no corresponding pixel on the CPU:
detId = 303042568. x, y = 95, 291. charge1 = 23386. charge2 = 100. No corresponding CPU cluster found.
(probably because the CPU picks 100 and then the charge is too small to make a cluster; there is only one case where the smaller of the two charges is 758 instead of 100). This also seems to explain why we see a different number of clusters.
Event = 179817566
### Different number of pixel cluster in detId=303075348 32 vs 33
detId = 303075348. x, y = 143, 156. charge1 = 28972. charge2 = 100. No corresponding CPU cluster found.
Summary: event = 179817566 duplicates = 1
Event = 179298167
detId = 303054852. x, y = 54, 287. charge1 = 11928. charge2 = 23597. chargeCPU = 23597
detId = 303054852. x, y = 56, 279. charge1 = 2194. charge2 = 3177. chargeCPU = 2194
detId = 303067152. x, y = 95, 367. charge1 = 7109. charge2 = 4748. chargeCPU = 7109
Summary: event = 179298167 duplicates = 3
Event = 179791629
detId = 353130500. x, y = 115, 252. charge1 = 10537. charge2 = 132. chargeCPU = 132
Summary: event = 179791629 duplicates = 1
Event = 179864959
### Different number of pixel cluster in detId=303042568 25 vs 26
detId = 303042568. x, y = 152, 208. charge1 = 6319. charge2 = 100. No corresponding CPU cluster found.
Summary: event = 179864959 duplicates = 1
Event = 179874601
Summary: event = 179874601 duplicates = 0
Event = 179479449
detId = 304156696. x, y = 133, 304. charge1 = 11391. charge2 = 29045. chargeCPU = 11391
detId = 304156696. x, y = 134, 305. charge1 = 100. charge2 = 100. chargeCPU = 100
Summary: event = 179479449 duplicates = 2
Event = 179064864
detId = 303071256. x, y = 101, 40. charge1 = 19369. charge2 = 17041. chargeCPU = 17041
### Three pixels with the same (x,y) in (150, 208) 303087620 179064864 . The third pixel is:
detId = 303087620. x, y = 150, 208. charge1 = 100. charge2 = 11354. charge3 = 13019
detId = 303087620. x, y = 150, 208. charge1 = 100. charge2 = 11354. chargeCPU = 13019
### Three pixels with the same (x,y) in (114, 272) 304156696 179064864 . The third pixel is:
detId = 304156696. x, y = 114, 272. charge1 = 10836. charge2 = 693. charge3 = 5517
detId = 304156696. x, y = 114, 272. charge1 = 10836. charge2 = 693. chargeCPU = 10836
Summary: event = 179064864 duplicates = 3
Event = 179118965
### Different number of pixel cluster in detId=353130500 4 vs 5
detId = 353130500. x, y = 114, 258. charge1 = 6724. charge2 = 758. No corresponding CPU cluster found.
Summary: event = 179118965 duplicates = 1
Event = 178787468
### Different number of pixel cluster in detId=303042568 65 vs 66
detId = 303042568. x, y = 88, 291. charge1 = 100. charge2 = 11937. chargeCPU = 100
detId = 303042568. x, y = 159, 208. charge1 = 100. charge2 = 8680. chargeCPU = 8680
detId = 303042568. x, y = 95, 291. charge1 = 23386. charge2 = 100. No corresponding CPU cluster found.
Summary: event = 178787468 duplicates = 3
Event = 180607223
detId = 303063072. x, y = 155, 77. charge1 = 8872. charge2 = 5421. chargeCPU = 8872
### Different number of pixel cluster in detId=303075348 53 vs 54
detId = 303075348. x, y = 159, 204. charge1 = 28289. charge2 = 100. No corresponding CPU cluster found.
detId = 303087628. x, y = 151, 276. charge1 = 11057. charge2 = 1501. chargeCPU = 1501
Summary: event = 180607223 duplicates = 3
Event = 180699610
Summary: event = 180699610 duplicates = 0
Event = 181039937
### Different number of pixel cluster in detId=353077252 5 vs 4
detId = 353130500. x, y = 127, 384. charge1 = 30295. charge2 = 11334. chargeCPU = 30295
Summary: event = 181039937 duplicates = 1
Event = 181330428
Summary: event = 181330428 duplicates = 0
Event = 181312720
detId = 303075360. x, y = 0, 27. charge1 = 31417. charge2 = 100. chargeCPU = 31417
Summary: event = 181312720 duplicates = 1
Event = 181451699
### Different number of pixel cluster in detId=303042568 68 vs 70
### Three pixels with the same (x,y) in (159, 260) 303042568 181451699 . The third pixel is:
detId = 303042568. x, y = 159, 260. charge1 = 100. charge2 = 100. charge3 = 100
### Three pixels with the same (x,y) in (159, 260) 303042568 181451699 . The third pixel is:
detId = 303042568. x, y = 159, 260. charge1 = 100. charge2 = 100. charge3 = 7472
detId = 303042568. x, y = 116, 262. charge1 = 17266. charge2 = 6187. chargeCPU = 6187
detId = 303042568. x, y = 159, 268. charge1 = 11393. charge2 = 100. chargeCPU = 11393
detId = 303042568. x, y = 159, 292. charge1 = 22019. charge2 = 100. No corresponding CPU cluster found.
detId = 303042568. x, y = 159, 260. charge1 = 100. charge2 = 100. No corresponding CPU cluster found.
### Three pixels with the same (x,y) in (159, 216) 303087620 181451699 . The third pixel is:
detId = 303087620. x, y = 159, 216. charge1 = 100. charge2 = 7297. charge3 = 21456
detId = 303087620. x, y = 158, 216. charge1 = 8340. charge2 = 9980. chargeCPU = 8340
detId = 303087620. x, y = 157, 216. charge1 = 100. charge2 = 100. chargeCPU = 100
detId = 303087620. x, y = 159, 216. charge1 = 100. charge2 = 7297. chargeCPU = 21456
Summary: event = 181451699 duplicates = 7
Event = 180470519
detId = 303042568. x, y = 103, 276. charge1 = 100. charge2 = 4378. chargeCPU = 4378
detId = 303087620. x, y = 156, 208. charge1 = 20890. charge2 = 11203. chargeCPU = 20890
Summary: event = 180470519 duplicates = 2
Event = 181735245
detId = 303067152. x, y = 135, 404. charge1 = 35575. charge2 = 5473. chargeCPU = 35575
Summary: event = 181735245 duplicates = 1
Event = 181859892
### Different number of pixel cluster in detId=304156696 16 vs 17
detId = 304156696. x, y = 84, 344. charge1 = 5096. charge2 = 100. No corresponding CPU cluster found.
detId = 344823812. x, y = 49, 85. charge1 = 5765. charge2 = 4385. chargeCPU = 5765
detId = 344823812. x, y = 68, 83. charge1 = 3022. charge2 = 4296. chargeCPU = 4296
### Different number of pixel cluster in detId=353130500 16 vs 17
detId = 353130500. x, y = 44, 269. charge1 = 100. charge2 = 7159. No corresponding CPU cluster found.
Summary: event = 181859892 duplicates = 4
Event = 181517358
detId = 303075360. x, y = 0, 35. charge1 = 18866. charge2 = 100. chargeCPU = 18866
detId = 303087628. x, y = 101, 276. charge1 = 9862. charge2 = 3492. chargeCPU = 3492
detId = 304156696. x, y = 133, 278. charge1 = 2531. charge2 = 29626. chargeCPU = 29626
Summary: event = 181517358 duplicates = 3
> Sorry, I got confused by the code in the second copy_to_buffer (where pixels are added). The line that fills the buffer later used by make_clusters is https://cmssdt.cern.ch/dxr/CMSSW/source/RecoLocalTracker/SiPixelClusterizer/plugins/PixelThresholdClusterizer.cc#294 and indeed it is a set, not an add. So the last pixel in the raw data wins.
Thanks, is it possible to apply the same rule to both CPU and GPU?
On 13 Mar, 2022, at 1:53 PM, Silvio Donato @.***> wrote:

> Thanks, is it possible to apply the same rule to both CPU and GPU?

On GPU it is difficult to select the last occurrence. For the CPU I leave it to the Pixel DPG to comment. In any case I'm not sure anybody knows which is more correct.
In my opinion this is the least relevant of the differences: it is at the level of the unavoidable differences arising in FP operations.
I think we need to decide whether we accept different results from different architectures in general. Small mods in the algo here and there are not a solution.
v.
I agree that this is probably not the main cause of the CPU/GPU differences that we observe, but it is anyway a clearly different behaviour between the two algorithms that needs to be made uniform, either in the CPU or in the GPU code.
> I think we need to decide whether we accept different results from different architectures in general.
I understand that it is not feasible to have no differences at all, but we should really try to understand them and reduce them as much as possible. I think it is very difficult to accept differences above 5% (in HLT_DoubleMediumDeepTauIsoPFTauHPS35_L2NN_eta2p1_v1 they are above 20%).
> Small mods in the algo here and there are not a solution.
If I understand correctly, once we fix this issue of the repeated pixels, we can finally exclude that the differences come from the pixel local reco.
BTW, the other (minor) difference is a difference of 1 in the cluster charge. I've seen that it comes directly from pixel.adc, which sometimes differs by 1 between CPU and GPU. Is it a known problem? (A floating point error?)
You can try to change the set into an add in the line I quoted above (on CPU).
About the difference in adc values: this is the CPU code for the calibration, https://cmssdt.cern.ch/dxr/CMSSW/source/CalibTracker/SiPixelESProducers/src/SiPixelGainCalibrationService.cc#31 , and this is the GPU one, https://cmssdt.cern.ch/dxr/CMSSW/source/RecoLocalTracker/SiPixelClusterizer/plugins/gpuCalibPixel.h#66 . There is clearly an opportunity for FMA (not available in the standard CMSSW build for CPU).
One can force the same FMA on both (but with standard CMSSW that will be expensive on CPU, as it will not use the hardware one).
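To illustrate the rounding point with made-up calibration constants (emulating the fused operation with a float64 intermediate; this is not the actual CMSSW code):

```python
import numpy as np

# adc*gain followed by a subtraction rounds twice in float32; an FMA rounds
# once (emulated here in float64), so results can differ in the last bit,
# which is enough to shift the calibrated charge by +-1.
rng = np.random.default_rng(0)
adc = rng.integers(100, 30000, 100_000).astype(np.float32)
gain = np.float32(0.3214)                            # illustrative constants
ped_times_gain = np.float32(np.float32(28.2) * gain)

two_roundings = adc * gain - ped_times_gain          # float32 throughout
one_rounding = (adc.astype(np.float64) * np.float64(gain)
                - np.float64(ped_times_gain)).astype(np.float32)
print(np.count_nonzero(two_roundings != one_rounding), "of", adc.size, "differ")
```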
I am sure that DeepTau can be made more reproducible if one applies pixel-track selections closer to those used for instance in PF for scouting.
> You can try to change the set into an add in the line I quoted above (on CPU).
Yes, it works. Now the total cluster charge matches within 10-20 adc counts, apart from a few events where we reconstruct a different number of clusters (and of course the size variable of the cluster is still different).
Tests done in multiple recent releases have shown that the HLT results are not consistent when running on GPU vs on CPU.
Here are the instructions to reproduce the issue using:

- release: CMSSW_12_1_0_pre3
- HLT menu: /dev/CMSSW_12_1_0/GRun/V1
- dataset: /RelValTTbar_14TeV/CMSSW_12_0_0_pre6-PU_120X_mcRun3_2021_realistic_v4_JIRA_129-v1/GEN-SIM-DIGI-RAW
- input file: /store/relval/CMSSW_12_1_0_pre3/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_121X_mcRun3_2021_realistic_v2-v1/10000/0eb14c4a-e363-424a-9c0c-2688c7d32c74.root
- global tag: auto:phase1_2021_realistic; running on a previous release using the same global tag as the sample itself (120X_mcRun3_2021_realistic_v4) shows a similar behaviour.
) shows a similar behaviour.setup a CMSSW working area
extract the HLT configuration for running on GPU using the Run3 era
run the HLT menu on a GPU-equipped machine
compare the results
To disentangle the various effects, one can use different customisations on top of the HLT menu, running each resulting configuration with a GPU and without a GPU (that is, fully on the CPU). Replace the customisation at the bottom of the hlt.py file with a more fine-grained one, described below.
legacy configuration
Run the HLT menu unchanged, adding only the Status_OnGPU and Status_OnCPU paths, without actually offloading any reconstruction to GPU:
To check the impact of running the ECAL reconstruction on GPU vs CPU, apply only the ECAL changes:
HCAL-only changes
To check the impact of running the HCAL reconstruction on GPU vs CPU, apply only the HCAL changes:
Pixel local reconstruction changes
To check the impact of running the Pixel local reconstruction on GPU vs CPU, apply only the Pixel changes:
Pixel track reconstruction changes
To check the impact of running the Pixel track reconstruction on GPU vs CPU, apply only the Pixel and Tracking changes. Clearly, for this comparison to be meaningful, the previous one needs to be understood first.
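As an illustration, a sketch of what such a fine-grained customisation could look like at the bottom of hlt.py; customizeHLTforPatatrack appears elsewhere in this thread, but the per-subdetector function names below are assumptions about the layout of HLTrigger/Configuration/python/customizeHLTforPatatrack.py and should be checked against the release in use:

```python
# Appended at the bottom of hlt.py; assumes "process" is defined above.
# Function names below are assumptions, check customizeHLTforPatatrack.py.
from HLTrigger.Configuration.customizeHLTforPatatrack import (
    customiseCommon,                   # only adds the Status_OnGPU / Status_OnCPU paths
    customiseHcalLocalReconstruction,  # offloads only the HCAL local reconstruction
)

process = customiseCommon(process)                   # "legacy configuration" comparison
process = customiseHcalLocalReconstruction(process)  # add for the HCAL-only comparison
```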
The ECAL-only comparison did not reveal significant differences.
The HCAL-only comparison showed significant differences in a few % of the events (order of 10% of the accepted events).
The Pixel local reconstruction comparison showed significant differences in a few % of the events (order of 10% of the accepted events), while affecting less paths than the HCAL one.
I think that looking at the Pixel track comparison makes sense only after fixing the local reconstruction one.
Updates
- for running with recent IBs, please use https://github.com/cms-sw/cmssw/pull/35497 .
- the 12.1.0-pre3 relvals can also be used, for example /store/relval/CMSSW_12_1_0_pre3/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_121X_mcRun3_2021_realistic_v2-v1/10000/0eb14c4a-e363-424a-9c0c-2688c7d32c74.root .