Closed AdrianoDee closed 6 months ago
cms-bot internal usage
A new Issue was created by @AdrianoDee.
@Dr15Jones, @antoniovilela, @smuzaffar, @makortel, @sextonkennedy, @rappoccio can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign hlt
assign pdmv
New categories assigned: hlt,pdmv
@Martin-Grunewald,@mmusich,@AdrianoDee,@sunilUIET,@miquork you have been requested to review this Pull request/Issue and eventually sign? Thanks
@cms-sw/tau-pog-l2 FYI
type tau
just as an observation this path is not new (first included in the GRun menu in 2022, https://its.cern.ch/jira/browse/CMSHLT-2289)
EDIT but was touched recently in https://its.cern.ch/jira/browse/CMSHLT-3052
@cms-sw/pdmv-l2
In data reHLT+reRECO RelVals we are observing some failures at HLTDR_2023 step in path HLT_VBF_DoubleMediumDeepTauPFTauHPS20_eta2p1_v7
Please help filling in some information:
I can't find it in the Dashboard. Since it is labelled HLTDR_2023, and the path in question is not in the Fake* menus, it must be in some 13_X release running the actual 2023 HLT with the 2023 version of that path.
Quick answers:
14_0_0_pre3
and 14_0_0
but I'm tracking it back to older releases (coming back as soon as I find the first occurrence). For the reproducibility and the CPU pattern I'll need a moment to check those.
Hmm well, in 14_X, HLTDR_2023 should (now) run the Fake* menus, while the real HLT menus should be within HLTDR_2024.
in 14_X, HLTDR_2023 should (now) run the Fake* menus, while the real HLT menus should be within HLTDR_2024
Indeed the configuration linked above has
L1REPACK:Full,HLT:@relval2024
, but in absence of real 2024 data we're running the 2024 menu on 2023 data.
I see the same (similar) error
Fatal Exception (Exit code: 8001)
An exception of category 'InvalidRun' occurred while
[0] Processing Event run: 367131 lumi: 122 event: 206577729 stream: 1
[1] Running path 'HLT_DoubleMediumDeepTauPFTauHPS30_L2NN_eta2p1_PFJet60_v6'
[2] Calling method for module DeepTauId/'hltHpsPFTauDeepTauProducer'
Exception Message:
error while running session: INVALID_ARGUMENT: Incompatible shapes: [0,1,1,38] vs. [92]
[[{{node inner_hadrons_norm_1/FusedBatchNorm_1/Mul}}]]
in 13_3_0_pre5 RunDisplacedJet2023C running L1REPACK:Full,HLT:@relval2023.

HLT_DoubleMediumDeepTauPFTauHPS30_L2NN_eta2p1_PFJet60_v6 is a different path, so it points to a general (path-agnostic) problem with DeepTauId.
For context, it appears the exception comes from here:
assign ml
assign reconstruction
New categories assigned: ml,reconstruction
@jfernan2,@mandrenguyen,@valsdav,@wpmccormack you have been requested to review this Pull request/Issue and eventually sign? Thanks
There is also an earlier episode https://github.com/cms-sw/cmssw/issues/42862
There is also an earlier episode https://github.com/cms-sw/cmssw/issues/42862
That was affecting only phase2 workflows and got fixed by https://github.com/cms-sw/cmssw/pull/43855
Following the links there pointed to https://cms-unified.web.cern.ch/cms-unified/joblogs/pdmvserv_RVCMSSW_13_3_0_pre5RunDisplacedJet2023C__Data_2023_RelVal_2023C_231107_154737_5273/8001/HLTDR3_2023/04130c52-d023-4ad4-8e5d-5dbecdb27cab-106-0-logArchive/, according to which the job was run on Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz, that is Cascade Lake. It has the same AVX512F AVX512_VNNI features that somehow seemed to play a role in https://github.com/cms-sw/cmssw/issues/42862.
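Whether a given worker node exposes those instruction sets can be checked from /proc/cpuinfo. A minimal stdlib-only sketch (the helper names are illustrative, assuming a Linux-style cpuinfo layout):

```python
def cpu_flags(cpuinfo_text):
    """Extract the CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

def has_avx512_vnni(cpuinfo_text):
    # Cascade Lake parts (e.g. Xeon Silver 4216 / Gold 5218) report both flags.
    return {"avx512f", "avx512_vnni"} <= cpu_flags(cpuinfo_text)

if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(has_avx512_vnni(f.read()))
```

This is how one can verify that a candidate machine (e.g. an lxplus node) actually matches the CPU family seen in the crashed jobs before attempting to reproduce.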
A few additional things I found out after investigating:
13_3_0_pre5
~ 13_3_0_pre5
pdmvserv_RVCMSSW_13_3_0_pre5TenTau_15_500_231127_105150_2624 with

An exception of category 'InvalidRun' occurred while
[0] Processing Event run: 1 lumi: 26 event: 2570 stream: 1
[1] Running path 'HLT_DoubleMediumDeepTauPFTauHPS30_L2NN_eta2p1_OneProng_M5to80_v4'
[2] Calling method for module DeepTauId/'hltHpsPFTauDeepTauProducer'
Exception Message:
error while running session: INVALID_ARGUMENT: Incompatible shapes: [0,1,1,64] vs. [154]
[[{{node inner_muon_norm_1/FusedBatchNorm_1/Mul}}]]
Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz (or on a Gold one), still Cascade Lake.

13_3_0_pre5, but I haven't done much investigative effort there. I don't see any updates to the TF backend there that would justify this.

~13_3_0_pre1 (the unified logs).

Just to add another piece of information: I see many similar errors in my private HLT rerun with CMSSW_14_0_0. However, the error occurs in hltL2TauTagNNProducer, which runs another TF-based tau tagger whose code has not changed for one year (even more, if we ignore minor commits that do not affect functionality).
For example:
== CMSSW: 2024-03-09 09:00:02.226048: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: scale must have the same number of elements as the channels of x, got 80 and 31
== CMSSW: [[{{node cnn_model/StatefulPartitionedCall/StatefulPartitionedCall/batch_normalization_CNN1x1_0/FusedBatchNormV3}}]]
== CMSSW: ----- Begin Fatal Exception 09-Mar-2024 09:00:04 CET-----------------------
== CMSSW: An exception of category 'InvalidRun' occurred while
== CMSSW: [0] Processing Event run: 369870 lumi: 219 event: 67715906 stream: 0
== CMSSW: [1] Running path 'nanoAOD_step'
== CMSSW: [2] Calling method for module L2TauNNProducer/'hltL2TauTagNNProducer'
== CMSSW: Exception Message:
== CMSSW: error while running session: INVALID_ARGUMENT: scale must have the same number of elements as the channels of x, got 80 and 31
== CMSSW: [[{{node cnn_model/StatefulPartitionedCall/StatefulPartitionedCall/batch_normalization_CNN1x1_0/FusedBatchNormV3}}]]
I checked the CPU architectures for a few crashed jobs: Intel(R) Xeon(R) Gold 5318Y CPU @ 2.10GHz
and Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
.
It looks like a more general TF-related issue that affects not only DeepTau.
Makes me wonder if the root cause could be the same as in https://github.com/cms-sw/cmssw/issues/42444 ...
Would anyone have a simple recipe to reproduce any of these?
Makes me wonder if the root cause could be the same as in #42444 ...
Would anyone have a simple recipe to reproduce any of these?
@makortel something like this should reproduce the error in principle
cmsDriver.py step2 --conditions auto:run3_hlt_relval --data --datatier FEVTDEBUGHLT --era Run3_2023 --eventcontent FEVTDEBUGHLT --filein /store/data/Run2023C/DisplacedJet/RAW/v1/000/367/131/00000/9f3f571f-6dc9-4bda-a68b-5d1b9a5fc3ac.root --fileout file:step2.root --nStreams 4 --nThreads 8 --number 10 --process reHLT --python_filename step_2_cfg.py --step L1REPACK:Full,HLT:@relval2024 --customise_commands "process.source.skipEvents = cms.untracked.uint32(1800)"
since it would end up running the same reHLT process on top of the same Event (195390586) of the same Run (367131) for which the failure appears here. But I was actually not able to reproduce it.
since it would end up running the same reHLT process on top of the same Event (195390586) of the same Run (367131) for which the failure appears
since the process is run multi-threaded are you sure that the last event that leaves a message logger record is also the one crashing the process?
@makortel @AdrianoDee I can reproduce in the following way:
1) go on lxplus901 (in order to have a machine with Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz)
2) prepare the input file via:
edmCopyPickMerge outputFile=pickevents.root eventsToProcess=367131:196942831 inputFiles=/store/data/Run2023C/DisplacedJet/RAW/v1/000/367/131/00000/9f3f571f-6dc9-4bda-a68b-5d1b9a5fc3ac.root
3) run:
cmsDriver.py step2 --conditions auto:run3_hlt_relval --data --datatier FEVTDEBUGHLT --era Run3_2023 --eventcontent FEVTDEBUGHLT --filein file:pickevents.root --fileout file:step2.root --nStreams 4 --nThreads 8 --number -1 --process reHLT --python_filename step_2_cfg.py --step L1REPACK:Full,HLT:@relval2024 --accelerators cpu
In CMSSW_14_0_0
it crashes with [1]
Notice that in a recent IB (CMSSW_14_1_X_2024-03-10-2300
) the issue seems to have disappeared.
[1]
L1REPACK:Full,HLT:@relval2024,ENDJOB
entry file:pickevents.root
Step: L1REPACK Spec: ['Full']
# L1T INFO: L1REPACK:Full will unpack all L1T inputs, re-emulated (Stage-2), and pack uGT, uGMT, and Calo Stage-2 output.
Step: HLT Spec: ['@relval2024']
Step: ENDJOB Spec:
Starting cmsRun step_2_cfg.py
# L1T INFO: L1REPACK:Full will unpack all L1T inputs, re-emulated (Stage-2), and pack uGT, uGMT, and Calo Stage-2 output.
%MSG-i ThreadStreamSetup: (NoModuleName) 12-Mar-2024 00:23:22 CET pre-events
setting # threads 8
setting # streams 4
%MSG
%MSG-i AlpakaService: (NoModuleName) 12-Mar-2024 00:23:23 CET pre-events
AlpakaServiceSerialSync succesfully initialised.
Found 1 device:
- Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
%MSG
...
Begin processing the 1st record. Run 367131, Event 196942831, LumiSection 117 on stream 3 at 12-Mar-2024 00:24:02.428 CET
#--------------------------------------------------------------------------
# FastJet release 3.4.1
# M. Cacciari, G.P. Salam and G. Soyez
# A software package for jet finding and analysis at colliders
# http://fastjet.fr
#
# Please cite EPJC72(2012)1896 [arXiv:1111.6097] if you use this package
# for scientific work and optionally PLB641(2006)57 [hep-ph/0512210].
#
# FastJet is provided without warranty under the GNU GPL v2 or higher.
# It uses T. Chan's closest pair algorithm, S. Fortune's Voronoi code
# and 3rd party plugin jet algorithms. See COPYING file for details.
#--------------------------------------------------------------------------
2024-03-12 00:24:06.899321: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: Incompatible shapes: [0,1,1,86] vs. [207]
[[{{node inner_egamma_norm_1/FusedBatchNorm_1/Mul}}]]
----- Begin Fatal Exception 12-Mar-2024 00:24:06 CET-----------------------
An exception of category 'InvalidRun' occurred while
[0] Processing Event run: 367131 lumi: 117 event: 196942831 stream: 3
[1] Running path 'HLT_DoublePFJets40_Mass500_MediumDeepTauPFTauHPS45_L2NN_MediumDeepTauPFTauHPS20_eta2p1_v6'
[2] Calling method for module DeepTauId/'hltHpsPFTauDeepTauProducerForVBFIsoTau'
Exception Message:
error while running session: INVALID_ARGUMENT: Incompatible shapes: [0,1,1,86] vs. [207]
[[{{node inner_egamma_norm_1/FusedBatchNorm_1/Mul}}]]
----- End Fatal Exception -------------------------------------------------
since the process is run multi-threaded are you sure that the last event that leaves a message logger record is also the one crashing the process?
Thanks Marco, indeed I had forgotten this.
For the record (and my mental health), I was anyway not able to reproduce it on an Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz with my original setup (which was anyway, by chance, hitting event 196942831) in 14_0_0:
%MSG-i ThreadStreamSetup: (NoModuleName) 11-Mar-2024 23:49:31 CET pre-events
setting # threads 8
setting # streams 4
%MSG
11-Mar-2024 23:50:13 CET Initiating request to open file root://eoscms.cern.ch//eos/cms/store/data/Run2023C/DisplacedJet/RAW/v1/000/367/131/00000/9f3f571f-6dc9-4bda-a68b-5d1b9a5fc3ac.root
11-Mar-2024 23:50:17 CET Successfully opened file root://eoscms.cern.ch//eos/cms/store/data/Run2023C/DisplacedJet/RAW/v1/000/367/131/00000/9f3f571f-6dc9-4bda-a68b-5d1b9a5fc3ac.root
%MSG-w NonConsumedConditionalModules: AfterModConstruction 11-Mar-2024 23:50:37 CET pre-events
The following modules were part of some ConditionalTask, but were not
consumed by any other module in any of the Paths to which the ConditionalTask
was associated. Perhaps they should be either removed from the
job, or moved to a Task to make it explicit they are unscheduled.
hltPixelTracksTrackingRegions
hltSiPixelClustersCache
hltSiPixelClustersCacheCPUOnly
hltSiPixelClustersFromSoA
hltSiPixelDigisSoA
hltSiPixelRecHitsFromGPU
hltSiPixelRecHitsSoA
statusOnGPU@cuda
%MSG
[...]
Begin processing the 1st record. Run 367131, Event 195019958, LumiSection 117 on stream 3 at 11-Mar-2024 23:50:48.813 CET
Begin processing the 2nd record. Run 367131, Event 196362425, LumiSection 117 on stream 0 at 11-Mar-2024 23:50:48.814 CET
Begin processing the 3rd record. Run 367131, Event 196360607, LumiSection 117 on stream 2 at 11-Mar-2024 23:50:48.816 CET
Begin processing the 4th record. Run 367131, Event 196460914, LumiSection 117 on stream 1 at 11-Mar-2024 23:50:49.206 CET
#--------------------------------------------------------------------------
# FastJet release 3.4.1
# M. Cacciari, G.P. Salam and G. Soyez
# A software package for jet finding and analysis at colliders
# http://fastjet.fr
#
# Please cite EPJC72(2012)1896 [arXiv:1111.6097] if you use this package
# for scientific work and optionally PLB641(2006)57 [hep-ph/0512210].
#
# FastJet is provided without warranty under the GNU GPL v2 or higher.
# It uses T. Chan's closest pair algorithm, S. Fortune's Voronoi code
# and 3rd party plugin jet algorithms. See COPYING file for details.
#--------------------------------------------------------------------------
Begin processing the 5th record. Run 367131, Event 194945538, LumiSection 117 on stream 3 at 11-Mar-2024 23:50:51.795 CET
Begin processing the 6th record. Run 367131, Event 194945544, LumiSection 117 on stream 2 at 11-Mar-2024 23:50:52.004 CET
Begin processing the 7th record. Run 367131, Event 195266551, LumiSection 117 on stream 1 at 11-Mar-2024 23:50:52.028 CET
Begin processing the 8th record. Run 367131, Event 196331770, LumiSection 117 on stream 0 at 11-Mar-2024 23:50:52.104 CET
Begin processing the 9th record. Run 367131, Event 196942831, LumiSection 117 on stream 2 at 11-Mar-2024 23:50:52.774 CET
Begin processing the 10th record. Run 367131, Event 196939181, LumiSection 117 on stream 3 at 11-Mar-2024 23:50:52.866 CET
11-Mar-2024 23:50:53 CET Closed file root://eoscms.cern.ch//eos/cms/store/data/Run2023C/DisplacedJet/RAW/v1/000/367/131/00000/9f3f571f-6dc9-4bda-a68b-5d1b9a5fc3ac.root
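Since the streams interleave, the event to pick for a reproducer is the last event started on the crashing stream, not simply the last "Begin processing" line in the log. A small stdlib-only helper (illustrative, not part of CMSSW) that extracts the last in-flight event per stream from a framework log:

```python
import re

# Matches framework lines like:
# "Begin processing the 9th record. Run 367131, Event 196942831, LumiSection 117 on stream 2 at ..."
_BEGIN = re.compile(
    r"Begin processing the \S+ record\. Run (\d+), Event (\d+), "
    r"LumiSection (\d+) on stream (\d+)"
)

def last_event_per_stream(log_lines):
    """Map each stream id to the (run, lumi, event) it last started processing."""
    latest = {}
    for line in log_lines:
        m = _BEGIN.search(line)
        if m:
            run, event, lumi, stream = (int(g) for g in m.groups())
            latest[stream] = (run, lumi, event)
    return latest
```

Cross-checking the stream number printed in the exception context against this map identifies the event to feed to edmCopyPickMerge.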
I can reproduce in the following way: ... In
CMSSW_14_0_0
it crashes with [1]
Thanks, I was able to reproduce.
Notice that in a recent IB (
CMSSW_14_1_X_2024-03-10-2300
) the issue seems to have disappeared.
The reproducer succeeds also in 14_1_0_pre1.
In 14_0_0, the exception is thrown via
(gdb) where
#0 0x00007ffff5ead0f1 in __cxxabiv1::__cxa_throw (obj=0x7fff0b218c00, tinfo=0x7ffff79a3668 <typeinfo for cms::Exception>, dest=0x7ffff796ce20 <cms::Exception::~Exception()>)
at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:81
#1 0x00007fffbd01b989 in tensorflow::run(tensorflow::Session*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, tsl::thread::ThreadPoolOptions const&) [clone .cold] ()
from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_0/lib/el9_amd64_gcc12/libPhysicsToolsTensorFlow.so
#2 0x00007fffbd020589 in tensorflow::run(tensorflow::Session*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::Tensor> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*, tsl::thread::ThreadPoolInterface*) ()
from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_0/lib/el9_amd64_gcc12/libPhysicsToolsTensorFlow.so
#3 0x00007fff710a4924 in DeepTauId::getPartialPredictions(bool) ()
from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_0/lib/el9_amd64_gcc12/pluginRecoTauTagRecoTauPlugins.so
#4 0x00007fff710b0b68 in void DeepTauId::createConvFeatures<reco::PFCandidate, reco::PFTau>(reco::PFTau const&, unsigned long, edm::RefToBase<reco::BaseTau>, reco::Vertex const&, double, std::vector<pat::Electron, std::allocator<pat::Electron> > const*, std::vector<pat::Muon, std::allocator<pat::Muon> > const*, edm::View<reco::Candidate> const&, (anonymous namespace)::CellGrid const&, (anonymous namespace)::TauFunc, bool) ()
from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_0/lib/el9_amd64_gcc12/pluginRecoTauTagRecoTauPlugins.so
#5 0x00007fff710b3643 in void DeepTauId::getPredictionsV2<reco::PFCandidate, reco::PFTau>(reco::BaseTau const&, unsigned long, edm::RefToBase<reco::BaseTau>, std::vector<pat::Electron, std::allocator<pat::Electron> > const*, std::vector<pat::Muon, std::allocator<pat::Muon> > const*, edm::View<reco::Candidate> const&, reco::Vertex const&, double, unsigned long long const&, std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >&, (anonymous namespace)::TauFunc) [clone .lto_priv.0] ()
from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_0/lib/el9_amd64_gcc12/pluginRecoTauTagRecoTauPlugins.so
#6 0x00007fff710aa903 in DeepTauId::produce(edm::Event&, edm::EventSetup const&) ()
from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_0/lib/el9_amd64_gcc12/pluginRecoTauTagRecoTauPlugins.so
#7 0x00007ffff7e483c1 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) ()
from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_0/lib/el9_amd64_gcc12/libFWCoreFramework.so
#8 0x00007ffff7e2c04e in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) ()
from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_14_0_0/lib/el9_amd64_gcc12/libFWCoreFramework.so
The reproducer succeeds also in 14_1_0_pre1.
I think this happens simply because that particular trigger path (HLT_DoublePFJets40_Mass500_MediumDeepTauPFTauHPS45_L2NN_MediumDeepTauPFTauHPS20_eta2p1_v) was removed in the meantime in https://github.com/cms-sw/cmssw/pull/44073 (14_1_X) and https://github.com/cms-sw/cmssw/pull/44074 (14_0_X). I think the reproducer would succeed in CMSSW_14_0_1 as well (but I didn't test it).
In 14_0_0: cmsRun, cmsRunTC, and cmsRunGlibC; valgrind with cmsRun does not result in an exception (or any related warnings).

I haven't seen any comments from @cms-sw/ml-l2, are they aware of the issue?
Hi all! I investigated the reproducer and I think I found the issue.
The number of valid_grid_cells here is 0 for this event, and this creates a tensorflow::Tensor with shape [0, 1, 1, N]. In TensorFlow this is a valid tensor: it has a specific shape, but it is empty.
>>> import tensorflow as tf
>>> tensor = tf.zeros([0, 1, 1, 86])
>>> tensor
<tf.Tensor: shape=(0, 1, 1, 86), dtype=float32, numpy=array([], shape=(0, 1, 1, 86), dtype=float32)>
>>> tf.print(tensor)
[]
Apparently, when this input is passed to a TF model executed on a CPU without AVX512F AVX512_VNNI, the model runs and returns an empty output without complaining. When AVX512F AVX512_VNNI instructions are present, the jitting is different and the TF executor complains. Now, I'm not saying that it is understood why this happens, but this is the reason for the crash.
I can prepare a PR with guards to avoid executing the model with empty inputs, and in parallel investigate this TF behaviour more deeply.
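The guard amounts to skipping the session run whenever an input tensor has a zero-sized batch dimension. A minimal Python sketch of the idea (the actual fix lives in the C++ DeepTauId code; the names here are illustrative, not the real API):

```python
def run_guarded(run_session, named_inputs):
    """Run TF inference only if no input has an empty (0-sized) batch dimension.

    named_inputs: list of (name, shape) pairs describing the input tensors.
    Returns None when inference is skipped; the caller then falls back to
    default (e.g. invalid) discriminator scores instead of crashing in TF.
    """
    if any(shape[0] == 0 for _, shape in named_inputs):
        return None  # empty batch: nothing to evaluate, avoid the TF error
    return run_session(named_inputs)
```

In the real module the equivalent check would be on valid_grid_cells before calling tensorflow::run, so that an event with an empty cell grid never reaches the FusedBatchNorm node.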
This failure was now seen in Tier0 PromptReco https://cms-talk.web.cern.ch/t/update-t0-skim-config-for-2024-pp-collision/36794/5 .
urgent
This failure was now seen in Tier0 PromptReco https://cms-talk.web.cern.ch/t/update-t0-skim-config-for-2024-pp-collision/36794/5
I can prepare a PR with guards to avoid the execution of the model with empty inputs, and in parallel investigate more deeply this TF behaviour.
@valsdav, we have established that this issue can affect Prompt Reconstruction and (potentially, when the new nodes for the HLT farm arrive) also online trigger operations. Please prepare PRs with guards to avoid the execution of the model with empty inputs. Thank you.
Marco (as ORM)
For the record, the proposed fixes are:
+1 solved by https://github.com/cms-sw/cmssw/pull/44455
+ml
Basic guards to solve the empty-input problem in DeepTauId are in place, but the reason for the empty grid needs to be investigated with the Tau experts.
A more general guard for empty inputs will be added (see https://github.com/cms-sw/cmssw/issues/44481)
+pdmv (really only the reporter)
... hlt will sign once the 14.0.X PR is merged and tested in IBs.
but the reason for the empty grid needs to be investigated with the Tau experts.
@cms-sw/reconstruction-l2 this looks like it needs a separate issue. Can you open one?
+hlt
This issue is fully signed and ready to be closed.
@cmsbuild, please close
Running RelVals we are observing some failures due to a tensorflow exception coming from the DeepTauId module. Some examples are listed here.

1) 2023 Data reHLT + reRECO

In the HLTDR3_2023 step, in path HLT_VBF_DoubleMediumDeepTauPFTauHPS20_eta2p1_v7, in 14_0_0_pre3 RelVals, with the config here, that is what we get from wf 141.035 running L1REPACK:Full,HLT:@relval2024 (HLT pointing at GRun here). The error here. The wf on Stats2.

Also in the same step in 13_3_0_pre5 RunDisplacedJet2023C, in a different path (HLT_DoubleMediumDeepTauPFTauHPS30_L2NN_eta2p1_PFJet60_v6) run in HLT:@relval2023. The error here. The wf on Stats2.

2) 2022 Data reHLT + reRECO

Much rarer: in the AODNANORUN3_reHLT_2022 step, in deepTau2017v2p1ForMini, in RunJetMET2022D with 14_0_0. The error here. The wf on Stats2.

3) MC 2023

In the DigiPU_2023PU step, in hltHpsPFTauDeepTauProducer, in RelValTenTau_15_500 with 13_3_0_pre1 (at the moment the first occurrence I found). The error here. The wf on Stats2.

CPU

At the moment it appears that in all cases the jobs were running on Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz (or on a Gold one), Cascade Lake (see https://github.com/cms-sw/cmssw/issues/44333#issuecomment-1983672263).