Open francescobrivio opened 1 year ago
assign alca, ctpps-dpg
New categories assigned: ctpps-dpg,alca
@vavati,@fabferro,@jan-kaspar,@francescobrivio,@saumyaphor4252,@tvami you have been requested to review this Pull request/Issue and eventually sign? Thanks
A new Issue was created by @francescobrivio .
@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.
I forgot to add the recipe to reproduce the error:
cmsrel CMSSW_13_0_3
cd CMSSW_13_0_3/src/
cmsenv
cp -r /afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2023A/job_250879/job/WMTaskSpace/ .
cd WMTaskSpace/cmsRun1/
cmsRun PSet.py
Instead, to use the "rolled-back" conditions (and cure the crash), you can simply edit PSet.py to be:
import FWCore.ParameterSet.Config as cms
import pickle
with open('PSet.pkl', 'rb') as handle:
    process = pickle.load(handle)
process.GlobalTag.globaltag = '130X_dataRun3_Express_RecoverPPS_v1'
Updating this thread. The crash was traced to a missing (zero-sized) data field, matchingReferencePoints, in the payload introduced in IOV=365978 of PPSAlignmentConfig_reference_Run3_v1_express, which is consumed by the PCL. It has been properly fixed and a new payload will be submitted.
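As an illustration of how a zero-sized matchingReferencePoints field could be caught before it reaches the harvester, here is a minimal sketch of an up-front check. The types and names (PointSketch, MatchingReferencePointsSketch, findReferencePointProblems) are hypothetical stand-ins, not the real PPSAlignmentConfig interface, and the list of required RP ids is assumed to be supplied by the caller.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one reference point; not the real CMSSW type.
struct PointSketch {
  double x;
  double y;
};

// Hypothetical stand-in for the matchingReferencePoints field: RP id -> points.
using MatchingReferencePointsSketch = std::map<unsigned int, std::vector<PointSketch>>;

// Collect human-readable problems instead of throwing; an empty result means
// every required RP id has at least one reference point.
inline std::vector<std::string> findReferencePointProblems(
    const MatchingReferencePointsSketch& refs,
    const std::vector<unsigned int>& requiredRpIds) {
  std::vector<std::string> problems;
  if (refs.empty()) {
    problems.push_back("matchingReferencePoints is empty (zero-sized field)");
    return problems;
  }
  for (const unsigned int rpId : requiredRpIds) {
    const auto it = refs.find(rpId);
    if (it == refs.end()) {
      problems.push_back("no entry for RP id " + std::to_string(rpId));
    } else if (it->second.empty()) {
      problems.push_back("empty entry for RP id " + std::to_string(rpId));
    }
  }
  return problems;
}
```

A check of this kind could run when a payload is prepared or when it is first read, so that a bad configuration produces a readable report rather than a runtime exception.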
Is this related to https://cms-talk.web.cern.ch/t/ppd-alcadb-gt-online-hlt-express-prompt-updated-pps-alignment-conditions-for-pcl/24053/1 ?
If so, I would suggest validating the new payload in an express replay as well, @cms-sw/alca-l2
Yes.
Hi @mmusich, we were planning to test it with the dedicated relvals that test the PPS PCL (we are also setting this up in the AlcaVal tool), but indeed using an Express replay sounds like a good idea: I'll prepare it tomorrow.
Thanks @francescobrivio.
I was wondering if the PPS experts can also comment on this:
"the code is packed with std::map element evaluations with bound checks (which leads to exceptions at runtime)"
Crashes at Tier-0 have a human and computing cost. It would be better to avoid having the code crash on a bad configuration. @wpcarvalho
This issue is to keep track of possible future fixes (code-wise) of the issue reported in this CMSTalk post. The error appeared while processing the ALCAPPSExpress stream for run 366035, and an exception was reported.
The actual issue was traced back to an update of the PPSAlignmentConfig conditions that happened a few days ago (CMSTalk announcement), and the conditions have now been rolled back until the problem is understood by PPS experts.
@mmusich kindly pointed out that the issue is most probably originating in this line: https://github.com/cms-sw/cmssw/blob/8c5f3c7d2257166af259dc5517c462dce5ce199c/CalibPPS/AlignmentGlobal/plugins/PPSAlignmentHarvester.cc#L614 when rpc.id_ is equal to 23 (I let PPS experts comment further on the exact meaning of this).
In addition, Marco pointed out that "the code is packed with std::map element evaluations with bound checks (which leads to exceptions at runtime)", which should probably be fixed as well.
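To make the point about bound-checked std::map evaluations concrete, here is a minimal sketch contrasting a lookup that throws with one that lets the caller handle a missing entry. The alias ReferencePointsSketch and both helper functions are hypothetical and are not taken from PPSAlignmentHarvester.cc; only the standard-library behaviour (std::map::at throws std::out_of_range for an absent key, std::map::find returns end()) is assumed.

```cpp
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a per-RP container keyed by RP id.
using ReferencePointsSketch = std::map<unsigned int, std::vector<double>>;

// Bound-checked access: std::map::at throws std::out_of_range when rpId is
// absent, which ends the job unless something catches the exception.
inline const std::vector<double>& pointsOrThrow(const ReferencePointsSketch& refs,
                                                unsigned int rpId) {
  return refs.at(rpId);
}

// Guarded access: returns nullptr when the entry is missing or empty, so the
// caller can log the problem and skip this RP instead of crashing.
inline const std::vector<double>* pointsIfUsable(const ReferencePointsSketch& refs,
                                                 unsigned int rpId) {
  const auto it = refs.find(rpId);
  if (it == refs.end() || it->second.empty()) {
    return nullptr;
  }
  return &it->second;
}
```

With the guarded form, a caller would test the returned pointer and decide what to do for that RP. Whether skipping an RP is acceptable for the alignment workflow is a question for the PPS experts.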