Open civanch opened 11 months ago
A new Issue was created by @civanch Vladimir Ivantchenko.
@Dr15Jones, @smuzaffar, @sextonkennedy, @antoniovilela, @makortel, @rappoccio can you please review it and eventually sign/assign? Thanks.
assign simulation
New categories assigned: simulation
@civanch,@mdhildreth you have been requested to review this Pull request/Issue and eventually sign? Thanks
@cms-sw/hlt-l2 FYI
I am currently seeing a similar problem running the HI workflow in latest CMSSW_14_1_X IB:
An exception of category 'Conditions not found' occurred while
[0] Processing Event run: 1 lumi: 1 event: 33 stream: 0
[1] Running path 'HLTAnalyzerEndpath'
[2] Prefetching for module L1TRawToDigi/'hltGtStage2Digis'
[3] Prefetching for module RawDataCollectorByLabel/'rawDataCollector'
[4] Prefetching for module SiStripDigiToRawModule/'SiStripDigiToRaw'
[5] Calling method for module MixingModule/'mix'
Exception Message:
Unavailable Conditions of type HcalMCParams for cell (0x54000140) (Det 5:5 subdet 2:2 ZDC+ UNKNOWN 0,0)
To reproduce the problem I did:
cmsrel CMSSW_14_1_X_2024-03-29-1100
cd CMSSW_14_1_X_2024-03-29-1100/src/
cmsenv
cmsDriver.py Configuration/Generator/python/Starlight_DoubleDiffraction_5p36TeV_cfi.py -s LHE,GEN,SIM -n 40 --conditions auto:phase1_2023_realistic_hi --beamspot Realistic2022PbPbCollision --datatier GEN-SIM --eventcontent RAWSIM --era Run3_pp_on_PbPb_2023 --geometry DB:Extended --relval 9000,150 --fileout file:step1.root
cmsDriver.py step2 -s DIGI:pdigi_hi_nogen,L1,DIGI2RAW,HLT:@fake2 --conditions auto:phase1_2023_realistic_hi --datatier GEN-SIM-DIGI-RAW-HLTDEBUG --eventcontent FEVTDEBUGHLT --era Run3_pp_on_PbPb_2023 -n -1 --pileup HiMixNoPU --filein file:step1.root --fileout file:step2.root
@cms-sw/hcal-dpg-l2 FYI
This is to explicitly include Sunanda ( @bsunanda ). Since Nov. 2023 there is a new HcalZDCDetId definition for Run3, which is not yet used (due to some issues) either for the Geometry initialization or for DB conditions (to add new Run3 ZDC channels). It should be fixed soon, as was discussed elsewhere (email, HCAL DPG meeting).
Starting from CMSSW_14_1_X_2024-04-24-2300 the
----- Begin Fatal Exception 26-Apr-2024 10:58:14 CEST-----------------------
An exception of category 'Conditions not found' occurred while
[0] Processing Event run: 1 lumi: 1 event: 8 stream: 1
[1] Running path 'HLTAnalyzerEndpath'
[2] Prefetching for module L1TRawToDigi/'hltGtStage2Digis'
[3] Prefetching for module RawDataCollectorByLabel/'rawDataCollector'
[4] Prefetching for module SiStripDigiToRawModule/'SiStripDigiToRaw'
[5] Calling method for module MixingModule/'mix'
Exception Message:
Unavailable Conditions of type HcalMCParams for cell (0x54000140) (Det 5:5 subdet 2:2 ZDC+ UNKNOWN 0,0)
----- End Fatal Exception -------------------------------------------------
occurs frequently (but not always) in the step2 of workflows 180.1 and 181.1 (that were added/enabled in that IB).
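For anyone reading the cell ID in the message above, here is a minimal sketch of how the raw value splits into detector and subdetector numbers. It assumes the generic DetId packing (detector in the top 4 bits, subdetector in the next 3 bits); the function name and the dictionary keys are made up for illustration, and the ZDC-specific meaning of the remaining low bits is deliberately not decoded here.

```python
def decode_detid(raw):
    """Split a 32-bit raw DetId into generic fields (assumed bit layout)."""
    det = (raw >> 28) & 0xF       # detector: bits 28-31
    subdet = (raw >> 25) & 0x7    # subdetector: bits 25-27
    payload = raw & 0x1FFFFFF     # remaining 25 bits, subdetector-specific
    return {"det": det, "subdet": subdet, "payload": payload}


fields = decode_detid(0x54000140)
print(fields["det"], fields["subdet"], hex(fields["payload"]))  # 5 2 0x140
```

The det and subdet values agree with the "Det 5:5 subdet 2:2" printed in the exception text.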
Commenting here to flag that experts from HCAL and ZDC side are aware (as flagged above by @abdoulline, @bsunanda) and are still working towards a solution.
@civanch @abdoulline all. The issue is reappearing in the production of the premix samples for the 2024MC campaign, see gitlab: https://gitlab.cern.ch/cms-ppd/event-performance/ep-coordination/-/issues/3#note_8460920 I read in the initial post of this issue that "Temporary ZDC hits are masked until the problem will be solved": what does it mean? I.e., was a PR merged that actually masked those "temporary ZDC hits"? And, if so, why is that protection apparently not effective now?
In the recent update to ZDC geometry which is now done for 2024, ZDC digitization is open like all other detectors. However, this is true for CMSSW_14_1_X once we update the GT. If we have to do it for 14_0_X we need a lot of backporting which may not be easy.
I must admit I don't have a clear understanding of what's happening... Initialization of HcalMCParams requires all the cells specified as valid to be involved, including those specified as valid ZDC ones, no matter whether ZDC SimHits or Digis are present or not... This cell (0x54000140) is illegal...
I wonder if it may (or may not) be related to the absence in 14_0_X of a small fix of HcalZDCDetId https://github.com/cms-sw/cmssw/pull/45033 which was submitted (as new ZDC-only related) to 14_1_X...
Indeed, that fix was never backported in 14_0_X... We can try running with it. Your understanding is that if there was not such an illegal cell, the protection should have worked in 14_0_X: do I understand it correctly?
We can make a temporary fix in 14_0_X to let it proceed. Should we try to do that?
@perrotta
which kind of protection? 🤔
Let us do it. It should not affect other detectors.
I've prepared the backport in https://github.com/cms-sw/cmssw/pull/46070
@perrotta which kind of protection? 🤔
I refer to what @civanch wrote in the issue description: "Temporary ZDC hits are masked until the problem will be solved."
@civanch could you remind me how it was done?
@abdoulline , there is an era-dependent option CMStoZDCtransport = True/False.
I believe something is not percolating correctly. The masking in HcalZDCDetId need not be changed. But I do not understand the issue, since I did not follow it from the beginning.
OK, I see. But the absence of ZDC SimHits does not prevent HcalMCParams from being initialized for all the (valid) ZDC cells (from Geometry initialization)... The problem is why the cell in question is illegal...
If it is "false", no particle can go through the volume CMStoZDC - all are killed.
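A toy illustration of the behaviour described for this flag, in plain Python rather than actual CMSSW code. The `Step` class and `keep_track` function are hypothetical names invented for this sketch; only the volume name CMStoZDC and the True/False meaning of the option come from the comments above.

```python
from dataclasses import dataclass


@dataclass
class Step:
    volume: str    # name of the volume the particle is currently in
    track_id: int  # identifier of the track taking this step


def keep_track(step, cms_to_zdc_transport):
    """Return False when the track should be killed (toy logic only)."""
    if step.volume == "CMStoZDC" and not cms_to_zdc_transport:
        return False  # transport disabled: nothing passes through this volume
    return True


steps = [Step("Tracker", 1), Step("CMStoZDC", 2)]
survivors = [s.track_id for s in steps if keep_track(s, cms_to_zdc_transport=False)]
print(survivors)  # [1]
```

With the flag off no particle reaches the ZDC, so no ZDC SimHits are produced; as noted above, that alone does not remove ZDC cells from the HcalMCParams lookup.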
@bsunanda
It may be that ZDCDetId (not fixed in 14_0_X) is not the culprit, as it's related to RPD channels, which should not exist in 14_0_X anyway.
I've just dumped, parsed and re-dumped the txt input of the existing DB conditions (HcalMCParams) with legacy ZDC (22 ch. EM+HAD+LUM) in 14_0_15 and there is no problem, no illegal numbers. Both old (in DB) and new (after txt re-dumping) ZDCDetIds are legal and there are no "extra" (Run3) RPD channels...
In the illegal ZDCDetId no ZDC section is defined... https://cmssdt.cern.ch/lxr/source/DataFormats/HcalDetId/interface/HcalZDCDetId.h#0033
Wild guess - may be some old (12_X/13_X) Digi are involved somehow? - No, at the first glance: https://cmssdt.cern.ch/lxr/source/Configuration/PyReleaseValidation/python/relval_standard.py#0878
@bsunanda I've reproduced the issue even with a non-standard setup: on lxplus9 in CMSSW_14_2_0_pre1, in the 33rd event of the step2 of wf 180.1 [1] (jobs run fast):
Begin processing the 33rd record. Run 1, Event 33, LumiSection 1 on stream 0 at 20-Sep-2024 15:19:40.916 CEST
----- Begin Fatal Exception 20-Sep-2024 15:19:40 CEST-----------------------
An exception of category 'Conditions not found' occurred while
[0] Processing Event run: 1 lumi: 1 event: 33 stream: 0
[1] Running path 'FEVTDEBUGHLToutput_step'
[2] Prefetching for module PoolOutputModule/'FEVTDEBUGHLToutput'
[3] Prefetching for module CSCTriggerPrimitivesProducer/'simCscTriggerPrimitiveDigis'
[4] Prefetching for module CSCDigiProducer/'simMuonCSCDigis'
[5] Calling method for module MixingModule/'mix'
Exception Message:
Unavailable Conditions of type HcalMCParams for cell (0x54000140) (Det 5:5 subdet 2:2 ZDC+ UNKNOWN 0,0)
[1]
cmsDriver.py Configuration/Generator/python/Starlight_DoubleDiffraction_5p36TeV_cfi.py -s LHE,GEN,SIM -n 40 --conditions auto:phase1_2023_realistic_hi --beamspot Realistic2022PbPbCollision --datatier GEN-SIM --eventcontent RAWSIM --era Run3_pp_on_PbPb_2023 --geometry DB:Extended --relval 9000,150 --fileout file:step1.root
cmsDriver.py step2 -s DIGI:pdigi_hi_nogen,L1,DIGI2RAW --conditions auto:phase1_2023_realistic_hi --datatier GEN-SIM-DIGI-RAW-HLTDEBUG --eventcontent FEVTDEBUGHLT --era Run3_pp_on_PbPb_2023 -n -1 --pileup HiMixNoPU --filein file:step1.root --fileout file:step2.root
@bsunanda And with ZDC excluded from Digitization by commenting out the line [1] (in addition to the checks we've recently added to this module in your ZDC-related PRs), the above step2 runs to the end.
Caveat: this "happy end" may not be real, as [1] may change the Digi rndm sequence (?). And [1] may just "sweep the dust under the carpet"...
@bsunanda we need your insight, I'm afraid, about it... https://github.com/cms-sw/cmssw/issues/43582#issuecomment-2363727822
this issue is hitting us again, big time, for launching the Summer24-24 premix library, and we ought to find a solution to this. I understand that reproducibility is an issue, and we might have to just roll back anything related to ZDC to get out of this, if no solution can be found
I may be wrong, but ZDC is not enabled for pp runs. Do I understand this correctly? Or was this only the case in the past?
If my understanding is correct, then for pp simulations in 2024 ZDC is not needed, and ZDC is also not needed for production of the premix library. ZDC should be enabled only for HI simulation. When we enable ZDC for pp we get a significant slowdown of the simulation (a factor of about 3-5, if my memory is correct). This happens for MinBias simulation because high-energy hadrons hit the ZDC and a full, very energetic shower is simulated (without Russian roulette or other shortcuts).
So, I am not sure if we should backport ZDC software to 14_0.
In 2023, the ZDC was enabled only in the case of the pp reference run occurring just before the heavy-ion run (which will also be the case in 2024). I am not sure if the reference run is handled differently in MC, but we have no need for ZDC pp simulations at 13.6 TeV as the ZDC was not included in these runs.
Hopefully this helps, especially if this is slowing down the simulation.
@bsunanda some additional recent info/observations :
(1) CMSSW_14_0_X_2024-10-05-1100 (the most recent 14_0_X IB) on lxplus8:
wf 180.1 step2 crashes in already known ev. 33 exactly the same way, as was earlier reported in 14_0_X, 14_1_X and 14_2_0_pre1, yet without the most recent ZDC Geometry-related updates, exposing illegal ZDCDetId:
Unavailable Conditions of type HcalMCParams for cell (0x54000140) (Det 5:5 subdet 2:2 ZDC+ UNKNOWN 0,0)
(2) CMSSW_14_0_X_2024-10-05-1100 + pending PR https://github.com/cms-sw/cmssw/pull/46246 (backport of what was recently merged into 14_1_X & 14_2_X):
wf 180.1 step2 goes smoothly through 1000 ev (increased from the default 40 ev)
Looks like an indication of improved ZDCDetId handling (?)
(3) CMSSW_14_0_X_2024-10-05-1100 + commented ZDCDigitizer: OK, as in case (2) above
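When comparing setups like (1)-(3), it can help to pull the failing event and cell ID out of a job log automatically. Below is a minimal sketch that assumes the log contains the exception text in the form quoted in this thread; the function name and the returned keys are made up for illustration.

```python
import re

# Patterns follow the exception text quoted in this thread.
EVENT_RE = re.compile(
    r"Processing\s+Event run:\s*(\d+)\s+lumi:\s*(\d+)\s+event:\s*(\d+)"
)
CELL_RE = re.compile(
    r"Unavailable Conditions of type (\w+) for cell \((0x[0-9a-fA-F]+)\)"
)


def find_missing_conditions(log_text):
    """Return details of the first 'Unavailable Conditions' error, or None."""
    event = EVENT_RE.search(log_text)
    cell = CELL_RE.search(log_text)
    if not (event and cell):
        return None
    run, lumi, evt = (int(x) for x in event.groups())
    return {
        "run": run,
        "lumi": lumi,
        "event": evt,
        "type": cell.group(1),
        "cell": int(cell.group(2), 16),
    }


sample = (
    "[0] Processing Event run: 1 lumi: 1 event: 33 stream: 0\n"
    "Exception Message:\n"
    "Unavailable Conditions of type HcalMCParams for cell (0x54000140) "
    "(Det 5:5 subdet 2:2 ZDC+ UNKNOWN 0,0)\n"
)
result = find_missing_conditions(sample)
print(result["event"], result["type"], hex(result["cell"]))  # 33 HcalMCParams 0x54000140
```

A log from a run that completes without the exception yields `None`.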
just to be a bit clearer : 14.0.17 pilot for the premix library is failing with
cmsRun1 Fatal Exception (Exit Code: 8001)
An exception of category 'Conditions not found' occurred while
[0] Processing Event run: 1 lumi: 78 event: 77465 stream: 2
[1] Running path 'PREMIXoutput_step'
[2] Prefetching for module PoolOutputModule/'PREMIXoutput'
[3] Calling method for module MixingModule/'mix'
Exception Message:
Unavailable Conditions of type HcalMCParams for cell (0x54000140) (Det 5:5 subdet 2:2 ZDC+ UNKNOWN 0,0)
cmsDriver.py Configuration/GenProduction/python/PPD-RunIIISummer24PrePremix-00002-fragment.py --fileout file:PPD-RunIIISummer24PrePremix-00002.root --pileup_input "dbs:/MinBias_TuneCP5_13p6TeV-pythia8/RunIII2024Summer24GS-140X_mcRun3_2024_realistic_v20-v1/GEN-SIM" --mc --eventcontent PREMIX --pileup 2024_25ns_RunIII2024Summer24_PoissonOOTPU --datatier PREMIX --conditions 140X_mcRun3_2024_realistic_v21 --step GEN,SIM,DIGI --procModifiers premix_stage1 --nThreads 2 --geometry DB:Extended --era Run3_2024
with Configuration/GenProduction/python/PPD-RunIIISummer24PrePremix-00002-fragment.py
with /MinBias_TuneCP5_13p6TeV-pythia8/RunIII2024Summer24GS-140X_mcRun3_2024_realistic_v20-v1/GEN-SIM produced in 14.0.13 using
cmsDriver.py Configuration/GenProduction/python/PPD-RunIII2024Summer24GS-00002-fragment.py --fileout file:PPD-RunIII2024Summer24GS-00002.root --mc --eventcontent RAWSIM --datatier GEN-SIM --conditions 140X_mcRun3_2024_realistic_v20 --beamspot DBrealistic --step GEN,SIM --nThreads 4 --geometry DB:Extended --era Run3_2024
with Configuration/GenProduction/python/PPD-RunIII2024Summer24GS-00002-fragment.py
which is somehow the topic of the issue here, and hence I am saying that whichever ZDC code was added in 14.0 is interfering and preventing 14.0 from being used for pp simulation.
whichever solution you, the experts, will come up with to get this solved is good for us, as long as 14.0 is usable for pp simulation, that we need to launch urgently.
I guess it's unrealistic to reproduce the issue in a private setup 🤔
[0] Processing Event run: 1 lumi: 78 event: 77465 stream: 2
I'm running it in 14_0_17, it takes ~3-4 s/ev on lxplus8...
@abdoulline would you like to pursue your suggestion https://github.com/cms-sw/cmssw/issues/43582#issuecomment-2363795961 to exclude ZDC from Digitization? Can a PR be made with it in 14_0_X? I would avoid adding 1867 lines of code in a closed release just to add some ZDC stuff which is not supposed to be used in pp productions with CMSSW_14_0_X
@perrotta Should be able to submit the PR in question around noon...
@perrotta Can I try it out today to see if the crash can be avoided in step2? I need today to work on this
what one can do, and what will likely exhibit the same issue deterministically, is to run the DIGI step of the MB, since anything that fails during the DIGI step of the premix sample is due to the MB content.
...In the meantime (just in case) ZDC Digitizer removal submitted to 14_0_X https://github.com/cms-sw/cmssw/pull/46282
Just to mention this also here: https://github.com/cms-sw/cmssw/pull/46286 w/ backport to 14_0_X https://github.com/cms-sw/cmssw/pull/46282
@hjbossi backports are not yet there. #46282 is the original proposal for 14_0_X by @abdoulline , which is now closed because it will be superseded by the future backports of Sunanda's #46286
Hello, I have some very basic questions, sorry if they were already addressed and I missed it. I will list them below. I think it could also help us understand the overall picture of what is going on.
I'd rather have #46282 to be on the safe side for 14.0 usability in a short time.
Ah sorry, you are correct. The correct link is now available here: https://github.com/cms-sw/cmssw/pull/46300
In PR #43576, production of ZDC hits is enabled, which causes a problem for the hlt_mc_HIon addOn test. Temporary ZDC hits are masked until the problem will be solved.