Closed: pfs closed this issue 6 months ago
Hello Pedro. I'd like to contribute. I installed CMSSW according to the pinned instructions (orange box on p. 16). What do I do next? Running the command cmsRun ${CMSSW_BASE}/src/EventFilter/HGCalRawToDigi/test/tb_raw2reco.py activeECONDs=0,1 gives me a fatal exception.
Thanks! You can get the command used by inspecting e.g.
/eos/cms/store/group/dpg_hgcal/tb_hgcal/2023/CMSSW/ReReco_Oct10/Run1695803516/5b2cb048-6d27-11ee-8957-fa163e8039dc/logs/cmds_Run1695803516_Link2_File0000000000.log
The structure is always the same: Run*/*/logs/cmds_Run*log
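As a quick illustration, the wildcard pattern above can be matched against a concrete log path from this thread with Python's standard fnmatch module (a sketch, not part of the re-reco tooling):

```python
from fnmatch import fnmatch

# The command logs always follow the same layout: Run*/*/logs/cmds_Run*log
pattern = "Run*/*/logs/cmds_Run*log"
path = ("Run1695803516/5b2cb048-6d27-11ee-8957-fa163e8039dc/"
        "logs/cmds_Run1695803516_Link2_File0000000000.log")
print(fnmatch(path, pattern))  # True
```

In practice the same pattern can be fed to glob.glob with the /eos prefix prepended to list all command logs of a re-reco campaign.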
The first command is:
cmsRun -j FrameworkJobReport_Run1695803516_Link2_File0000000000_RECO.xml /afs/cern.ch/work/p/psilva/HGCal/TB2023/rereco/CMSSW_13_2_0_pre3/src/EventFilter/HGCalRawToDigi/test/tb_raw2reco.py mode=slinkfromraw slinkBOE=0x55 cbHeaderMarker=0x7f econdHeaderMarker=0x154 applyFWworkaround=False inputFiles=/eos/cms/store/group/dpg_hgcal/tb_hgcal/2023/BeamTestSep/HgcalBeamtestSep2023/Relay1695803516/Run1695803516_Link1_File0000000000.bin,/eos/cms/store/group/dpg_hgcal/tb_hgcal/2023/BeamTestSep/HgcalBeamtestSep2023/Relay1695803516/Run1695803516_Link2_File0000000000.bin fedId=1,2 inputTrigFiles=/eos/cms/store/group/dpg_hgcal/tb_hgcal/2023/BeamTestSep/HgcalBeamtestSep2023/Relay1695803516/Run1695803516_Link0_File0000000000.bin output=Run1695803516_Link2_File0000000000 conditions=default dqmOnly=False runNumber=1695803516 maxEvents=1000000000
Among the many parameters, the one that matters here is conditions=default.
So can you check in the run Google table whether this run is a DTH run or a standard run?
Another relevant piece of information: the conditions are mapped in DPGAnalysis/HGCalTools/python/tb2023_cfi.py
Thanks, @pfs ! Could you please point me to the Google doc table? I can't find it (I only have this one, which lacks the runs I am interested in: https://docs.google.com/spreadsheets/d/13fuGnuPKuaHTAZSs9Fo84mZQ0OwQThkomTH8M2lKqMQ/edit#gid=1238056000 )
You have this google doc as well.
There is also the file
/eos/cms/store/group/dpg_hgcal/tb_hgcal/2023/CMSSW/ReReco_Oct10/runregistry.csv
which in principle aggregates the info from both Google docs. If you can check that it's consistent for those runs, that would be very useful, as the runregistry is built before launching the re-reco jobs.
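A minimal sketch of such a consistency lookup with the standard csv module; the column names below are assumptions for illustration, the real header of runregistry.csv may differ:

```python
import csv
import io

# Toy stand-in for runregistry.csv; column names here are hypothetical
csv_text = """run,conditions,type
1695803516,default,physics
14093956,MLFL00041ped,pedestal
"""

# Index the registry by run number for quick lookups
registry = {row["run"]: row for row in csv.DictReader(io.StringIO(csv_text))}

# Compare what the registry says against the Google doc entry for a run
print(registry["14093956"]["conditions"])  # MLFL00041ped
```

On lxplus the io.StringIO stand-in would simply be replaced by open() on the /eos path quoted above.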
Thank you, @pfs ! I noticed that you list just 2 runs in this ticket. For these 2 runs the issue I saw was that plots for a single channel have many fewer events than the total number of entries. These are actually physics default and technical default runs, respectively - is this expected or not? These files are large (~1M events).
A related but somewhat different issue, brought up on Mattermost, is that the following 3 runs (and probably more) have twice as many entries in the histograms as they should: 14093956, 14102859, 17145351. Their conditions are pedestal MLFL00041ped, beam MLFL00041 e-, and pedestal MLDSL57ped e-, respectively. I did not find them in the Google docs, but they are documented in the run registry on lxplus. These files are small (10k events).
I am summarising this in the attached table (files.pdf). I can also try running these files, though I don't understand how to read the output ROOT file that is created.
Hi @olgovich
Ah, OK. So for the size of these two runs:
- 1695803516: I see that it is zero suppressed and without absorber in front, so the size should indeed be really small.
- 1695758608: it's not clear from the Google doc whether ZS was applied or not (sigh), and it should be a short run. Check the Google doc.
Can you make a histogram of counts per channel in these runs? It should be suppressed everywhere except for the beamspot channels.
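As a toy sketch of that per-channel occupancy check (the channel ids and counts below are invented for illustration; in practice they would come from the RECO output):

```python
from collections import Counter

# One entry per unpacked hit; with zero suppression working, only the
# beamspot channels should accumulate counts (toy data for illustration)
hit_channels = [7, 7, 12, 7, 7, 12, 7, 7]
occupancy = Counter(hit_channels)
print(occupancy.most_common())  # [(7, 6), (12, 2)]
```

If the histogram instead shows roughly uniform counts across all channels, ZS was most likely not applied for that run.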
Now for the other runs. They are standalone module runs, and in the Google doc these files are named differently, typically as
run_202309{day}_{hour}{minute}{second}
which in CMSSW is turned into a single number as {day}{hour}{minute}{second}
I get the following
Run | Conditions | Google doc info |
---|---|---|
14093956 | MLFL00041ped | run_20230914_093956 the condition seems correct (MLF = Module Low-density Full) and pedestal |
14102859 | MLFL00041 | run_20230914_102859 likewise the condition seems OK |
17145351 | MLDSL57ped | run_20230917_145351 MLDSL57 condition seems correct (ML*L = Module Low-density Left) and pedestal |
The ped suffix added to these runs is a hack for this test beam, because the readout was different and was not properly propagated to CMSSW.
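The run-name-to-run-number convention above can be sketched with a small helper (tb_run_number is a hypothetical name, not a function in the CMSSW tools):

```python
def tb_run_number(run_name: str) -> int:
    """Map e.g. 'run_20230914_093956' to 14093956 ({day}{hour}{minute}{second})."""
    _, date, time = run_name.split("_")
    return int(date[-2:] + time)  # keep the day, drop the year and month

print(tb_run_number("run_20230914_093956"))  # 14093956
print(tb_run_number("run_20230917_145351"))  # 17145351
```

Both examples reproduce the Run/Google-doc pairs in the table above.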
Now, since these events have twice the entries, can you run the commands in the cmds.log file for one of them until the NANOAOD step is reached and check whether the output NANOAOD file still has this problem?
Thanks, Pedro
Thanks @pfs for the explanations of the nomenclature and of zero suppression resulting in a smaller number of entries in the histograms. It really looks to me like both 1695803516 and 1695758608 are zero suppressed.
Run 1695758608 contains 2 NANOAOD files, with 999k and 593k events respectively. This structure and number of events does not match 51 subruns of 10k events, as mentioned in the September 2023 HGCal Beam Test Single Module doc. Am I missing something?
I ran the first RECO step for the pedestal run 20230914_093956. Plotting a variable that should have just one entry per event from the output ROOT file gives me twice the number of events in the histogram: Events->Draw("HGCalFlaggedECONDInfos_hgcalDigis_UnpackerFlags_TEST.obj.eleid")
So it looks like the issue occurs in the RECO step.
I got a fatal exception when running:

```
----- Begin Fatal Exception 19-Dec-2023 15:27:10 CET-----------------------
An exception of category 'HGCalModuleTreeReader' occurred while
[0] Processing Event run: 14093956 lumi: 1 event: 10033 stream: 0
[1] Running path 'p'
[2] Calling method for module HGCalSlinkEmulator/'hgcalEmulatedSlinkRawData'
Exception Message:
Insufficient number of events were retrieved from input tree to proceed with the generation of emulated events.
----- End Fatal Exception -------------------------------------------------
```

Do you think it is related to the issue?
I am going on holiday tomorrow. Will take a look at the slides from the HGCAL DPG meeting.
Thanks @olgovich
> Run 1695758608 contains 2 NANOAOD files, with 999k and 593k events respectively - this structure and number of events does not match 51 subruns of 10k events, as mentioned in September 2023 HGCal Beam Test Single Module doc. Am I missing something?
In the Google doc, 1695758608 is said to be a short run for debugging. Indeed, navigating back to where the raw data is, I only see 2 files per link, so that matches the final 2 NANOAOD files. I think this is fine. Where do you see 51 subruns of 10k events?
> I ran the first RECO step for the pedestal run 20230914_093956. Plotting a variable that should have just one entry per event from the output root file gives me twice the number of events in the histogram Events->Draw("HGCalFlaggedECONDInfos_hgcalDigis_UnpackerFlags_TEST.obj.eleid")
OK, so this is weird. The conditions seem fine, but we should have fewer entries.
> So it looks like the issue occurs in the RECO step.
>
> I got a fatal exception when running:
>
> ```
> ----- Begin Fatal Exception 19-Dec-2023 15:27:10 CET-----------------------
> An exception of category 'HGCalModuleTreeReader' occurred while
> [0] Processing Event run: 14093956 lumi: 1 event: 10033 stream: 0
> [1] Running path 'p'
> [2] Calling method for module HGCalSlinkEmulator/'hgcalEmulatedSlinkRawData'
> Exception Message:
> Insufficient number of events were retrieved from input tree to proceed with the generation of emulated events.
> ----- End Fatal Exception -------------------------------------------------
> ```
>
> Do you think it is related to the issue?
This one is "OK". For the moment we have to endure this crash until we change the producer. It's annoying but harmless.
> I am going on holiday tomorrow. Will take a look at the slides from the HGCAL DPG meeting.
OK, enjoy your holidays. I'm trying to figure out this run in more detail.
OK, it looks like the list of FEDs being passed to the unpacker is [0,0] instead of [0], making it unpack the same thing twice. It's as simple as
cmsRun -j FrameworkJobReport_pedestal_run0_roc2root_RECO.xml /afs/cern.ch/work/p/psilva/HGCal/TB2023/rereco/CMSSW_13_2_0_pre3/src/EventFilter/HGCalRawToDigi/test/tb_raw2reco.py mode=hgcmodule fedId=0 slinkBOE=0x2a cbHeaderMarker=0x0 econdHeaderMarker=0x154 ECONDsInPassthrough=0 activeECONDs=0 ECONDsInCharacterisation=0 inputFiles=/eos/cms/store/group/dpg_hgcal/tb_hgcal/2023/BeamTestSep/SingleModuleTest/MLFL00041/pedestal_run/run_20230914_093956/pedestal_run0_roc2root.root fedId=0 output=pedestal_run0_roc2root conditions=MLFL00041ped dqmOnly=False runNumber=14093956 maxEvents=1000000000
i.e. fedId appears twice, and the VarParsing tool builds a list of [0,0] instead of complaining about the duplicated argument. I'm making a simple patch for the moment.
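The failure mode can be illustrated without CMSSW by a tiny stand-in for how a repeated multi-valued option accumulates (parse_multi_option is hypothetical, not the real VarParsing API):

```python
def parse_multi_option(argv, name):
    """Collect every value given for a key=value option, mimicking a
    multi-valued command-line option that accumulates repeats."""
    values = []
    for arg in argv:
        key, _, val = arg.partition("=")
        if key == name:
            values.append(int(val))
    return values

# fedId appears twice on the command line, so it accumulates to [0, 0]
argv = ["mode=hgcmodule", "fedId=0", "activeECONDs=0", "fedId=0"]
print(parse_multi_option(argv, "fedId"))  # [0, 0]

# A simple order-preserving de-duplication, in the spirit of the patch
print(list(dict.fromkeys(parse_multi_option(argv, "fedId"))))  # [0]
```

With [0, 0] the unpacker loops over the same FED twice, which matches the doubled histogram entries seen above.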
As the duplicated entries are not fixed in CMSSW itself but via the run registry tool, I think we can close this. I opened the issue here and submitted a re-reco of the single module data.
Files will appear here: /eos/cms/store/group/dpg_hgcal/tb_hgcal/2023/CMSSW/ReReco_Dec19
The analysis of these runs has twice the statistics. Investigate the commands used in
logs/cmds.log
of these runs. If it's an issue with the conditions used, re-reco with the appropriate conditions: this requires installing the branch locally, as described in the DPG Test Beam doc.