CMS-HGCAL / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

Investigate conditions in reconstructing 1695803516 and 1695758608 #94

Closed: pfs closed this issue 6 months ago

pfs commented 6 months ago

The analysis of these runs shows twice the expected statistics. Investigate the commands used in logs/cmds.log for these runs. If it's an issue with the conditions used, re-reco with the appropriate conditions: this requires installing the branch locally, as described in the DPG Test Beam doc.

olgovich commented 6 months ago

Hello Pedro. I'd like to contribute. I installed CMSSW according to the pinned instructions (orange box on p16). What do I do next? Running the command cmsRun ${CMSSW_BASE}/src/EventFilter/HGCalRawToDigi/test/tb_raw2reco.py activeECONDs=0,1 gives me a fatal exception.

pfs commented 6 months ago

Thanks! So, can you get the command used by inspecting e.g.

/eos/cms/store/group/dpg_hgcal/tb_hgcal/2023/CMSSW/ReReco_Oct10/Run1695803516/5b2cb048-6d27-11ee-8957-fa163e8039dc/logs/cmds_Run1695803516_Link2_File0000000000.log

The structure is always the same: Run*/*/logs/cmds_Run*log

The first command is:

cmsRun -j FrameworkJobReport_Run1695803516_Link2_File0000000000_RECO.xml /afs/cern.ch/work/p/psilva/HGCal/TB2023/rereco/CMSSW_13_2_0_pre3/src/EventFilter/HGCalRawToDigi/test/tb_raw2reco.py mode=slinkfromraw slinkBOE=0x55 cbHeaderMarker=0x7f econdHeaderMarker=0x154 applyFWworkaround=False inputFiles=/eos/cms/store/group/dpg_hgcal/tb_hgcal/2023/BeamTestSep/HgcalBeamtestSep2023/Relay1695803516/Run1695803516_Link1_File0000000000.bin,/eos/cms/store/group/dpg_hgcal/tb_hgcal/2023/BeamTestSep/HgcalBeamtestSep2023/Relay1695803516/Run1695803516_Link2_File0000000000.bin fedId=1,2 inputTrigFiles=/eos/cms/store/group/dpg_hgcal/tb_hgcal/2023/BeamTestSep/HgcalBeamtestSep2023/Relay1695803516/Run1695803516_Link0_File0000000000.bin output=Run1695803516_Link2_File0000000000 conditions=default dqmOnly=False runNumber=1695803516 maxEvents=1000000000

Among the many parameters, the one that matters is conditions=default.

So, can you check in the run Google table whether this run is a DTH run or a standard run?

pfs commented 6 months ago

Another relevant piece of information: the conditions are mapped in DPGAnalysis/HGCalTools/python/tb2023_cfi.py.

olgovich commented 6 months ago

Thanks, @pfs! Could you please point me to the Google doc table? I can't find it (I only have this one, which lacks the runs I am interested in: https://docs.google.com/spreadsheets/d/13fuGnuPKuaHTAZSs9Fo84mZQ0OwQThkomTH8M2lKqMQ/edit#gid=1238056000)

pfs commented 6 months ago

You have this Google doc as well.

Also there is a file

/eos/cms/store/group/dpg_hgcal/tb_hgcal/2023/CMSSW/ReReco_Oct10/runregistry.csv

which in principle aggregates info from both Google docs. If you can check that it's consistent for those runs, that would be very useful, as the run registry is built before launching the re-reco jobs.
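For the cross-check, something along these lines could work (a minimal sketch; the column names "Run" and "Conditions" are assumptions about the CSV header, so adapt them to the actual file):

```python
import csv
import io

def conditions_by_run(csv_text: str) -> dict:
    """Map run number -> conditions from runregistry-style CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Run"]: row["Conditions"] for row in reader}

# Synthetic content for illustration only (NOT the real registry):
sample = "Run,Conditions\n1695803516,default\n1695758608,default\n"
table = conditions_by_run(sample)
print(table["1695803516"])  # -> default
```

In practice one would read /eos/cms/store/group/dpg_hgcal/tb_hgcal/2023/CMSSW/ReReco_Oct10/runregistry.csv and compare the entries against both Google docs.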

olgovich commented 6 months ago

Thank you, @pfs! I noticed that you list just 2 runs in this ticket. For these 2 runs, the issue I saw was that they have many fewer events in the single-channel plots than the total number of entries. These are actually physics default and technical default runs, respectively. Is this expected or not? These files are large (~1M events).

A related but somewhat different issue, brought up on Mattermost, is that the following 3 runs (probably more) have twice as many entries in the histograms as they should: 14093956, 14102859, 17145351. Their conditions are pedestal MLFL00041ped, beam MLFL00041 e-, and pedestal MLDSL57ped e-, respectively. I did not find them in the Google docs, but they are documented in the run registry on lxplus. These files are small (10k events).

I am summarising this in the attached table. I can also try running these files, though I don't understand how to read the output ROOT file that is created. Filling files.pdf

pfs commented 6 months ago

Hi @olgovich

Ah OK, so for the size of these two runs:

1695803516: I see that it is zero suppressed and without an absorber in front, so the size should indeed be really small.

1695758608: it's not clear from the Google doc whether ZS was applied (sigh), and it should be a short run. Check the Google doc.

Can you make a histogram of counts per channel in these runs? It should be suppressed everywhere except for the beamspot channels.
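The idea behind that check can be sketched with synthetic digis (the channel numbers below are made up; in practice they would come from the reconstructed digi collections):

```python
from collections import Counter

def noisy_channels(channels, threshold):
    """Channels that fired more than `threshold` times; with zero
    suppression only the beamspot channels should pass."""
    counts = Counter(channels)
    return sorted(ch for ch, n in counts.items() if n > threshold)

# Toy data: channels 7 and 12 mimic the beamspot; channel 3 fired once.
fired = [7, 7, 7, 12, 7, 12, 3]
print(noisy_channels(fired, threshold=1))  # -> [7, 12]
```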

Now for the other runs. They are standalone module runs, and in the Google doc these files are named differently, typically as

run_202309{day}_{hour}{minute}{second}, which in CMSSW is turned into a single run number, {day}{hour}{minute}{second}.

I get the following

Run | Conditions | Google doc info
14093956 | MLFL00041ped | run_20230914_093956; the condition seems correct (MLF = Module Low-density Full) and pedestal
14102859 | MLFL00041 | run_20230914_102859; likewise the condition seems OK
17145351 | MLDSL57ped | run_20230917_145351; the MLDSL57 condition seems correct (ML*L = Module Low-density Left) and pedestal
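The naming convention above can be encoded in a small helper (hypothetical, just to illustrate the mapping; it assumes the September 2023 runs):

```python
def to_run_number(name: str) -> int:
    """Map run_202309{day}_{hhmmss} to the CMSSW run number {day}{hhmmss}."""
    prefix = "run_202309"  # September 2023 test beam
    if not name.startswith(prefix):
        raise ValueError(f"unexpected run name: {name}")
    return int(name[len(prefix):].replace("_", ""))

print(to_run_number("run_20230914_093956"))  # -> 14093956
print(to_run_number("run_20230917_145351"))  # -> 17145351
```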

The ped suffix added to these runs is a hack for this test beam, because the readout was different and was not properly propagated to CMSSW.

Now, about these events having twice the entries: can you run the commands from the cmds.log file for one of them until the NANOAOD step is reached, and check whether the output NANOAOD file still has this problem?

Thanks, Pedro

olgovich commented 6 months ago

Thanks @pfs for the explanations of the nomenclature and of the zero suppression resulting in a smaller number of entries in the histograms. It really looks to me like both 1695803516 and 1695758608 are zero suppressed.

Run 1695758608 contains 2 NANOAOD files, with 999k and 593k events respectively. This structure and number of events does not match the 51 subruns of 10k events mentioned in the September 2023 HGCal Beam Test Single Module doc. Am I missing something?

I ran the first RECO step for the pedestal run 20230914_093956. Plotting a variable that should have just one entry per event from the output ROOT file gives me twice the number of events in the histogram:

Events->Draw("HGCalFlaggedECONDInfos_hgcalDigis_UnpackerFlags_TEST.obj.eleid")

So it looks like the issue occurs in the RECO step.

I got a fatal exception when running:

----- Begin Fatal Exception 19-Dec-2023 15:27:10 CET-----------------------
An exception of category 'HGCalModuleTreeReader' occurred while
[0] Processing Event run: 14093956 lumi: 1 event: 10033 stream: 0
[1] Running path 'p'
[2] Calling method for module HGCalSlinkEmulator/'hgcalEmulatedSlinkRawData'
Exception Message:
Insufficient number of events were retrieved from input tree to proceed with the generation of emulated events.
----- End Fatal Exception -------------------------------------------------

Do you think it is related to the issue?

I am going on holiday tomorrow. Will take a look at the slides from the HGCAL DPG meeting.

pfs commented 6 months ago

Thanks @olgovich

Run 1695758608 contains 2 NANOAOD files, with 999k and 593k events respectively. This structure and number of events does not match the 51 subruns of 10k events mentioned in the September 2023 HGCal Beam Test Single Module doc. Am I missing something?

In the Google doc, 1695758608 is said to be a short debugging run. Indeed, navigating back to where the raw data is, I only see 2 files per link, so that matches the final 2 NANOAOD files. I think this is fine. Where do you see 51 subruns of 10k events?

I ran the first RECO step for the pedestal run 20230914_093956. Plotting a variable that should have just one entry per event from the output ROOT file gives me twice the number of events in the histogram: Events->Draw("HGCalFlaggedECONDInfos_hgcalDigis_UnpackerFlags_TEST.obj.eleid")

OK, so this is weird. The conditions seem fine, but we should have fewer entries.

So it looks like the issue occurs in the RECO step.

I got a fatal exception when running:

----- Begin Fatal Exception 19-Dec-2023 15:27:10 CET-----------------------
An exception of category 'HGCalModuleTreeReader' occurred while
[0] Processing Event run: 14093956 lumi: 1 event: 10033 stream: 0
[1] Running path 'p'
[2] Calling method for module HGCalSlinkEmulator/'hgcalEmulatedSlinkRawData'
Exception Message:
Insufficient number of events were retrieved from input tree to proceed with the generation of emulated events.
----- End Fatal Exception -------------------------------------------------

Do you think it is related to the issue?

This one is "OK". For the moment we have to endure this crash, until we change the producer. It's annoying but harmless.

I am going on holiday tomorrow. Will take a look at the slides from the HGCAL DPG meeting.

OK, enjoy the holidays! I'm trying to figure out this run in more detail.

pfs commented 6 months ago

OK, it looks like the list of FEDs being passed to the unpacker is [0,0] instead of [0], making it unpack the same data twice. The cause is as trivial as this:

cmsRun -j FrameworkJobReport_pedestal_run0_roc2root_RECO.xml /afs/cern.ch/work/p/psilva/HGCal/TB2023/rereco/CMSSW_13_2_0_pre3/src/EventFilter/HGCalRawToDigi/test/tb_raw2reco.py mode=hgcmodule fedId=0 slinkBOE=0x2a cbHeaderMarker=0x0 econdHeaderMarker=0x154 ECONDsInPassthrough=0 activeECONDs=0 ECONDsInCharacterisation=0 inputFiles=/eos/cms/store/group/dpg_hgcal/tb_hgcal/2023/BeamTestSep/SingleModuleTest/MLFL00041/pedestal_run/run_20230914_093956/pedestal_run0_roc2root.root fedId=0 output=pedestal_run0_roc2root conditions=MLFL00041ped dqmOnly=False runNumber=14093956 maxEvents=1000000000

i.e. fedId appears twice on the command line, and the VarParsing tool builds a list [0,0] instead of complaining about duplicated arguments. I'm making a simple patch for the moment.
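The behaviour can be mimicked and guarded against with an order-preserving dedup (a sketch of the idea only, not the actual CMSSW patch):

```python
def dedup(values):
    """Drop duplicates while preserving order, e.g. [0, 0] -> [0]."""
    seen = set()
    out = []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out

# A repeated multi-valued option gets appended on every occurrence,
# so `fedId=0 ... fedId=0` on the command line ends up as [0, 0]:
fed_ids = [0, 0]
print(dedup(fed_ids))  # -> [0]
```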

pfs commented 6 months ago

As the duplicated entries are fixed not in CMSSW but via the run registry tool, I think we can close this. I opened the issue here and submitted a re-reco of the single-module data.

Files will appear here: /eos/cms/store/group/dpg_hgcal/tb_hgcal/2023/CMSSW/ReReco_Dec19