JonaJJSJ-crypto / Proyecto-de-Tesis

Logbook of my thesis project

Produce LW signal simulation large samples #24

Closed caredg closed 3 years ago

caredg commented 3 years ago

Based on the files here, run the simulation for a large number of events.

caredg commented 3 years ago

Timing test

First, I will test the M200DnR point, running all steps but with only 10 events and without the random-seed generator module, to check the timing, in the test1_M200DnR_10evts_noRandSeed directory:

time cmsRun gensimLW.py > try1.log 2>&1 &

It took 9m51s. The cross section comes out to 5.97 fb.

time cmsRun hltLW.py > hltLW1.log 2>&1 &

The run number module says "setting runNumber to: 200519". It took 1m48s.

time cmsRun recoLW.py > recoLW.log 2>&1 &

It took 2m10s. There are always these warnings:

%MSG-w infos not valid: BTagPerformanceAnalyzerOnData:bTagAnalysis 03-Sep-2021 22:49:39 UTC Run: 200519 Event: 10
A valid SoftLeptonTagInfoCollection was not found! Skipping ratio check.

TOTAL TIME: ~14 min

caredg commented 3 years ago

The corresponding test with 50K events took:

gensim: 38m
hlt: 6m4s
reco: 7m6s

TOTAL TIME: 51 m
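As a quick sanity check, the three per-step timings add up to the quoted total (a small helper of my own, just to do the arithmetic):

```python
# Sum the measured per-step timings from the 50K-event test above
# and confirm they match the quoted ~51 min total.
import re

timings = {"gensim": "38m", "hlt": "6m4s", "reco": "7m6s"}

def to_seconds(t):
    """Convert an 'XmYs'-style timing string to seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", t)
    minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return minutes * 60 + seconds

total = sum(to_seconds(t) for t in timings.values())
print(total, "s =", total // 60, "m", total % 60, "s")  # 3070 s = 51 m 10 s
```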

caredg commented 3 years ago

Test for event repetition

All reco outputs are here, with corresponding labeling.

I made a separate 10-event simulation test in tests 1 and 2, just to confirm that the events repeat and are exactly the same. Canvas_test1 Canvas_test2

I also confirm that the run numbers for the two tests come out different: test 1 has run number 200519 and test 2 has run number 206859 in this example. So, the line in the hlt.py config seems to do its job. E.g.,

root [1] Events->Scan("EventAuxiliary.id_.run_")
************************
*    Row   * EventAuxi *
************************
*        0 *    206859 *
*        1 *    206859 *
*        2 *    206859 *
*        3 *    206859 *
*        4 *    206859 *
*        5 *    206859 *
*        6 *    206859 *
*        7 *    206859 *
*        8 *    206859 *
*        9 *    206859 *
************************
(long long) 10

The event number starts at 1 and increases. One has to be careful to assign event numbers appropriately when simulating at large scale; this is what the next test is about.

root [3] Events->Scan("EventAuxiliary.id_.event_")
************************
*    Row   * EventAuxi *
************************
*        0 *         1 *
*        1 *         2 *
*        2 *         3 *
*        3 *         4 *
*        4 *         5 *
*        5 *         6 *
*        6 *         7 *
*        7 *         8 *
*        8 *         9 *
*        9 *        10 *
************************
(long long) 10

At the gensim level, the run number is set to 1:

root [1] TFile *_file0 = TFile::Open("gensimLW_test2.root")
root [1] Events->Scan("EventAuxiliary.id_.run_")
************************
*    Row   * EventAuxi *
************************
*        0 *         1 *
*        1 *         1 *
*        2 *         1 *
*        3 *         1 *
*        4 *         1 *
*        5 *         1 *
*        6 *         1 *
*        7 *         1 *
*        8 *         1 *
*        9 *         1 *
************************

So, in test 4 I took the test 2 gensim results and removed the line with ThrowAndSetRandomRun in the hlt configuration, just to check how the run numbers are assigned. We will need to control the run and event numbers precisely to avoid crashes from repeated event numbers in the CMSSW framework.

I commented out these lines:

#setRunNumber = cms.untracked.uint32(206859)

#import SimGeneral.Configuration.ThrowAndSetRandomRun as ThrowAndSetRandomRun
#ThrowAndSetRandomRun.throwAndSetRandomRun(process.source,[(194533, 5.2999999999999998), (200519, 7.0), (206859, 7.2999999999999998)])

The last line can be done manually, I believe.
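I believe the helper simply draws a run number with probability proportional to the listed luminosity; if so, the manual version is a few lines of plain Python (this is my assumption about what the helper does, not the CMSSW implementation):

```python
import random

# (run number, luminosity) pairs taken from the commented-out
# throwAndSetRandomRun call above; the weights are the luminosities.
run_lumi = [(194533, 5.3), (200519, 7.0), (206859, 7.3)]

runs = [r for r, _ in run_lumi]
lumis = [l for _, l in run_lumi]

# Draw one run number, weighted by luminosity.
chosen_run = random.choices(runs, weights=lumis, k=1)[0]
print(chosen_run)
```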

Indeed, commenting out those lines, the run numbers are passed from the gensim input:

root [1] TFile *_file0 = TFile::Open("reco10evts_test4.root")
root [1] Events->Scan("EventAuxiliary.id_.run_")
************************
*    Row   * EventAuxi *
************************
*        0 *         1 *
*        1 *         1 *
*        2 *         1 *
*        3 *         1 *
*        4 *         1 *
*        5 *         1 *
*        6 *         1 *
*        7 *         1 *
*        8 *         1 *
*        9 *         1 *
************************
caredg commented 3 years ago

Test skipEvents in LHE input and set increasing run numbers

In test5 I add the setRunNumber switch to control the run number input in the LHE source, and also set skipEvents to zero.
The setRunNumber parameter has no effect in the LHE source. I changed it to firstRun =, and that works. I also added firstEvent =, because it will be nice to have incremental event numbers.
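A sketch of what the modified source could look like (the file name is a placeholder and the exact parameter types are my assumption; only firstRun, firstEvent, and skipEvents come from the text above):

```python
process.source = cms.Source("LHESource",
    fileNames  = cms.untracked.vstring("file:lhe_input.lhe"),  # placeholder name
    skipEvents = cms.untracked.uint32(0),        # start from the first LHE event
    firstRun   = cms.untracked.uint32(200000),   # run number seen in test5
    firstEvent = cms.untracked.uint32(1)         # event numbers start at 1
)
```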

root [2] TFile *_file0 = TFile::Open("reco10evts_test5.root")
root [2] Events->Scan("EventAuxiliary.id_.run_")
************************
*    Row   * EventAuxi *
************************
*        0 *    200000 *
*        1 *    200000 *
*        2 *    200000 *
*        3 *    200000 *
*        4 *    200000 *
*        5 *    200000 *
*        6 *    200000 *
*        7 *    200000 *
*        8 *    200000 *
*        9 *    200000 *
************************
root [3] Events->Scan("EventAuxiliary.id_.event_")
************************
*    Row   * EventAuxi *
************************
*        0 *         1 *
*        1 *         2 *
*        2 *         3 *
*        3 *         4 *
*        4 *         5 *
*        5 *         6 *
*        6 *         7 *
*        7 *         8 *
*        8 *         9 *
*        9 *        10 *
************************

For test6 I repeat what I did in test5, but skipping 10 events and incrementing the run and event numbers. It works:

root [1] TFile *_file0 = TFile::Open("reco10evts_test6.root")
root [1] Events->Scan("EventAuxiliary.id_.run_")
************************
*    Row   * EventAuxi *
************************
*        0 *    200001 *
*        1 *    200001 *
*        2 *    200001 *
*        3 *    200001 *
*        4 *    200001 *
*        5 *    200001 *
*        6 *    200001 *
*        7 *    200001 *
*        8 *    200001 *
*        9 *    200001 *
************************
root [2] Events->Scan("EventAuxiliary.id_.event_")
************************
*    Row   * EventAuxi *
************************
*        0 *        11 *
*        1 *        12 *
*        2 *        13 *
*        3 *        14 *
*        4 *        15 *
*        5 *        16 *
*        6 *        17 *
*        7 *        18 *
*        8 *        19 *
*        9 *        20 *
************************
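The pattern across test5 and test6 suggests a simple per-job numbering scheme; a minimal sketch (the function and offset convention are my own illustration, not the actual production scripts):

```python
# Hypothetical per-job numbering scheme: each job of `events_per_job`
# events gets its own run number and a non-overlapping event range,
# so no (run, event) pair is ever repeated across jobs.
def job_numbering(job_index, events_per_job=10, base_run=200000):
    first_run = base_run + job_index          # one run number per job
    skip_events = job_index * events_per_job  # LHE events to skip
    first_event = skip_events + 1             # event numbers continue across jobs
    return first_run, skip_events, first_event

# Job 0 reproduces test5: run 200000, skip 0, events 1..10.
# Job 1 reproduces test6: run 200001, skip 10, events 11..20.
print(job_numbering(0))  # (200000, 0, 1)
print(job_numbering(1))  # (200001, 10, 11)
```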

Additionally, test5 and test6 give different events, as one can see in these plots:

Test5 Test5_recogenjetsPT

Test6 Test6_recogenjetsPT

So the skipping of events seems to work fine. I am ready to test the ntuplizer on these samples and then run at large scale.

caredg commented 3 years ago

Test POET ntuplizer over simulation output

I ran over the test5 and test6 ROOT output. I made all the appropriate changes in the POET configuration file. I ran with: cmsRun python/poet_cfg.py False True and obtained this error:

----- Begin Fatal Exception 07-Sep-2021 18:03:03 UTC-----------------------
An exception of category 'ProductNotFound' occurred while
   [0] Processing run: 200000 lumi: 1 event: 1
   [1] Running path 'p'
   [2] Calling event method for module PATJetProducer/'patJets'
Exception Message:
Principal::getByLabel: Found zero products matching all criteria
Looking for type: edm::AssociationVector<edm::RefToBaseProd<reco::Jet>,std::vector<float>,edm::RefToBase<reco::Jet>,unsigned int,edm::helper::AssociationIdenticalKeyReference>
Looking for module label: softElectronByPtBJetTags
Looking for productInstanceName: 

   Additional Info:
      [a] If you wish to continue processing events after a ProductNotFound exception,
add "SkipEvent = cms.untracked.vstring('ProductNotFound')" to the "options" PSet in the configuration.

----- End Fatal Exception -------------------------------------------------

I believe this is directly related to the errors seen at reconstruction. I do not know how critical it is; I hope it only affects b-tagging.
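If the missing b-tagging product were the only problem, the workaround suggested in the exception message could be added to the POET config (a sketch; whether skipping such events is acceptable for the analysis is a separate question):

```python
process.options = cms.untracked.PSet(
    # Skip events that raise ProductNotFound instead of aborting,
    # as suggested in the exception message above.
    SkipEvent = cms.untracked.vstring('ProductNotFound')
)
```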

Then I ran with: cmsRun python/poet_cfg.py False False and it succeeded. The test output can be found here.

I am ready to move to mass production.

caredg commented 3 years ago

Test on HTCondor

I sent 5 jobs to test mass production on an HTCondor cluster. The initial idea was to send 500 jobs with 20 events each (for a total of 10,000 events) for each mass point. I tested the 300 GeV mass point. The quickest job with 20 events took:

Keeping all of them (gen, hlt, and reco), the outputs would cost about ~24 GB per mass point, which is not too bad.

As we are pretty close to hitting the wall time of 2700 s, I think I will choose to work with 10 events and send 1000 jobs per mass point instead. I think it is better if they run quicker. I will also keep the gen, hlt, and reco outputs, just in case; I can delete the gen and hlt later.

As a matter of fact, only one of the jobs finished within the 2700 s wall time; the rest were evicted due to this restriction. The one that finished looked fine, though.

I am making another test with only 10 events.

With 10 events per job, the jobs completed successfully (except for the run number assignment, which was wrong, but I have fixed that). The results of the 5-job test can be found in this temporary CERNbox location.

caredg commented 3 years ago

Final production

The first batch of Lee-Wick signal production is ready. Files are stored here.

caredg commented 3 years ago

@JonaJJSJ-crypto found an issue with the Z decay. We need to re-run these simulations.....

Jobs have been sent with the updated ..... we will check tomorrow.

caredg commented 3 years ago

Ok, jobs have finished successfully and files seem to be complete. Old directories were deleted and the new ones reside in the same area.

I have created index files for these data and copied them to the data directory in the POET repository. Their names follow the pattern CMSPrivate_MonteCarlo_LWSM200DnR_file_index.txt, for masses of 200, 300, 400 and 500 GeV.

caredg commented 3 years ago

I will simulate the signal samples again, trying to fix the simulation (see #37). I will try to mimic what was done in this hlt config for an official simulation.

caredg commented 3 years ago

It looks like we can't read the mixing-module configuration from the DB, as some information is missing in the GT. The alternative, it seems, is to pass a random seed to the secondary source module with these lines:

import random
# Pick an arbitrary seed from a wide range and hand it to the pileup
# mixing module's secondary (MinBias) input source.
theseed = random.randint(108967845, 1116674398)
print(theseed)
process.mix.input.seed = cms.int32(theseed)

I will also dump a list of MC MinBias files and choose randomly which one to open. The framework is supposed to do this, but it does not seem to; I do not know why.
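The random file choice itself is simple; a sketch (the file names below are placeholders standing in for the dumped MinBias list, not real paths):

```python
import random

# Placeholder paths standing in for the dumped MC MinBias file list;
# in practice these would be read from the dump.
minbias_files = [f"file:MinBias_{i}.root" for i in range(1, 101)]

# Each job opens one randomly chosen MinBias file for pileup mixing.
chosen = random.choice(minbias_files)
print(chosen)
```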

caredg commented 3 years ago

Adding correction for decay vertices

I am starting the redo of the simulation. I am adding this fix to the vertex decay, plus the MinBias randomization mentioned above.

I will generate 150K events for each mass point. Due to the larger sizes, I will keep the gen and reco files only for the first mass point, if space permits, and only the reco files for the rest.

caredg commented 3 years ago

Changing the strategy for mass simulation

I will change the final strategy mentioned above: I will not run 10 events per job but 20. The reason is that, for the simulation of the 200 GeV mass point, we saw that processing 15,000 files with containers in HTCondor is not ideal, due to a container pull limit. It is better to have jobs that take close to one hour than jobs that run quickly but flood the system with many files. So, for the remaining mass points, I will still simulate 150,000 events (as for 200 GeV), but now split into 7,500 jobs instead of 15,000. I will change the scripts accordingly.
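The splitting arithmetic for the two strategies, spelled out:

```python
# Compare the two job-splitting strategies for one mass point:
# 150,000 events at either 10 or 20 events per job.
total_events = 150_000

for events_per_job in (10, 20):
    n_jobs = total_events // events_per_job
    print(f"{events_per_job} events/job -> {n_jobs} jobs")
```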

On submission, it will be better to split the submission into two batches; we saw that this gives a higher rate of success.

Merging

In addition, after all simulations are done, the AODSIM files can be merged using the merge scripts in this repository. The idea is to end up with far fewer files for the further POET processing.

caredg commented 3 years ago

200 GeV mass point simulations

The latest merged files live here. There are 40 merged AODSIM files weighing 1.6 GB each; each contains 3,750 events.

caredg commented 3 years ago

Further mass points are being simulated, but the procedure is, for now, settled and done.