cms-nanoAOD / cmssw

CMS NanoAOD software integration repository
http://cms-sw.github.io/
Apache License 2.0

JMEnano oversized #549

Closed mariadalfonso closed 3 years ago

mariadalfonso commented 4 years ago

I checked the JMEnano event size using workflow 11024.15, which is already in production. Workflow 11024.15 uses 2018 MC ttbar events with PU. The original goal of JMEnano was to derive JEC; it was implicitly redesigned to derive most of the SFs for JME and BTV. [Note: this does not contain any of the new PF candidates or jet constituents that JME wants, nor anything else needed for Run 3 developments.]

Test done on CMSSW_11_2_0_pre7 with 500 events: http://dalfonso.web.cern.ch/dalfonso/XPOG/11_2_0_pre7/jme-11024.15_size_report.html The size is now 13.69 kb/event.


For comparison, the central nano in similar high-PU conditions is < 2 kb/event. Test done with /eos/cms/store/relval/CMSSW_11_2_0_pre7/RelValTTbar_13/NANOAODSIM/PU25ns_112X_upgrade2018_realistic_v3-v1/20000/638FCD01-8C5E-0B41-8240-65FF1A67B342.root

http://dalfonso.web.cern.ch/dalfonso/XPOG/11_2_0_pre7/centralNano_size_report.html

mariadalfonso commented 4 years ago

JME experts reported a similar size on TTbar events:

https://github.com/cms-sw/cmssw/pull/31714#issuecomment-709308184 https://github.com/cms-sw/cmssw/pull/31831#issuecomment-711030544

mariadalfonso commented 3 years ago

Summary of the review as of Monday 19 October.

JME stores 3 AK4 jet collections (PF, CHS, Puppi) plus another set of AK4 CHS jets, CorrT1METJet, which duplicates jets already in the main collection. Those 4 collections make up about 80% of the JMEnano size in 11024.15.

  1. There are on average 80 items/evt for PF and CHS but only 11 items/evt for Puppi. What is the reason for the asymmetry? Is the PF/CHS size too large, or the Puppi size too small?

  2. From JMAR: "PFjets with the same content is usually a good check". Does this mean that the PF jets are not really necessary? Suggested to drop them.

  3. Discriminators such as b-tagging/ParticleNet are also stored at very low pT but will only be used from 20-30 GeV. Suggested to tailor the content to what is needed to derive the SFs.

mariadalfonso commented 3 years ago

Some small suggestions:

  1. Drop CorrT1METJet: for type-1 MET you already have all possible jets in the event in the other collections.

  2. Jet eta/phi are stored with higher precision than pt and are by far the most offending floats: jet eta/phi are saved with high precision (~12, as for leptons), while pt is redefined with precision 10.

  3. Drop the regression variables you now have in the CHS jets. Something like this should work: getattr(proc,jetTable).externalVariables = cms.PSet()
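
To make the precision point in item 2 concrete, here is a minimal standalone sketch of what keeping 12 versus 10 mantissa bits of a float32 does to a value. It assumes that the table `precision` counts retained mantissa bits; the helper names (`float32_bits`, `bits_to_float32`, `reduce_mantissa`) are hypothetical and this is not the CMSSW implementation, only an illustration with simple round-to-nearest on the dropped bits. Under that assumption the worst-case relative error is 2^-13 (about 1.2e-4) for 12 bits and 2^-11 (about 4.9e-4) for 10 bits, and the zeroed low bits are what lets the column compress better.

```python
import struct


def float32_bits(value: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of value as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_to_float32(raw: int) -> float:
    """Interpret an unsigned 32-bit pattern as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", raw))[0]


def reduce_mantissa(value: float, bits: int) -> float:
    """Keep only `bits` of the 23 stored mantissa bits, rounding to nearest.

    Illustrative only: assumes 'precision' means retained mantissa bits.
    """
    if not 1 <= bits <= 22:
        raise ValueError("bits must be between 1 and 22")
    raw = float32_bits(value)
    shift = 23 - bits
    # Leave inf/NaN (exponent all ones) unrounded so they keep their class.
    if (raw & 0x7F800000) != 0x7F800000:
        # Adding half of the dropped range rounds the magnitude to nearest;
        # a carry into the exponent is the correct result (next power of two).
        raw = (raw + (1 << (shift - 1))) & 0xFFFFFFFF
    # Zero the dropped low bits.
    raw &= ~((1 << shift) - 1) & 0xFFFFFFFF
    return bits_to_float32(raw)


if __name__ == "__main__":
    eta = 2.3456789  # arbitrary example value, not taken from any sample
    for bits in (12, 10):
        reduced = reduce_mantissa(eta, bits)
        rel_err = abs(reduced - eta) / abs(eta)
        print(f"bits={bits:2d} value={reduced!r} relative error={rel_err:.2e}")
```

Whether dropping two bits saves meaningful space on disk depends on the compression settings and on how the column is distributed, so the actual gain has to be measured on a real file.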

nurfikri89 commented 3 years ago

PR #32722 makes changes to reduce the JMEnano size. The event size is reduced from 9.32 kb/event to 6.42 kb/event. The comparison was made using 10K events from a TTJets RunIISummer19UL17MiniAOD sample. The changes were discussed in the 13/01/2021 XPOG meeting [1]. One change discussed during the meeting but not included in the PR is reducing the eta and phi precision from 12 to 10: the resulting size reduction was found to be negligible, so it was decided not to reduce the eta and phi precision.

[1] https://indico.cern.ch/event/978436/
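
For reference, the quoted before/after numbers correspond to a reduction of roughly 31%. A minimal sketch of the arithmetic follows; the function name `relative_reduction` is hypothetical, and the 9.32 and 6.42 kb/event inputs are the figures quoted above for the TTJets sample (they are not directly comparable to the 13.69 kb/event measured earlier on the 11024.15 workflow, which used a different sample and release).

```python
def relative_reduction(before_kb: float, after_kb: float) -> float:
    """Fractional size reduction going from before_kb to after_kb per event."""
    if before_kb <= 0:
        raise ValueError("before_kb must be positive")
    return (before_kb - after_kb) / before_kb


if __name__ == "__main__":
    # Sizes in kb/event as quoted in this thread for PR #32722.
    reduction = relative_reduction(9.32, 6.42)
    print(f"JMEnano size reduction: {reduction:.1%}")
```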

nurfikri89 commented 3 years ago

PR cms-sw#32722 and its backports (#32759 for 10_6_X and #32760 for 11_2_X) have been merged

mariadalfonso commented 3 years ago

Closing this issue, since the PRs mentioned above effectively reduce the size.