cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

Opportunity to speed up FastjetJetProducer #37827

Open kpedro88 opened 2 years ago

kpedro88 commented 2 years ago

I realized recently that running a jet grooming algorithm (e.g. SoftDrop) via FastjetJetProducer requires rerunning the jet area calculation and base clustering of constituents. This amounts to a significant consumption of CPU in analysis workflows (such as those using JetToolbox), which could be avoided with a different design.

My first attempt at such a design change involved putting FastJet objects directly into the event to be reused later: https://github.com/cms-sw/cmssw/compare/CMSSW_10_6_X...kpedro88:ReusePseudoJets106X

Pros: less "invasive" (existing producers in official sequences/workflows mostly stay the same, with just a few extra parameters).

Cons: sporadically crashes, likely because some component of the FastJet objects does not have a long enough lifetime and can get deleted or overwritten (I was not immediately able to determine the precise source of the problem).

My second attempt involves reworking FastjetJetProducer to run the base clustering and any transforms (grooming algorithms) all at the same time. This is working well in my ongoing tests: the output agrees with the baseline and the expected speedup is realized. The code and logic need some further improvement to ensure all cases are handled correctly and sensibly. Right now, I am using a Python function to adapt existing sequences to the new approach, with EDAlias employed so that downstream modules do not have to change. For a central implementation, the underlying configs and workflow setups should probably be adjusted directly. The current state of the code can be found here: https://github.com/cms-sw/cmssw/compare/CMSSW_10_6_X...kpedro88:OneShotTransforms106X
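To make the difference between the two layouts concrete, here is a toy Python model. It is only a sketch and not the code in the branch linked above: `ToyClusterer`, `drop_soft`, `run_separately`, and `run_one_shot` are hypothetical names, "clustering" is a trivial chunking of a pT list, and nothing here touches FastJet or CMSSW. It only counts how often the expensive step runs.

```python
from typing import Callable, Dict, List

Jet = List[float]  # a toy "jet" is just a list of constituent pT values


class ToyClusterer:
    """Stand-in for base clustering plus area calculation (the costly step)."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, constituents: List[float]) -> List[Jet]:
        self.calls += 1
        # Toy clustering: group constituents into chunks of three.
        return [constituents[i:i + 3] for i in range(0, len(constituents), 3)]


def drop_soft(jet: Jet) -> Jet:
    """Toy groomer: drop constituents below 10% of the jet's summed pT."""
    threshold = 0.1 * sum(jet)
    return [pt for pt in jet if pt >= threshold]


Transforms = Dict[str, Callable[[Jet], Jet]]


def run_separately(cluster: ToyClusterer, constituents: List[float],
                   transforms: Transforms) -> Dict[str, List[Jet]]:
    """One producer per collection: every groomed collection reclusters."""
    outputs = {"": cluster(constituents)}
    for name, transform in transforms.items():
        outputs[name] = [transform(jet) for jet in cluster(constituents)]
    return outputs


def run_one_shot(cluster: ToyClusterer, constituents: List[float],
                 transforms: Transforms) -> Dict[str, List[Jet]]:
    """Single producer: cluster once, then apply every transform."""
    jets = cluster(constituents)
    outputs = {"": jets}
    for name, transform in transforms.items():
        outputs[name] = [transform(jet) for jet in jets]
    return outputs


pts = [50.0, 30.0, 5.0, 40.0, 2.0, 1.0, 20.0]
transforms = {"SoftDrop": drop_soft, "Pruned": drop_soft}  # labels only

separate, one_shot = ToyClusterer(), ToyClusterer()
out_separate = run_separately(separate, pts, transforms)
out_one_shot = run_one_shot(one_shot, pts, transforms)

print("clustering calls, separate producers:", separate.calls)
print("clustering calls, one-shot producer: ", one_shot.calls)
print("same output:", out_separate == out_one_shot)
```

In this toy, the number of clustering calls grows with the number of groomed collections in the first layout and stays at one in the second, while the outputs are identical. Keying the outputs by name stands in for whatever mechanism exposes the groomed collections to downstream modules.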

A basic test using JetToolbox to recluster AK8 jets with no pT cut and run SoftDrop shows how a large speedup can be achieved.

Before:

TimeReport   0.000597     0.000597     0.000597  ak8GenJetsNoNu
TimeReport   0.000635     0.000635     0.000635  ak8GenJetsNoNuSoftDrop
TimeReport   0.007783     0.007783     0.007783  ak8PFJetsPuppiNoCut
TimeReport   0.007811     0.007811     0.007811  ak8PFJetsPuppiNoCutSoftDrop

After:

TimeReport   0.000814     0.000814     0.000814  ak8GenJetsNoNu
TimeReport   0.008197     0.008197     0.008197  ak8PFJetsPuppiNoCut

I also put together a miniAOD test. The total speedup is about 2% of the workflow runtime.

Before:

TimeReport   0.007883     0.007883     0.007883  ak8PFJetsPuppi
TimeReport   0.001518     0.001518     0.001518  ak8PFJetsPuppiSoftDrop
TimeReport   0.000150     0.000150     0.000150  ak8PFJetsCHSPruned
TimeReport   0.006020     0.006020     0.006020  ak8PFJetsCHSSoftDrop

After:

TimeReport   0.008028     0.008028     0.008028  ak8PFJetsPuppi
TimeReport   0.000224     0.000224     0.000224  ak8PFJetsCHSCore
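For a quick tally of the two before/after comparisons, the TimeReport excerpts can be summed with a few lines of Python. This is only a sketch under stated assumptions: the values are taken to be seconds, the first numeric column is used (all three columns are identical in these excerpts), and only the modules quoted in this issue are counted, so the totals are not full workflow times. The helper names `module_times` and `summarize` are made up for illustration.

```python
JETTOOLBOX_BEFORE = """
TimeReport   0.000597     0.000597     0.000597  ak8GenJetsNoNu
TimeReport   0.000635     0.000635     0.000635  ak8GenJetsNoNuSoftDrop
TimeReport   0.007783     0.007783     0.007783  ak8PFJetsPuppiNoCut
TimeReport   0.007811     0.007811     0.007811  ak8PFJetsPuppiNoCutSoftDrop
"""
JETTOOLBOX_AFTER = """
TimeReport   0.000814     0.000814     0.000814  ak8GenJetsNoNu
TimeReport   0.008197     0.008197     0.008197  ak8PFJetsPuppiNoCut
"""
MINIAOD_BEFORE = """
TimeReport   0.007883     0.007883     0.007883  ak8PFJetsPuppi
TimeReport   0.001518     0.001518     0.001518  ak8PFJetsPuppiSoftDrop
TimeReport   0.000150     0.000150     0.000150  ak8PFJetsCHSPruned
TimeReport   0.006020     0.006020     0.006020  ak8PFJetsCHSSoftDrop
"""
MINIAOD_AFTER = """
TimeReport   0.008028     0.008028     0.008028  ak8PFJetsPuppi
TimeReport   0.000224     0.000224     0.000224  ak8PFJetsCHSCore
"""


def module_times(report: str) -> dict:
    """Map module label -> first timing column for each TimeReport line."""
    times = {}
    for line in report.splitlines():
        fields = line.split()
        # Expect: "TimeReport", three numbers, module label.
        if len(fields) == 5 and fields[0] == "TimeReport":
            times[fields[4]] = float(fields[1])
    return times


def summarize(before: str, after: str) -> tuple:
    """Return (total before, total after, fraction of listed time saved)."""
    t_before = sum(module_times(before).values())
    t_after = sum(module_times(after).values())
    return t_before, t_after, (t_before - t_after) / t_before


for label, before, after in [
    ("JetToolbox", JETTOOLBOX_BEFORE, JETTOOLBOX_AFTER),
    ("miniAOD", MINIAOD_BEFORE, MINIAOD_AFTER),
]:
    t_before, t_after, saved = summarize(before, after)
    print(f"{label}: {t_before:.6f} -> {t_after:.6f} ({saved:.1%} saved)")
```

For the modules listed, the summed time drops by a bit under half in both tests. That figure covers only these jet modules; the roughly 2% quoted above refers to the whole miniAOD workflow.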

Instructions to reproduce the tests can be found at https://gist.github.com/kpedro88/7198d46c018362e2aae5ffbcfb8d6c11.

It is also possible that this approach could reduce HLT latency, since both regular clustering and trimming are performed for AK8 jets. (It depends on how many paths include AK8 jets but not trimming, which needs further investigation.)

I would like to know if anyone from @cms-sw/jetmet-pog-l2 is interested in pursuing this for official use.

cmsbuild commented 2 years ago

A new Issue was created by @kpedro88 Kevin Pedro.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

kpedro88 commented 2 years ago

type jetmet

kpedro88 commented 2 years ago

assign reconstruction

makortel commented 2 years ago

assign reconstruction

cmsbuild commented 2 years ago

New categories assigned: reconstruction

@jpata,@slava77,@clacaputo you have been requested to review this Pull request/Issue and eventually sign? Thanks

kirschen commented 2 years ago

Hi @kpedro88, this sounds interesting. However, the sporadic crashes would probably have to be overcome, otherwise it might do more harm than good? @laurenhay, what do you think?

kpedro88 commented 2 years ago

@kirschen to clarify, the sporadic crashes only happened in my first attempt (persisting FastJet objects). The second attempt, while it requires more rearranging (and the code probably still needs some further work), performs flawlessly.

jpata commented 2 years ago

type performance-improvements