Open kpedro88 opened 2 years ago
A new Issue was created by @kpedro88 Kevin Pedro.
@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
type jetmet
assign reconstruction
New categories assigned: reconstruction
@jpata,@slava77,@clacaputo you have been requested to review this Pull request/Issue and eventually sign? Thanks
Hi @kpedro88, this sounds interesting. However, the sporadic crashes would probably have to be overcome, otherwise it might do more harm than good? @laurenhay, what do you think?
@kirschen to clarify, the sporadic crashes only happened in my first attempt (persisting FastJet objects). The second attempt, while it requires more rearranging (and the code probably still needs some further work), performs flawlessly.
type performance-improvements
I realized recently that running a jet grooming algorithm (e.g. SoftDrop) via FastjetJetProducer requires rerunning the jet area calculation and base clustering of constituents. This amounts to a significant consumption of CPU in analysis workflows (such as those using JetToolbox), which could be avoided with a different design.

My first attempt at such a design change involved putting FastJet objects directly into the event to be reused later: https://github.com/cms-sw/cmssw/compare/CMSSW_10_6_X...kpedro88:ReusePseudoJets106X

Pros: less "invasive" (existing producers in official sequences/workflows mostly stay the same, with just a few extra parameters)
Cons: sporadically crashes, likely because some component of the FastJet objects does not have a long enough lifetime and can get deleted/overwritten (I was not immediately able to determine a precise source of the problem)
My second attempt involves reworking FastjetJetProducer to run the base clustering and any transforms (grooming algorithms) all at the same time. This is working well in my ongoing tests: the output agrees with the baseline and the expected speedup is realized. The code and logic need some further improvement to ensure all cases are handled correctly and sensibly. Right now, I am using a Python function to adapt existing sequences to the new approach, with EDAlias employed to avoid having to change downstream modules. For central implementation, the underlying configs and workflow setups should probably be adjusted directly. The current state of the code can be found here: https://github.com/cms-sw/cmssw/compare/CMSSW_10_6_X...kpedro88:OneShotTransforms106X

A basic test using JetToolbox to recluster AK8 jets with no pT cut and run SoftDrop shows how a large speedup can be achieved:
Before:
After:
I also put together a miniAOD test. The total speedup is about 2% of the workflow runtime.
Before:
After:
Instructions to reproduce the tests can be found at https://gist.github.com/kpedro88/7198d46c018362e2aae5ffbcfb8d6c11.
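As a rough illustration of why the one-shot approach saves time, here is a toy Python sketch. It is not the CMSSW or FastJet code: every name in it (CountingClusterer, soft_drop_like, trimming_like, run_separately, run_one_shot) is hypothetical, and the "clustering" and "grooming" are trivial stand-ins. It only shows the structural difference between reclustering for every groomed collection and clustering once before applying all transforms to the same jets.

```python
# Toy illustration only: none of these names exist in CMSSW or FastJet.
from typing import Callable, Dict, List

Jet = List[float]                  # a "jet" is just a list of constituent pTs
Transform = Callable[[Jet], Jet]   # a "groomer" maps a jet to a groomed jet


class CountingClusterer:
    """Stand-in for the expensive step (base clustering plus jet areas)."""

    def __init__(self) -> None:
        self.calls = 0  # how many times the expensive step has run

    def __call__(self, constituents: List[float]) -> List[Jet]:
        self.calls += 1
        # Placeholder "clustering": pair up constituents in descending order.
        ordered = sorted(constituents, reverse=True)
        return [ordered[i:i + 2] for i in range(0, len(ordered), 2)]


def soft_drop_like(jet: Jet) -> Jet:
    """Placeholder groomer: keep only the hardest constituent."""
    return [max(jet)]


def trimming_like(jet: Jet) -> Jet:
    """Placeholder groomer: drop constituents below 10% of the jet sum."""
    total = sum(jet)
    return [c for c in jet if c >= 0.1 * total]


def run_separately(constituents, transforms: Dict[str, Transform], cluster):
    """Pattern described in the issue: each groomed collection reclusters."""
    out = {"ungroomed": cluster(constituents)}
    for name, transform in transforms.items():
        out[name] = [transform(j) for j in cluster(constituents)]
    return out


def run_one_shot(constituents, transforms: Dict[str, Transform], cluster):
    """One-shot pattern: cluster once, apply every transform to those jets."""
    jets = cluster(constituents)
    out = {"ungroomed": jets}
    for name, transform in transforms.items():
        out[name] = [transform(j) for j in jets]
    return out


if __name__ == "__main__":
    parts = [50.0, 30.0, 5.0, 1.0]
    groomers = {"softdrop": soft_drop_like, "trimmed": trimming_like}
    old, new = CountingClusterer(), CountingClusterer()
    same = run_separately(parts, groomers, old) == run_one_shot(parts, groomers, new)
    print("outputs agree:", same)
    print("clustering calls:", old.calls, "vs", new.calls)
```

In this toy, the separate pattern runs the expensive step once per output collection, while the one-shot pattern runs it once in total and produces the same collections. The real saving depends on how costly the clustering and area calculation are relative to the grooming itself.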
It is also possible that this approach could reduce HLT latency, since both regular clustering and trimming are performed for AK8 jets. (It depends on how many paths include AK8 jets but not trimming, which needs further investigation.)
I would like to know if anyone from @cms-sw/jetmet-pog-l2 is interested in pursuing this for official use.