cms-nanoAOD / nanoAOD-tools

Tools for working with NanoAOD (requiring only python + root, not CMSSW)
Optimisation of jetmetUncertainties #216

Open eshwen opened 4 years ago

eshwen commented 4 years ago

When running over datasets (in particular those with large files) on CRAB with the central nanoAOD-tools modules, jobs often fail because they hit their wall-clock time limit. The main offender seems to be jetmetUncertainties.py, since it loops extensively over the jets and uncertainties in each event.

Would it be possible to optimise this (perhaps using vectorisation and tools from numpy/scipy) so the module can run faster and reduce the frequency at which the jobs fail? This is starting to become a bottleneck for our analysis because of the length of time it takes to fully run over a dataset successfully.
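To illustrate the kind of vectorisation meant here, below is a minimal sketch using numpy broadcasting. It is not the code from jetmetUncertainties.py: the function names `shift_jets_loop` and `shift_jets_vectorised` are hypothetical, the numbers are made up, and it assumes the relative uncertainties have already been evaluated into an array of shape (sources, jets), which may not match how the module obtains them.

```python
import numpy as np


def shift_jets_loop(pt, unc):
    """Hypothetical reference: nested Python loops over sources and jets."""
    n_src, n_jet = unc.shape
    up = [[0.0] * n_jet for _ in range(n_src)]
    down = [[0.0] * n_jet for _ in range(n_src)]
    for s in range(n_src):
        for j in range(n_jet):
            up[s][j] = pt[j] * (1.0 + unc[s][j])
            down[s][j] = pt[j] * (1.0 - unc[s][j])
    return np.array(up), np.array(down)


def shift_jets_vectorised(pt, unc):
    """Same arithmetic with broadcasting: pt is (n_jet,), unc is (n_src, n_jet)."""
    pt = np.asarray(pt, dtype=float)
    unc = np.asarray(unc, dtype=float)
    # pt is broadcast along the source axis, so no explicit loop is needed
    return pt * (1.0 + unc), pt * (1.0 - unc)


# Made-up values for illustration only (assumption: relative uncertainties)
pt = np.array([50.0, 30.0, 20.0])
unc = np.array([[0.02, 0.03, 0.05],
                [0.01, 0.01, 0.02]])

up_loop, down_loop = shift_jets_loop(pt, unc)
up_vec, down_vec = shift_jets_vectorised(pt, unc)

# Both versions should agree on every (source, jet) entry
assert np.allclose(up_loop, up_vec)
assert np.allclose(down_loop, down_vec)
```

Whether this helps in practice depends on where the time actually goes; if most of it is spent evaluating the uncertainties per jet rather than applying them, vectorising the arithmetic alone would not remove the bottleneck.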

pieterdavid commented 4 years ago

For bamboo (an RDataFrame-based analysis framework) I wanted the option of calculating the jet variations on demand and skipping the postprocessing step, so I made a C++ implementation of jetmetUncertainties (calling Python code from RDataFrame is a recent feature, and quite slow). The fat jet variations are not there yet, but the AK4PFchs jets and the Type-1 MET correction match NanoAOD-tools within numerical precision for the tested configurations. The implementation code is in this file, and the build config here. Some files are copied from CMSSW; otherwise the only dependencies are a recent ROOT and Boost, so it should be relatively straightforward to include, at least locally for comparing the speed.
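A check of the "within numerical precision" kind could look roughly like the sketch below. It is not taken from bamboo or NanoAOD-tools: `max_relative_difference` is a hypothetical helper, the values are made up, and the tolerance of 1e-6 is an assumption chosen for single-precision branches.

```python
import numpy as np


def max_relative_difference(reference, candidate):
    """Largest element-wise relative difference between two arrays."""
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    if reference.shape != candidate.shape:
        raise ValueError("arrays must have the same shape")
    if reference.size == 0:
        return 0.0
    # Guard against division by zero for entries that are exactly 0
    scale = np.maximum(np.abs(reference), np.finfo(float).tiny)
    return float(np.max(np.abs(candidate - reference) / scale))


# Made-up varied jet pT values standing in for the two implementations
reference_pt = np.array([51.0, 30.9, 21.0])
candidate_pt = reference_pt * (1.0 + 1e-7)  # tiny rounding-level offset

diff = max_relative_difference(reference_pt, candidate_pt)
assert diff < 1e-6  # assumed tolerance, not a value from the thread
```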