HEP-KBFI / TallinnNtupleProducer

code, python scripts and config files for producing "plain" Tallinn Ntuples

Load JEC, JER and JMAR corrections from JSON #20

Closed by ktht 2 years ago

ktht commented 2 years ago

This PR is the last step needed to phase out nanoAOD-tools. Instead of adding numerous branches for storing corrected jets and MET with nanoAOD-tools, we would read vanilla NanoAOD directly and apply the necessary corrections on-the-fly. This is faster and more flexible than the equivalent task done by nanoAOD-tools, and it also eliminates the need to store intermediate NanoAOD files, thereby reducing the human time spent on managing those files.

The only downside is that the runtime approximately doubles, which brings the processing rate down to about 100 events / second. (Although I have to point out that my benchmarks prior to this PR neither saved the event weight branches nor split the JER uncertainties, so the real increase in runtime is less than double.) This is not a huge deal because the total runtime of processing multiple years' worth of datasets still remains reasonable. If we have, say, 20B events to process, it would take about 2300 CPU-days to get the job done (per channel). The runtime could be reduced by considering fewer systematic uncertainties when producing Ntuples for purposes other than datacard production (such as auxiliary measurements or training DNNs/BDTs). The memory consumption stays stable at around 1 GB (well below the designated limit of 2 GB). Below is a plot that shows the memory consumption for processing 400k NanoAODv7 HH events. The last "jump" is caused by writing the output file (it's present even when processing 1k events).

[Plot: PrMon_wtime_vs_vmem_pss_rss_swap (memory consumption vs. wall time: VMEM, PSS, RSS, swap)]
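
For reference, the CPU-time estimate quoted above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: the helper name `cpu_days` is made up for illustration and is not part of this repository, and it assumes a constant processing rate with no job overhead.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def cpu_days(n_events: float, events_per_second: float) -> float:
    """Return the CPU-days needed to process n_events at a constant rate.

    Hypothetical helper for illustration only; it ignores job startup,
    I/O stalls and any other overhead.
    """
    if events_per_second <= 0:
        raise ValueError("events_per_second must be positive")
    return n_events / events_per_second / SECONDS_PER_DAY


# Numbers quoted in the PR description: 20B events at ~100 events / second.
estimate = cpu_days(20e9, 100)
print(f"{estimate:.0f} CPU-days")  # prints "2315 CPU-days"
```

The same helper can be used to gauge how much time would be saved if a reduced set of systematic uncertainties raised the processing rate, by changing the second argument.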

Since this PR is massive, I'll try to summarize the major points:

Even though the implementation is very similar to that of nanoAOD-tools, there are still some deliberate differences:

Here are some open points:

I think these issues can be resolved on a different time scale, and do not necessarily need to be addressed in this PR.

ktht commented 2 years ago

Merging because my other development branch is already conflicting with this one. I don't think we will ever need to revert these changes anyway.