cms-tau-pog / TauFW

Analysis framework for tau analysis at CMS using NanoAOD

Implement RDataFrame in Sample and SampleSet for plotting #56

Closed: IzaakWN closed this pull request 5 months ago.

IzaakWN commented 7 months ago

Implement ROOT's RDataFrame to address https://github.com/cms-tau-pog/TauFW/issues/51

Motivation

Implementation

New container classes for output

New routines

Changes to existing code

Tools

Validation

Plans

  1. Will leave this PR as an open draft until it's completely validated.
  2. Implement the possibility of 2D histograms via Sample.getrdframe and RDataFrame.Histo2D.
  3. Will completely remove the old (Merged)Sample.gethist and SampleSet.gethists routines that relied on MultiDraw and Python multithreading, and replace them with class methods of the same name that use RDataFrame.
  4. Further clean the TauFW plotting code.
  5. Discuss in the TauPOG here: https://indico.cern.ch/event/1358491/#3-plans-status-of-taufw
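For readers less familiar with RDataFrame, here is a plain-Python sketch (not TauFW or ROOT code) of the idea that lets it replace MultiDraw and Python multithreading. All histograms, including 2D ones, are booked first and then filled together in a single pass over the events, similar to lazily booked `Histo1D`/`Histo2D` actions in RDataFrame. The `EventLoop` class, its methods, and the branch names are hypothetical. For brevity the histograms are unweighted and under/overflow is dropped.

```python
from bisect import bisect_right


def find_bin(edges, x):
    """Return the 0-based bin index of x for ascending edges (bins are [low, high)), or None if out of range."""
    if x < edges[0] or x >= edges[-1]:
        return None
    return bisect_right(edges, x) - 1


class EventLoop:
    """Hypothetical stand-in for an RDataFrame: histograms are booked first
    and only filled, all together, when run() loops over the events once."""

    def __init__(self, events):
        self.events = events  # list of dicts: {branch name: value}
        self.fillers = []     # one fill function per booked histogram
        self.npasses = 0      # how many times the events were looped over

    def histo1d(self, xvar, xedges):
        hist = [0.0] * (len(xedges) - 1)

        def fill(evt):
            ix = find_bin(xedges, evt[xvar])
            if ix is not None:
                hist[ix] += 1.0

        self.fillers.append(fill)
        return hist  # still empty until run() is called

    def histo2d(self, xvar, yvar, xedges, yedges):
        # Stored as hist[iy][ix], i.e. one row per y bin
        hist = [[0.0] * (len(xedges) - 1) for _ in range(len(yedges) - 1)]

        def fill(evt):
            ix = find_bin(xedges, evt[xvar])
            iy = find_bin(yedges, evt[yvar])
            if ix is not None and iy is not None:
                hist[iy][ix] += 1.0

        self.fillers.append(fill)
        return hist

    def run(self):
        # A single loop over the events fills every booked histogram
        self.npasses += 1
        for evt in self.events:
            for fill in self.fillers:
                fill(evt)


# Usage: book a 1D and a 2D histogram, then fill both in one pass
events = [{'pt': 25.0, 'eta': 0.5}, {'pt': 45.0, 'eta': -1.2}, {'pt': 35.0, 'eta': 1.0}]
loop = EventLoop(events)
h_pt = loop.histo1d('pt', [20, 30, 40, 50])
h_pt_eta = loop.histo2d('pt', 'eta', [20, 40, 60], [-2.5, 0.0, 2.5])
loop.run()
```

In RDataFrame itself, this lazy booking is what allows one (optionally multithreaded) event loop per sample instead of one draw call per variable.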
IzaakWN commented 6 months ago

Updates

Plans

I think this PR is mostly done. What's left is testing with a "real" example: comparing the plots (made by plot*.py) and the datacard inputs (ROOT files made by createcards*.py) between this branch and the current master. If the results are consistent and there are no bugs, we can merge this PR and create a new release version of the TauFW.
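The branch-vs-master comparison could be automated along the lines of the following sketch. It assumes that the bin contents have already been extracted from both sets of ROOT files (e.g. with PyROOT's `TH1::GetBinContent`) into plain dicts keyed by histogram name. The function name, the tolerance default and the histogram names are illustrative assumptions. A small relative tolerance is used because a multithreaded event loop may sum weights in a different order, which can change the last few digits.

```python
import math


def compare_hists(new, old, rtol=1e-6, atol=0.0):
    """Compare histograms given as {name: [bin contents]}.

    Returns a list of (name, bin index, new value, old value) for every mismatch.
    A bin index of None means the histogram is missing on one side or has a
    different number of bins; the full lists (or None) are reported instead.
    """
    mismatches = []
    for name in sorted(set(new) | set(old)):
        a, b = new.get(name), old.get(name)
        if a is None or b is None or len(a) != len(b):
            mismatches.append((name, None, a, b))
            continue
        for i, (x, y) in enumerate(zip(a, b)):
            if not math.isclose(x, y, rel_tol=rtol, abs_tol=atol):
                mismatches.append((name, i, x, y))
    return mismatches


# Usage with made-up bin contents for two hypothetical histograms
branch = {'pt_1': [10.0, 20.0, 5.0], 'm_vis': [3.0, 4.0]}
master = {'pt_1': [10.0, 20.0, 5.0], 'm_vis': [3.0, 4.5]}
mismatches = compare_hists(branch, master)
```

An empty result would indicate that both branches produce the same histograms within the chosen tolerance.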

Note that the new (Merged)Sample.gethist(2D), SampleSet.gethists, and SampleSet.getstack methods using RDataFrame should be implemented such that the user does not notice a difference in the output (even though multiple selections are now allowed). This means that user scripts do not need to be updated, unless they want to parallelize over multiple selections.
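One way to keep the interface backward compatible while allowing multiple selections is sketched below in plain Python. This is a hypothetical illustration, not the actual TauFW signature. Here selections are Python callables rather than TauFW's selection strings, and events are dicts with an optional `weight` key. Passing a single selection returns a single histogram (the old behaviour). Passing a list returns one histogram per selection, all filled in the same pass over the events.

```python
from bisect import bisect_right


def find_bin(edges, x):
    """Return the 0-based bin index of x for ascending edges (bins are [low, high)), or None if out of range."""
    if x < edges[0] or x >= edges[-1]:
        return None
    return bisect_right(edges, x) - 1


def gethist(events, var, edges, selections):
    """Hypothetical sketch: one selection -> one histogram; a list of selections -> a list of histograms."""
    single = callable(selections)
    sellist = [selections] if single else list(selections)
    hists = [[0.0] * (len(edges) - 1) for _ in sellist]
    for evt in events:  # a single loop serves every selection
        ix = find_bin(edges, evt[var])
        if ix is None:
            continue  # under/overflow is dropped in this sketch
        for hist, passes in zip(hists, sellist):
            if passes(evt):
                hist[ix] += evt.get('weight', 1.0)
    return hists[0] if single else hists


def isolated(evt):
    return evt['iso'] < 0.15


def anti_isolated(evt):
    return evt['iso'] >= 0.15


# Usage: the old single-selection call and the new multi-selection call
events = [
    {'pt_1': 25.0, 'iso': 0.10},
    {'pt_1': 35.0, 'iso': 0.30},
    {'pt_1': 45.0, 'iso': 0.05, 'weight': 2.0},
]
edges = [20, 30, 40, 50]
h_iso = gethist(events, 'pt_1', edges, isolated)
h_iso2, h_anti = gethist(events, 'pt_1', edges, [isolated, anti_isolated])
```

Returning the bare histogram when a single selection is given is what keeps existing user scripts working unchanged.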

IzaakWN commented 5 months ago

Changed the target branch to hackathon, which will be the development branch during the CAT Hackathon (2/2024): https://gitlab.cern.ch/groups/cms-tau-pog/-/epics/1