cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

Towards a transient-persistent layer to read NanoAOD in RDataFrame #45972

Open lenzip opened 2 months ago

lenzip commented 2 months ago

It would be desirable to be able to write analyses in RDataFrame that treat the NanoAOD as a collection of objects, rather than dealing with the individual branches, mainly to reduce boilerplate.

This is already implemented in some tools that use RDataFrame as a backend. For example, Bamboo implements it in a Python layer that translates object-level expressions, under the hood, into RDataFrame actions on individual columns.

Ideally one would like to be able to build, e.g., an RVec of Muon objects from the individual Muon_attributes RVecs, with something like:

df = df.Define("Muons", "some_function_to_build_muons")

Requirements:

  1. It should allow using the . operator, i.e. one should be able to write Muons[0].pt or Muons[0].pt(), rather than Muon_pt[0].
  2. The reading of branches should be lazy, i.e. only happen if required. In other words one wants to avoid a performance penalty in reading all attributes of muons from the file, if one only uses a few in the analysis.
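
The two requirements can be illustrated with a minimal sketch in plain Python, independent of ROOT. All names here (LazyColumns, LazyCollection, ObjectProxy) are hypothetical and only stand in for whatever branch-reading machinery an actual implementation would use; the point is that attribute access on an object triggers the read of exactly one underlying column, and only on first use.

```python
class LazyColumns:
    """Hypothetical stand-in for a per-event branch reader; records each column read."""

    def __init__(self, data):
        self._data = data
        self.reads = []  # names of columns actually loaded, in order

    def load(self, name):
        self.reads.append(name)
        return self._data[name]


class ObjectProxy:
    """One element of a collection; attributes are resolved on access."""

    def __init__(self, collection, index):
        self._collection = collection
        self._index = index

    def __getattr__(self, attr):
        # Called only for names not found normally, i.e. the physics attributes.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return self._collection.column(attr)[self._index]


class LazyCollection:
    """Maps obj.attr onto the flat column '<prefix>_<attr>', loading it on first use."""

    def __init__(self, prefix, columns):
        self._prefix = prefix
        self._columns = columns
        self._cache = {}

    def column(self, attr):
        if attr not in self._cache:
            self._cache[attr] = self._columns.load(f"{self._prefix}_{attr}")
        return self._cache[attr]

    def __getitem__(self, index):
        return ObjectProxy(self, index)


# Toy event with two muons (made-up values, for illustration only).
event = LazyColumns({
    "Muon_pt": [45.2, 12.1],
    "Muon_eta": [0.3, -1.7],
    "Muon_phi": [1.1, -2.4],
    "Muon_mass": [0.1057, 0.1057],
})
muons = LazyCollection("Muon", event)
leading_pt = muons[0].pt        # requirement 1: dot access
columns_read = list(event.reads)  # requirement 2: only Muon_pt has been loaded
```

A real solution would need the same behaviour inside the RDataFrame event loop in C++, which is the part that still has to come from ROOT.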

ROOT does offer a ROOT::VecOps::Construct function that allows building custom objects, e.g. the following code builds a RVec of muon 4-momenta:

import ROOT

df = ROOT.RDataFrame("Events", "root://eospublic.cern.ch//eos/root-eos/benchmark/Run2012B_SingleMu.root")
ROOT.RDF.Experimental.AddProgressBar(df)

df = df.Define("Muon_p4", "ROOT::VecOps::Construct<ROOT::Math::PtEtaPhiMVector>(Muon_pt, Muon_eta, Muon_phi, Muon_mass)")

histo = df.Define("Muon_mt", "return Map(Muon_p4, [](auto v) {return v.Mt();})").Histo1D(("new", "new", 100, 0., 100.), "Muon_mt")

histo.Draw()

This achieves requirement 1 above, but not requirement 2: it is not lazy, because all four branches are read in order to pass arguments to the ROOT::Math::PtEtaPhiMVector constructor, even if the analysis only uses some of them.
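
For this particular example, the cost of the eager construction can be made concrete. The transverse mass, taken here as sqrt(E^2 - pz^2), reduces to sqrt(pt^2 + m^2), so only Muon_pt and Muon_mass are actually needed, while Muon_eta and Muon_phi are read solely to build the vector. The plain-Python check below is a sketch of that arithmetic, with hypothetical function names and made-up input values; it does not use ROOT.

```python
import math


def mt_from_four_vector(pt, eta, phi, mass):
    """Transverse mass from the full (pt, eta, phi, mass) set: sqrt(E^2 - pz^2).

    phi is accepted to mirror the four-argument constructor, but it does not
    enter the result.
    """
    pz = pt * math.sinh(eta)
    energy = math.sqrt((pt * math.cosh(eta)) ** 2 + mass ** 2)
    return math.sqrt(energy ** 2 - pz ** 2)


def mt_from_pt_and_mass(pt, mass):
    """Same quantity using only the two inputs it really depends on."""
    return math.hypot(pt, mass)


# Made-up muon kinematics, for illustration only.
full = mt_from_four_vector(30.0, 1.2, 0.5, 0.1057)
reduced = mt_from_pt_and_mass(30.0, 0.1057)
agree = math.isclose(full, reduced, rel_tol=1e-9)
```

A Define written by hand on just those two columns would therefore avoid touching Muon_eta and Muon_phi, but at the price of giving up the object-style syntax that motivates this issue.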

Status:

The ROOT developers have already discussed this topic with us, including in a ROOT PPP meeting. Afterwards they pointed us to the ROOT::VecOps::Construct function used in the example above. We only recently raised requirement 2 (lazy branch reading in this objectification layer), and they mentioned that they are thinking about a possible solution.

cmsbuild commented 2 months ago

cms-bot internal usage

cmsbuild commented 2 months ago

A new Issue was created by @lenzip.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

lenzip commented 2 months ago

I am posting this here as requested yesterday at the core software meeting, although it does not concern CMSSW specifically, just ROOT.

makortel commented 2 months ago

type root

makortel commented 2 months ago

assign analysis

I feel this is the closest match, even if this issue doesn't concern CMSSW directly.

makortel commented 2 months ago

Thanks @lenzip!

cmsbuild commented 2 months ago

New categories assigned: analysis

@tvami you have been requested to review this Pull request/Issue and eventually sign? Thanks