cmstas / HggAnalysisDev

Implement four-vector tools #9

Open sam-may opened 3 years ago

sam-may commented 3 years ago

It would be useful to have the option to compute four-vector-related quantities.

For example, we may want to build an H->TauTau candidate out of two hadronic taus/leptons and compute its pT and eta, compute its deltaR with respect to photons/diphoton, etc.

Given that we already have many useful quantities saved in the skims (gg_pt, gg_eta, SVFit quantities), this is not super urgent, but will be necessary if we want to do things without remaking skims.

I'd suggest we make a PhysicsTools directory inside Preselection and build either a class or a set of functions in e.g. four_vector_utils.py. The main functionality we'd want is the ability to add two four-vectors together and return the resulting four-vector's properties (pT, eta, phi, mass). Once we have this, we can do things like compute dR(H->TauTau, H->gg) with the existing tools for calculating delta R (these need to be cleaned up as well).
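As a rough illustration of the functionality described above, here is a minimal, dependency-free sketch of what four_vector_utils.py could contain. The names P4, to_cartesian, add_p4, delta_phi and delta_r are hypothetical and not taken from the repository; the sketch handles one four-vector at a time, whereas a real implementation would probably operate on arrays.

```python
import math
from typing import NamedTuple


class P4(NamedTuple):
    """Hypothetical container for a four-vector in (pT, eta, phi, mass) form."""
    pt: float
    eta: float
    phi: float
    mass: float


def to_cartesian(p):
    """Convert (pT, eta, phi, mass) to (px, py, pz, E)."""
    px = p.pt * math.cos(p.phi)
    py = p.pt * math.sin(p.phi)
    pz = p.pt * math.sinh(p.eta)
    energy = math.sqrt(px**2 + py**2 + pz**2 + p.mass**2)
    return px, py, pz, energy


def add_p4(a, b):
    """Add two four-vectors component-wise and return the sum as a P4."""
    px, py, pz, energy = (x + y for x, y in zip(to_cartesian(a), to_cartesian(b)))
    pt = math.hypot(px, py)
    # eta diverges as pT -> 0; return +/-inf in that limit (sketch-level choice)
    eta = math.asinh(pz / pt) if pt > 0 else math.copysign(math.inf, pz)
    phi = math.atan2(py, px)
    # clip tiny negative values caused by floating-point round-off
    mass = math.sqrt(max(energy**2 - px**2 - py**2 - pz**2, 0.0))
    return P4(pt, eta, phi, mass)


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(a, b):
    """Angular separation sqrt(d_eta^2 + d_phi^2) between two four-vectors."""
    return math.hypot(a.eta - b.eta, delta_phi(a.phi, b.phi))


# Usage: two massless, back-to-back objects with pT = 25 each
tau1 = P4(25.0, 0.0, 0.0, 0.0)
tau2 = P4(25.0, 0.0, math.pi, 0.0)
candidate = add_p4(tau1, tau2)
print(candidate.mass)        # close to 50 for this configuration
print(delta_r(tau1, tau2))   # close to pi
```

The sum is done in Cartesian components because (pT, eta, phi, mass) does not add linearly; converting back at the end gives the candidate's properties in the same form as the inputs.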

fgolf commented 3 years ago

Hi Sam, I have a LorentzVector class that I wrote in Python with students as part of the computational physics class I'm teaching. It probably needs some work, but it may be something to start from. I can point you to it if you think it would be useful.

-- Best regards, Frank

sam-may commented 3 years ago

Hi Frank, this would definitely be useful, please do point us to it.

aminnj commented 3 years ago

You can also use uproot3_methods:

import numpy as np

# prints warning because of awkward1 vs awkward0, but harmless
import uproot3_methods

# also TLorentzVector (for single p4)
# also accepts jagged arrays (of awkward0 type, so might need to do awkward1.to_awkward0(arr))
# also from_cartesian, ...
p4s = uproot3_methods.TLorentzVectorArray.from_ptetaphim(
    np.array([25., 25., 25.]),
    np.array([0., 0., 0.]),
    np.array([0., 0., 0.]),
    np.array([0., 0., 0.]),
    )

# also boosts and many other things
print((p4s + p4s).pt)
print((p4s + p4s).mass)
print(p4s.delta_phi(p4s))
print(p4s.delta_r(p4s))

If I understand correctly, the long-term replacement for uproot3_methods is vector, but its README says it's still under heavy development.

A useful code snippet is https://github.com/aminnj/pdroot/blob/d8b6908e3bffe2c451333efcc119a05929474dc9/pdroot/accessors.py#L61-L73:

import pandas as pd
import uproot3_methods

@pd.api.extensions.register_dataframe_accessor("p4")
class LorentzVectorAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def __call__(self, which):
        components = [f"{which}_{x}" for x in ["pt", "eta", "phi", "mass"]]
        missing_columns = set(components) - set(self._obj.columns)
        if len(missing_columns):
            raise AttributeError("Missing columns: {}".format(missing_columns))
        arrays = (self._obj[c] for c in components)
        return uproot3_methods.TLorentzVectorArray.from_ptetaphim(*arrays)

If you do things with pandas, this will add an accessor (like df.mystring.str.split()) for LorentzVectors. That is, if you have columns mu_pt, mu_eta, mu_phi, mu_mass in a pandas DataFrame, you can do df.p4("mu") to get the uproot3_methods.TLorentzVectorArray, and thus do things like (df.p4("mu1")+df.p4("mu2")).mass. You can of course replace the body of __call__ with whatever LV class you want.

sam-may commented 3 years ago

This is great, thanks Nick!