columnflow / columnflow

Backend for columnar, fully orchestrated HEP analyses with pure Python, law and order.
https://columnflow.readthedocs.io
BSD 3-Clause "New" or "Revised" License
25 stars 24 forks source link

Merging not happening between low pT and high pT muons regimes #447

Closed Payal7890 closed 4 months ago

Payal7890 commented 5 months ago

I am seeking a bit of help. Currently I am working on the repository: https://gitlab.cern.ch/dsavoiu/mttbar_trigger_study. So I am attaching my script here. In the analysis muon is coming from the decay t-> b+W-> muon + neutrino + bjets. We are expecting to see a peak at around 41 GeV in the muon pT distribution. Whatever we are seeing is weird. Could this be a problem of the normalisation of MC? It would be of great help if I can get any lead. This is apparent that merging is not happening properly.

# coding: utf-8

"""
Exemplary selection methods.
"""
from collections import defaultdict

from columnflow.selection import Selector, SelectionResult, selector
from columnflow.selection.stats import increment_stats
from columnflow.selection.util import sorted_indices_from_mask
from columnflow.production.processes import process_ids
from columnflow.production.cms.mc_weight import mc_weight
from columnflow.util import maybe_import
from columnflow.columnar_util import optional_column as optional

from tgs.production.categories import category_ids
from tgs.production.features import cutflow_features

np = maybe_import("numpy")
ak = maybe_import("awkward")

#
# unexposed selectors
# (not selectable from the command line but used by other, exposed selectors)
#

@selector(
    uses={
       "Muon.pt", "Muon.eta",
       "Muon.highPtId", "Muon.pfIsoId","Muon.tightId", "HLT.IsoMu27",
      #  "Electron.pt", "Electron.eta", "Electron.deltaEtaSC",
        "HLT.Mu50", optional("HLT.TkMu100"), optional("HLT.OldMu100"),  # Added TkMu100 and OldMu100 triggers
       # "HLT.Ele35_WPTight_Gsf",
       # "Electron.mvaFall17V2Iso_WP80", "Electron.mvaFall17V2noIso_WP80",
    },
)
def muon_selection(
    self: Selector,
    events: ak.Array,
    **kwargs,
) -> tuple[ak.Array, SelectionResult]:

    # filter valid muons by kinematics and ID (per-muon mask)
    muon_mask_high_pt = (events.Muon.pt > 55.0) & (abs(events.Muon.eta) < 2.4) & (events.Muon.highPtId == 2) &  (
    events.HLT.Mu50 |  # Use '|' for OR combination
    getattr(events.HLT, "TkMu100", True) |  # Added TkMu100 trigger
    getattr(events.HLT, "OldMu100", True)  # Added OldMu100 trigger
)   #CutBasedIDGlobalHighpT

    # filter valid muons by kinematics and ID (per-muon mask)
    muon_mask_low_pt = (events.Muon.pt > 30.0) & (abs(events.Muon.eta) < 2.4) & (events.Muon.pt <= 55.0) & (events.Muon.pfIsoId == 4) & (events.Muon.tightId) & (events.HLT.IsoMu27)

    muon_mask = (
        muon_mask_high_pt | muon_mask_low_pt
    )

    muon_indices = sorted_indices_from_mask(muon_mask, events.Muon.pt)

    # require the electron trigger to calculate efficiencies
    # NOTE: not needed here for baseline selection -> use categories
    # electron_trigger_sel = events.HLT.Ele35_WPTight_Gsf

    # build and return selection results
    # "objects" maps source columns to new columns and selections to be applied on the old columns
    # to create them, e.g. {"Muon": {"MySelectedMuon": indices_applied_to_Muon}}

    return events, SelectionResult(
        steps={
            "muon": (ak.num(muon_indices) == 1),
        },
        objects={
            "Muon": {
                "Muon": muon_mask,
            },
        },
    )

@selector(
    uses={
     #    "Muon.pt", "Muon.eta",
     #   "Muon.highPtId", "HLT.IsoMu27"
        "Electron.pt", "Electron.eta", "Electron.deltaEtaSC",
     #   "HLT.Mu50", optional("HLT.TkMu100"), optional("HLT.OldMu100"),  # Added TkMu100 and OldMu100 triggers
        "HLT.Ele35_WPTight_Gsf",
        "Electron.mvaFall17V2Iso_WP80", "Electron.mvaFall17V2noIso_WP80",
    },
)
def electron_selection(
    self: Selector,
    events: ak.Array,
    **kwargs,
) -> tuple[ak.Array, SelectionResult]:

    # filter valid electons by kinematics
    electron_mask = (
        (events.Electron.pt > 120) &
        (abs(events.Electron.eta + events.Electron.deltaEtaSC) < 2.5) &
        (events.Electron.mvaFall17V2Iso_WP80)
    )
    electron_indices = sorted_indices_from_mask(electron_mask, events.Electron.pt)

    # require the electron trigger to calculate efficiencies
    # NOTE: not needed here for baseline selection -> use categories
    # electron_trigger_sel = events.HLT.Ele35_WPTight_Gsf

    # build and return selection results
    # "objects" maps source columns to new columns and selections to be applied on the old columns
    # to create them, e.g. {"Muon": {"MySelectedMuon": indices_applied_to_Muon}}

    return events, SelectionResult(
        steps={

            "electron": (ak.num(electron_indices) == 1),
        },
        objects={

            "Electron": {
                "Electron": electron_mask,
            },
        },
    )
@selector(
    uses={"Jet.pt", "Jet.eta", "Jet.btagDeepFlavB",},
)
def jet_selection(
    self: Selector,
    events: ak.Array,
    **kwargs,
) -> tuple[ak.Array, SelectionResult]:

    # only consider jets within a certain kinematic range
    jet_mask = (events.Jet.pt >= 25.0) & (abs(events.Jet.eta) < 2.4)

     # B-tagging working point
    wp_med = self.config_inst.x.btag_working_points.deepjet.medium

    # B-tagged and light jets
    bjet_mask = (jet_mask) & (events.Jet.btagDeepFlavB >= wp_med)
    sel_bjet = ak.sum(bjet_mask, axis=-1) >= 1

    # Indices of b-tagged and non-b-tagged (light) jets
    bjet_indices = sorted_indices_from_mask(bjet_mask, events.Jet.pt)

    # get kinematics for first two jets (fill none if missing)
    jet_pt = ak.pad_none(events.Jet.pt, 2)
    jet_eta = ak.pad_none(events.Jet.eta, 2)

    # select events where two leading jets pass criteria
    jet_sel1 = (jet_pt[:, 0] >= 50.0) & (abs(jet_eta[:, 0]) < 2.5)
    jet_sel2 = (jet_pt[:, 1] >= 50.0) & (abs(jet_eta[:, 1]) < 2.5)

    # build and return selection results
    # "objects" maps source columns to new columns and selections to be applied on the old columns
    # to create them, e.g. {"Jet": {"MyCustomJetCollection": indices_applied_to_Jet}}
    return events, SelectionResult(
        steps={
            "jet1": ak.fill_none(jet_sel1, False),
            "jet2": ak.fill_none(jet_sel2, False),
            "bjet": sel_bjet,
        },
        objects={
            "Jet": {
                "Jet": sorted_indices_from_mask(jet_mask, events.Jet.pt, ascending=False),
                "BJet": bjet_indices,
            },
        },
        aux={
            "n_jets": ak.sum(jet_mask, axis=1),
        },
    )
#Missing Transverse Energy Cuts
@selector(
    uses={"MET.pt",},
)
def met_selection(
    self: Selector,
    events: ak.Array,
    **kwargs,
) -> tuple[ak.Array, SelectionResult]:   
    met_sel = events.MET.pt > 70

    return events, SelectionResult(
        steps={
            "MET": met_sel,
        },
        objects= {},
    )

# exposed selectors
# (those that can be invoked from the command line)
#

@selector(
    uses={
        # selectors / producers called within _this_ selector
        mc_weight, cutflow_features, process_ids, muon_selection, electron_selection, jet_selection,met_selection,
        category_ids,
        increment_stats,
    },
    produces={
        # selectors / producers whose newly created columns should be kept
        mc_weight, cutflow_features, process_ids,
        category_ids,
    },
    exposed=True,
)
def default(
    self: Selector,
    events: ak.Array,
    stats: defaultdict,
    **kwargs,
) -> tuple[ak.Array, SelectionResult]:
    # prepare the selection results that are updated at every step
    results = SelectionResult()

    # muon selection
    events, muon_results = self[muon_selection](events, **kwargs)
    results += muon_results

    # electron selection
    events, electron_results= self[electron_selection](events, **kwargs)
    results += electron_results

    # jet selection
    events, jet_results = self[jet_selection](events, **kwargs)
    results += jet_results

    # met selection
    events, met_results = self[met_selection](events, **kwargs)
    results += met_results

    # combined event selection after all steps
    results.event = (
        results.steps.muon &
        results.steps.electron &
        results.steps.jet1 &
        results.steps.jet2 &
        results.steps.bjet &
        results.steps.MET
    )

   # create process ids
    events = self[process_ids](events, **kwargs)

    # add the mc weight
    if self.dataset_inst.is_mc:
        events = self[mc_weight](events, **kwargs)

    # build categories
    events = self[category_ids](events, results=results, **kwargs)

    # add cutflow features, passing per-object masks
    events = self[cutflow_features](events, results.objects, **kwargs)

    # increment stats
    weight_map = {
        "num_events": Ellipsis,
        "num_events_selected": results.event,
    }
    group_map = {}
    if self.dataset_inst.is_mc:
        weight_map = {
            **weight_map,
            # mc weight for all events
            "sum_mc_weight": (events.mc_weight, Ellipsis),
            "sum_mc_weight_selected": (events.mc_weight, results.event),
        }
        group_map = {
            # per process
            "process": {
                "values": events.process_id,
                "mask_fn": (lambda v: events.process_id == v),
            },
            # per jet multiplicity
            "njet": {
                "values": results.x.n_jets,
                "mask_fn": (lambda v: results.x.n_jets == v),
            },
        }

    events, results = self[increment_stats](
        events,
        results,
        stats,
        weight_map=weight_map,
        group_map=group_map,
        **kwargs,
    )

    return events, results

edit by riga: removed image

riga commented 5 months ago

Hi @Payal7890 ,

(I put the code you pasted above into triple-backtick code block for better readability)

Comparing your code and the plot, nothing looks wrong to me per-se. If you expected a larger fraction of low pt muons w.r.t. high pt ones, I would suggest to add an interactive debugger / shell with from IPython import embed; embed(header="debugging muon cuts"), right below the line in which you define your muon_mask variable, and check with ak.mean(muon_mask_{low,high}_pt) of the acceptance rates match your expectation.


Side note, are you sure the trigger selection for the high pt muons is correct? Right now, if either of the TkMu100 or OldMu100 triggers is missing in NanoAOD, the trigger selection is succeeds, independent of the Mu50 trigger decision.


Second site note, could you please add the plot to (e.g.) CERNbox, attach the proper cms-access permissions to it, and then share the link? I'm not sure about the publication status of your work, but in general we would like to avoid having potentially unreviewed plots with data in openly accessible areas.

Payal7890 commented 4 months ago

Hello, thank you very much for your response. Our trigger selection in the high pT regime is just the trigger OR of Mu50, OldMu100 and TkMu100. For the run2_2017, some of the trigger paths (OldMu100 and TkMu100) are missing in some datasets. So, for those datasets (where trigger paths seem to be missing), our events should pass Mu50 trigger only. That's what I was trying to implement. How should I do that?

riga commented 4 months ago

The construct

events.HLT.Mu50 | getattr(events.HLT, "SOME_MISSING_TRIGGER", True)

will always evaluate to True when SOME_MISSING_TRIGGER is in fact missing, independent of the Mu50 trigger. I'm pretty sure that's not what you want, and a simple switch to False for the default value in getattr should do the trick.

riga commented 4 months ago

I closed the issue since it seems to be purely analysis related. Feel free to reopen in case you observe an issue in columnflow though :)

Payal7890 commented 4 months ago

Hello, thank you very much for your response. However, I am going through another issue. Right now I am running something using mttbar repository (https://github.com/uhh-cms/mttbar/blob/master/mtt/config/config_2017.py). Every time I submit a job, the framework is complaining about not finding the path to this: FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/condor/execute/dir_76622/job_cYcGKGzo0bMf/repo/data/json/l1_prefiring.json'

But the file exists here: /nfs/dust/cms/user/payalroy/mttbar/data/json/l1_prefiring.json

I added the path specifically to the script, even after that getting the same error message: /nfs/dust/cms/user/payalroy/mttbar/mtt/selection/lepton.py

What would be the next way out to solve this? Here is the link to a output log file: /nfs/dust/cms/user/payalroy/mttbar/data/mtt_store/analysis_mtt/cf.CalibrateEvents/run2_2017_nano_v9/tt_dl_powheg/nominal/calib__skip_jecunc/v4/stdall_0To1.txt

Not sure, why the path is messed up.