SUEPPhysics / SUEPCoffea_dask

SUEP analysis using coffea with fastjet. Uses Dask for batch submissions

Optimize selections (W pT/mT, MET, deltaPhi, balancing variable, ...) #301

Closed lucalavezzo closed 4 months ago

lucalavezzo commented 6 months ago

For now we set both to 50 GeV, but we should optimize them. The real question is how we want to evaluate that: a significance scan across all backgrounds?

jreic commented 6 months ago

Agreed! We should optimize mT as well. A significance scan (can coarsely scan, maybe in 10 GeV steps) would be a good way to go. One question might be the order of optimizing these vs. the other variables, e.g. if we apply tight lepton ID here, we'll be less reliant on QCD when evaluating the significance of a given mT/MET/pT selection. Same if we apply bjet vetoes here, to make us less reliant on ttbar when evaluating the significances.
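A minimal sketch of what such a coarse scan could look like, assuming per-event values and weights are already available as NumPy arrays for signal and for the summed backgrounds. The function name `coarse_scan`, its arguments, the 0-200 GeV range, and the use of plain S/sqrt(B) are all illustrative assumptions, not code from this repository.

```python
import numpy as np


def coarse_scan(sig_vals, sig_w, bkg_vals, bkg_w, lo=0.0, hi=200.0, step=10.0):
    """Scan a lower threshold on one variable (e.g. mT, MET, or W pT in GeV).

    Returns a list of (cut, s, b, z) tuples, where s and b are the weighted
    yields passing `value > cut` and z = s / sqrt(b) (0.0 when b is empty).
    """
    rows = []
    for cut in np.arange(lo, hi + step, step):
        s = float(sig_w[sig_vals > cut].sum())
        b = float(bkg_w[bkg_vals > cut].sum())
        # Placeholder figure of merit; swap in whichever one is agreed on.
        z = float(s / np.sqrt(b)) if b > 0 else 0.0
        rows.append((float(cut), s, b, z))
    return rows


# Toy inputs, purely for illustration (not real yields).
sig_vals = np.array([60.0, 80.0, 100.0])
sig_w = np.ones(3)
bkg_vals = np.array([10.0, 20.0, 30.0, 70.0])
bkg_w = np.ones(4)

rows = coarse_scan(sig_vals, sig_w, bkg_vals, bkg_w)
best = max(rows, key=lambda r: r[3])  # first threshold with the highest z
```

Running the same scan once with and once without the lepton ID and b-jet veto applied would show how much the preferred threshold depends on the QCD and ttbar contributions.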

lucalavezzo commented 6 months ago

yea agreed, we will do this after the b jets and the lepton selection are defined. Thank you!

jreic commented 6 months ago

Should also pick a figure of merit to optimize with: https://twiki.cern.ch/twiki/bin/view/CMS/FigureOfMerit

The Asimov significance is pretty easy math, and approximates S/sqrt(B) (which is good for optimizing for discovery) without breaking down for B->0 (though that isn't actually an issue for us when looking inclusively in nConst)
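For reference, a short sketch of the Asimov significance in its commonly used form without a background uncertainty term, Z_A = sqrt(2 * ((s + b) * ln(1 + s/b) - s)). The function name and the handling of the edge cases are assumptions made for illustration.

```python
import math


def asimov_significance(s, b):
    """Asimov significance for signal yield s and background yield b.

    Uses Z_A = sqrt(2 * ((s + b) * ln(1 + s / b) - s)), i.e. the form with
    no systematic uncertainty on the background (an assumption here).
    """
    if b <= 0:
        raise ValueError("background yield must be positive")
    if s <= 0:
        return 0.0
    return math.sqrt(2.0 * ((s + b) * math.log1p(s / b) - s))


# For s << b this is close to s / sqrt(b); for small b it grows more slowly.
z_large_b = asimov_significance(10.0, 100.0)
z_small_b = asimov_significance(5.0, 1.0)
```

With s = 10 and b = 100 the result is within a few percent of S/sqrt(B) = 1, while with s = 5 and b = 1 it stays well below S/sqrt(B) = 5.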

lucalavezzo commented 6 months ago

Seems like a good choice. The last question I had on this was whether to check the yields for everything that passes our selections, or for everything that passes our selections plus the SR ones as defined by the ABCD regions, i.e. with some cut on nconst and cluster pT.

jreic commented 6 months ago

Good question! Cluster pT > 60 GeV is already built into our ntuples, so I'm not sure if you mean a tighter cut or a lead (AK4) jet pT requirement. Ultimately, I think we do want to see that we can extract signal from the background in that region--maybe there is still some wiggle room on the exact SR definition, so I could see pros and cons with either choice. To make life harder, I would do it both ways and see if the optimal choices even differ.

lucalavezzo commented 6 months ago

I just mean that, just as the ZH analysis defines an SR for the ABCD background estimate (labeled A below), we will likely have an SR defined in the same way, and the sensitivity metric we choose should be evaluated there more heavily than, e.g., in B2 or E1.

[image: ABCD regions, with the SR labeled A]

jreic commented 6 months ago

Right--A is where it matters the most, so checking the sensitivity there is logical, and is probably where we should do the optimization. I was partly thinking that we wanted to optimize Nconst and jet pT cuts as well, but we should probably keep those aligned with ZH, now that I think about it more. So let's plan to optimize these MET/pT/mT cuts based on region A, but I think a check of the significances when using the full ABCD plane is still worthwhile. If the optimal significance is very different in the full ABCD region vs. in the A region, there could be some correlation that we are learning about.

lucalavezzo commented 6 months ago

Just for reference, we agreed during the meeting not to over-engineer this. We will check a couple of representative samples and optimize with respect to those.

lucalavezzo commented 5 months ago

Currently studying selections on:

The idea at the moment is to pick some physics-inspired values as benchmarks, and then try to optimize a bit around those. We don't want to over-tune to any one signal sample.

lucalavezzo commented 4 months ago

We have settled on some selections for now; if we need to optimize further, we will.