jrueb closed this issue 3 years ago.
It's in the key name:
{taggerName}_{workingPoint}_{sfTechnique}_{systematic}_{jetType}
where workingPoint is an enum of loose, medium, tight, and jetType is 0=b, 1=c, 2=udsg. I might ask @dnoonan08 to confirm these are the indices.
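As an illustration, the key used in the DeepCSV example that follows (b jets, central value) can be assembled from its parts. This is a minimal sketch: make_btag_key is a hypothetical helper, the tagger name btag2018DeepCSV assumes year = 2018, and the integer codes (loose/medium/tight = 0/1/2, b/c/udsg = 0/1/2) are assumptions that still need the confirmation mentioned above.

```python
# Hypothetical helper: assemble a lookup key following the layout
# {taggerName}_{workingPoint}_{sfTechnique}_{systematic}_{jetType}.
# The integer codes below are assumptions to be checked against the CSV file.
WORKING_POINTS = {"loose": 0, "medium": 1, "tight": 2}
JET_TYPES = {"b": 0, "c": 1, "udsg": 2}


def make_btag_key(tagger: str, working_point: str, technique: str,
                  systematic: str, jet_type: str) -> str:
    """Return the flat string key for one (tagger, WP, technique, syst, flavor) combination."""
    return "_".join([
        tagger,
        str(WORKING_POINTS[working_point]),
        technique,
        systematic,
        str(JET_TYPES[jet_type]),
    ])


print(make_btag_key("btag2018DeepCSV", "medium", "comb", "central", "b"))
```

Every extra working point, systematic or flavor means another call like this, which is part of the friction discussed further down.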
That said, clearly one wants to vectorize the evaluation over the jet hadronFlavour column, so perhaps this should be changed. For now, one can evaluate each flavor separately and use in-place masked assignment to collate the results. Note that if these are jagged arrays it's a bit more complicated; here's an example:
# Evaluate the SF for each flavor hypothesis on all jets (key suffix 0=b, 1=c, 2=udsg)
bJetSF = evaluator['btag%iDeepCSV_1_comb_central_0'%year](tightJets.eta, tightJets.pt, tightJets.btag)
bJetSF_c = evaluator['btag%iDeepCSV_1_comb_central_1'%year](tightJets.eta, tightJets.pt, tightJets.btag)
bJetSF_udsg = evaluator['btag%iDeepCSV_1_incl_central_2'%year](tightJets.eta, tightJets.pt, tightJets.btag)
# Overwrite c jets (hadFlav 4) and light jets (hadFlav 0) in the flat content (awkward 0.x API)
bJetSF.content[(tightJets.hadFlav==4).content] = bJetSF_c[tightJets.hadFlav==4].content
bJetSF.content[(tightJets.hadFlav==0).content] = bJetSF_udsg[tightJets.hadFlav==0].content
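For flat (non-jagged) arrays the collation needs no .content handling; it is plain boolean-mask assignment. A minimal NumPy sketch with made-up SF values, assuming the hadronFlavour convention 5 = b, 4 = c, 0 = light used in the masks above:

```python
import numpy as np

# Toy per-jet inputs (made-up numbers): hadronFlavour is 5 for b, 4 for c, 0 for light jets.
hadron_flavour = np.array([5, 4, 0, 5, 0])

# Pretend these came from evaluating the b, c and udsg scale-factor functions on all jets.
sf_b = np.array([0.95, 0.96, 0.97, 0.98, 0.99])
sf_c = np.array([1.05, 1.06, 1.07, 1.08, 1.09])
sf_udsg = np.array([1.10, 1.11, 1.12, 1.13, 1.14])

# Start from the b-jet hypothesis and overwrite c and light jets via masked assignment.
sf = sf_b.copy()
sf[hadron_flavour == 4] = sf_c[hadron_flavour == 4]
sf[hadron_flavour == 0] = sf_udsg[hadron_flavour == 0]

print(sf)
```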
Besides the flavor evaluation not being vectorized, I also think it is very suboptimal to concatenate all the information into one long string. It only works if you know exactly what you're looking for. If that's not the case, you're required to construct a workaround with regular expressions or something similar.
For example, I cannot be sure what taggerName is, especially after #207. Some CSV files contain working points while other versions of the same scale-factor CSV file don't, so I don't always know what workingPoint is. The same can hold true for sfTechnique.
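To make the fragility concrete, a regular-expression workaround of the kind mentioned above might look like the following sketch. The pattern and parse_btag_key are hypothetical; they assume single-digit workingPoint and jetType codes and an underscore-free sfTechnique, and they would mis-split a tagger name that itself contains something like _1_.

```python
import re

# Hypothetical pattern for {taggerName}_{workingPoint}_{sfTechnique}_{systematic}_{jetType}.
# Assumptions: workingPoint and jetType are single digits, sfTechnique has no
# underscore, and the systematic name may itself contain underscores.
KEY_RE = re.compile(
    r"^(?P<tagger>.+?)_(?P<wp>\d)_(?P<technique>[^_]+)_(?P<systematic>.+)_(?P<jet_type>\d)$"
)


def parse_btag_key(key):
    """Split a flat lookup key into its components, or return None if it does not match."""
    match = KEY_RE.match(key)
    return match.groupdict() if match else None


print(parse_btag_key("btag2018DeepCSV_1_comb_up_jes_0"))
```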
Additionally, if one wants to use multiple working points or systematics, one will be forced to format a new string for every combination.
I think it would be very beneficial if one could use each key part individually: check whether it is present and access it. This could be solved with a multidimensional index, tuple indexing or simply more specific methods.
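One way to realize the tuple-indexing idea is a small container keyed by the five components that can also report which values exist for each component. BTagSFTable and its methods are hypothetical (not part of coffea), and the lambdas stand in for real SF functions:

```python
from collections import defaultdict

# Order of the key components, mirroring the flat string layout.
FIELDS = ("tagger", "wp", "technique", "systematic", "jet_type")


class BTagSFTable:
    """Hypothetical container indexing SF functions by a 5-tuple instead of a flat string."""

    def __init__(self):
        self._funcs = {}
        self._values = defaultdict(set)  # field name -> set of values seen

    def add(self, key, func):
        # key is a tuple ordered like FIELDS
        self._funcs[key] = func
        for name, value in zip(FIELDS, key):
            self._values[name].add(value)

    def available(self, field):
        """List which values of one key part are present, e.g. all working points."""
        return sorted(self._values[field])

    def __contains__(self, key):
        return key in self._funcs

    def __getitem__(self, key):
        return self._funcs[key]


table = BTagSFTable()
table.add(("DeepCSV", 1, "comb", "central", 0), lambda eta, pt, discr: 1.0)
table.add(("DeepCSV", 1, "comb", "up", 0), lambda eta, pt, discr: 1.1)
print(table.available("systematic"))
print(("DeepCSV", 2, "comb", "central", 0) in table)
```

Looping over working points or systematics then becomes a loop over available(...) rather than string formatting.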
OK, since lookup_tools is supposed to be incredibly generic, it sounds like this probably needs a layer similar to jetmet_tools for the JECs and such. It sounds like what needs to be kept around more is
It would help to have a use case and standard workflow to try to better understand what's useful. I haven't had to use the b-tag SFs myself so I don't know the most effective practices.
It may not be possible to fix the vectorization very easily. Of course some sugar over the jet flavor indices can be added, assuming people will always have a column of jet types around. It looks like, for a good fraction of the b-tag SFs, the function is repeated for all the variations with some additional offset or minor variation. However, I'm not sure that can be generalized to all b-tagging scale factors. If they'd stick to a single functional form it would help, but alas...
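As a sketch of the kind of sugar over flavor indices meant here, assuming a flat hadronFlavour column is available: the helper names are hypothetical, and this still evaluates every flavor hypothesis for every jet, so it tidies the interface without removing the redundant work.

```python
import numpy as np


# Assumed conventions: hadronFlavour 5 = b, 4 = c, 0 = light;
# BTV jetType index 0 = b, 1 = c, 2 = udsg.
def hadron_flavour_to_jet_type(hadron_flavour):
    """Map a hadronFlavour column onto the jetType index used in the lookup keys."""
    hadron_flavour = np.asarray(hadron_flavour)
    # Anything that is neither b (5) nor c (4) is treated as light (udsg).
    return np.where(hadron_flavour == 5, 0, np.where(hadron_flavour == 4, 1, 2))


def collate_by_flavour(hadron_flavour, sf_b, sf_c, sf_udsg):
    """Pick, per jet, the SF evaluated under that jet's own flavor hypothesis."""
    jet_type = hadron_flavour_to_jet_type(hadron_flavour)
    stacked = np.stack([sf_b, sf_c, sf_udsg])  # shape (3, n_jets); row index = jetType
    return stacked[jet_type, np.arange(len(jet_type))]


print(hadron_flavour_to_jet_type([5, 4, 0, 5]))
```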
Feel free to contribute improvements if you want this to move faster.
@jrueb does this tool satisfy your needs? https://coffeateam.github.io/coffea/api/coffea.btag_tools.BTagScaleFactor.html (in particular the eval function). If so, I think we can close this issue.
Sorry for the late reply. BTagScaleFactor looks really good. I have a small request though: would it be possible to split the systematic parameter of eval and __call__ by jet flavor, so that, for example, the correction can be set to "up" only for light-flavor jets? In my analysis I have to treat systematics from light jets independently.
@lgray that's exactly the division I think we should maintain going forward: lookup_tools is generic, and the object-specific tools that are a bit easier to use go in btag_tools, jetmet_tools, etc.
@jrueb would doing something like this work for you?
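import awkward as ak
# btag_sf is a coffea.btag_tools.BTagScaleFactor and events a NanoEvents collection, both assumed set up earlier
sf = btag_sf.eval("central", events.Jet.hadronFlavour, abs(events.Jet.eta), events.Jet.pt)
sf_up = btag_sf.eval("up", events.Jet.hadronFlavour, abs(events.Jet.eta), events.Jet.pt)
# Shift only light jets (hadronFlavour 0) up; b (5) and c (4) jets keep the central SF
sf_up_light = ak.where(events.Jet.hadronFlavour < 4, sf_up, sf)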
The nice thing is that you then run the (somewhat expensive) SF evaluation only once per variation and separate the flavors afterwards.
A note on memory performance: it is better to use ak.where(events.Jet.hadronFlavour<4, sf_up, sf) directly in downstream formulas, rather than keeping it as a stored variable, to save space on temporary arrays.
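The same selection can be written for flat NumPy arrays. This sketch with made-up numbers assumes, as in the ak.where call above, that hadronFlavour < 4 identifies light jets, and shows that the light-only and the b/c-only up variations both come from the same two evaluations:

```python
import numpy as np

# Made-up per-jet inputs; hadronFlavour 5 = b, 4 = c, 0 = light.
hadron_flavour = np.array([5, 0, 4, 0])
sf = np.array([0.95, 1.10, 1.02, 1.12])     # "central" evaluation
sf_up = np.array([0.97, 1.15, 1.05, 1.18])  # "up" evaluation

is_light = hadron_flavour < 4

# Each variation is a cheap selection between the two evaluations, not a new lookup.
sf_up_light = np.where(is_light, sf_up, sf)  # only light jets shifted
sf_up_heavy = np.where(is_light, sf, sf_up)  # only b and c jets shifted
```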
@nsmith- That looks good. Thank you!
Describe the bug
The coffea evaluator for CMS BTV weights depends on pt, eta and discriminator value, while it should also depend on jet flavor.
To Reproduce
To reproduce, one can use the following:
It prints the string "3 dimensional histogram", leaving no room for a jet flavor dependence.