cms-nanoAOD / cmssw

CMS NanoAOD software integration repository
http://cms-sw.github.io/
Apache License 2.0

NanoAOD missing jet muon energy fraction? #169

Closed juska closed 6 years ago

juska commented 6 years ago

[Moving the discussion here from Computing Tools hypernews]

Hi,

the jet muon energy fraction seems to be missing, at least in 93X NanoAOD. The existing energy fraction variables are Jet_chEmEF, Jet_chHEF, Jet_neEmEF and Jet_neHEF. I am trying to reconstruct and study the jet energy composition.

While the charged EM energy fraction, Jet_chEmEF, could in principle also include the muon energy, summing the energy fractions shows that in a non-negligible fraction of the jets they do not add up to unity. That is, the muon energy is probably missing. Here are some sums for the two leading jets of a random 93X NanoAOD file:

Fraction sum: 0.998046875
Fraction sum: 0.0314025878906
Fraction sum: 0.998046875
Fraction sum: 1.00006103516
Fraction sum: 0.0256652832031
Fraction sum: 1.0048828125
Fraction sum: 1.001953125
Fraction sum: 1.001953125
Fraction sum: 0.62109375
Fraction sum: 1.00183105469
Fraction sum: 1.0
Fraction sum: 0.998138427734
Fraction sum: 1.00317382812
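A minimal Python sketch of the closure check described above, assuming the four per-jet fractions have already been read from the Jet_chEmEF, Jet_chHEF, Jet_neEmEF and Jet_neHEF branches (file reading is omitted). The tolerance and the example jets are illustrative assumptions, not values from the thread.

```python
def fraction_sum(ch_em, ch_had, ne_em, ne_had):
    """Sum of the four PF energy fractions stored in 93X NanoAOD."""
    return ch_em + ch_had + ne_em + ne_had


def is_closed(total, tol=0.01):
    """True if the stored fractions sum to unity within tol.

    The tolerance is meant to absorb the reduced float precision of the
    stored branches; 0.01 is an assumed value, not a NanoAOD convention.
    """
    return abs(total - 1.0) <= tol


# Illustrative (chEmEF, chHEF, neEmEF, neHEF) tuples, not real data.
example_jets = [
    (0.10, 0.60, 0.25, 0.05),
    (0.00, 0.02, 0.01, 0.00),  # most of the energy is outside the four fractions
]

for fractions in example_jets:
    s = fraction_sum(*fractions)
    print(f"Fraction sum: {s:.4f}  closed={is_closed(s)}")
```

A jet failing `is_closed` carries energy in some component not covered by the four stored fractions, which is the symptom reported above.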

Please note that I have not applied JetID cuts (because I have not yet figured out how to use the variable in NanoAOD), but I think the fractions should sum to unity regardless.

Another interesting feature is the poor precision of the fraction variables, but this is a matter for another post, and I do not know whether it is a feature rather than a bug of the tightly compressed NanoAOD format.

Cheers,

Juska

P.S. I think I have now figured out how to use the JetID: it is an integer between 0 and 6, and its binary representation gives the flags for the three defined JetID booleans. Applying the tight cut does not bring all the fraction sums to unity.
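The bit decoding described in the P.S. can be sketched in Python as below. The assignment of bits to IDs (value 1 = loose, 2 = tight, 4 = tightLepVeto) is an assumption for illustration; check the documentation string of the Jet_jetId branch in your NanoAOD release before relying on it.

```python
# Assumed bit layout (verify against the Jet_jetId branch documentation):
# bit 0 (value 1) = loose, bit 1 (value 2) = tight, bit 2 (value 4) = tightLepVeto
JETID_BITS = {"loose": 0, "tight": 1, "tightLepVeto": 2}


def decode_jet_id(jet_id):
    """Return a dict with the JetID booleans packed into the integer jet_id."""
    return {name: bool((jet_id >> bit) & 1) for name, bit in JETID_BITS.items()}


def passes(jet_id, name):
    """True if the flag called `name` is set in jet_id."""
    return bool((jet_id >> JETID_BITS[name]) & 1)


for value in (0, 2, 6):
    print(value, format(value, "03b"), decode_jet_id(value))
```

Testing single bits with shifts and masks, rather than comparing the integer to fixed values, keeps a cut such as "passes tight" correct regardless of which other flags are set.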

arizzi commented 6 years ago

Can you clarify your use case, both for the muon fraction and for the fractions in general? NanoAOD is about choices of what to include, what not to include, and which precision to use. The optimization is based on analysis use cases, hence we need to know what you want to do with those numbers to understand why the available information is not sufficient (if that is the case).

juska commented 6 years ago

I want to reconstruct the particle-flow jet energy composition: Figure_046-a.pdf

This is a very efficient tool for studying the quality of the whole reconstruction process. Since, for example, the JetID variables depend heavily on the jet energy fractions, it is important to be able to check that the fraction variables are healthy and produce a jet energy composition similar to that found in studies done with RECO/AOD samples. This is not possible without access to all the fractions into which PF divides the jets. The presence of flavour variables in NanoAOD also makes further studies of jet composition possible, which is one of my personal reasons for wanting all the fraction variables to be there.

I just tried to calculate the jet muon energy fraction by subtracting all the other variables from unity, and the resulting histogram is not what I have seen from AOD-based jet samples.
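The subtraction method described above amounts to taking the residual of the four stored fractions. A minimal Python sketch follows; clipping to [0, 1] is an assumed design choice (reduced-precision fractions can sum slightly above 1), and the residual equals the muon fraction only if no other component is missing and Jet_chEmEF does not already contain muon energy.

```python
def muon_fraction_by_subtraction(ch_em, ch_had, ne_em, ne_had):
    """Residual 1 - (chEmEF + chHEF + neEmEF + neHEF), clipped to [0, 1].

    With reduced-precision branches the raw residual can come out slightly
    negative; clipping hides that, so inspect the unclipped value as well if
    the precision itself is under study.
    """
    residual = 1.0 - (ch_em + ch_had + ne_em + ne_had)
    return min(max(residual, 0.0), 1.0)


print(muon_fraction_by_subtraction(0.10, 0.50, 0.20, 0.10))
```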

If the limited numerical precision of the fraction variables is a deliberate choice, I guess I'm fine with it. What worries me a bit, though, is that the precision seems to decrease with increasing energy fraction, as can be seen from the charged hadron energy fraction plotted with 1000 bins: chf_decreasing_precision.pdf. The precision seems to drop first at around 0.127 (roughly 1/8), then at 0.25 (1/4) and then at 0.5 (1/2), and then stays constant up to 1.
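Steps at powers of two are what one would expect if the stored floats keep only a few mantissa bits: the spacing between representable values then doubles each time the value crosses 1/8, 1/4 and 1/2. The Python sketch below is a simplified emulation under the assumption that a NanoAOD `precision` of N keeps N mantissa bits after the leading bit with round-to-nearest; it is not the CMSSW implementation.

```python
import math


def reduce_mantissa(x, nbits):
    """Round x to nbits mantissa bits after the leading bit (simplified emulation)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)          # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** (nbits + 1)      # leading bit plus nbits explicit bits
    return math.ldexp(round(m * scale) / scale, e)


def step_size(x, nbits):
    """Spacing between neighbouring representable values around x."""
    _, e = math.frexp(x)
    return 2.0 ** (e - nbits - 1)


for x in (0.10, 0.20, 0.30, 0.60):
    print(f"x={x:.2f}  step={step_size(x, 6):.5f}")
```

Under these assumptions with N = 6, the step is 2^-10 (about 0.00098) below 1/8, which is finer than a 0.001-wide bin, but 2^-9, 2^-8 and 2^-7 in the next three octaves. Only above about 1/8 would empty bins start to appear in a 1000-bin histogram, and the pattern would be constant between 1/2 and 1.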

Please note that I have been out of the loop for most of the time since NanoAOD made its appearance in CMS, so I may have ill-founded assumptions and expectations about our great new data format.

arizzi commented 6 years ago

Physics object studies are not in the NanoAOD target list. On the other hand, I think the JME POG is planning to use a modified version of NanoAOD for their studies, and I'm guessing your use case can fit in there. (@rappoccio)

juska commented 6 years ago

So why are the jet energy fraction variables there in the first place, then? It seems odd that some of them are there but not all. Also, is the decreasing numerical precision an expected feature?

rappoccio commented 6 years ago

You can play with a JMAR workflow I've been developing here. This is in the context of substructure studies on AK8 jets, but in principle you can extend as you wish.

https://github.com/cms-jet/NanoAODJMAR

juska commented 6 years ago

Looks nice, thanks! Once my new analysis code is a bit more complete, I'll see whether this works for me out of the box or whether I'd need to extend it to other jet types. Are the ntuples you have produced available somewhere?

arizzi commented 6 years ago

I see that now muonFraction is used for JetID so there could be some use case (...but we precompute the IDs). How inaccurate would it be to simply calculate the fraction from the two matched muon indices? (I'm guessing it should be quite accurate)

gpetruc commented 6 years ago

It may also be worth checking the size cost of adding it with low precision (e.g. precision=4). It should be mostly zeros for jets with no muons and around 1 for muons, the only real entropy coming from muons in b-jets.

BTW, it may be worth checking how the current choice for all fractions (precision=6) compares with storing (fraction*100) as an int.
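To make the integer suggestion concrete, here is a minimal Python sketch of encoding a fraction as an integer percentage (0 to 100, which fits in one unsigned byte) and measuring the round-trip error. The function names are hypothetical, and the sketch does not measure compressed file size, which is the other half of the comparison proposed above.

```python
def encode_percent(fraction):
    """Store a fraction in [0, 1] as an integer percentage 0..100."""
    clipped = min(max(fraction, 0.0), 1.0)
    return int(round(clipped * 100))


def decode_percent(code):
    """Recover the (quantized) fraction from the stored integer."""
    return code / 100.0


def max_abs_error(encode, decode, n=10001):
    """Largest round-trip error on a uniform grid of n points over [0, 1]."""
    worst = 0.0
    for i in range(n):
        f = i / (n - 1)
        worst = max(worst, abs(decode(encode(f)) - f))
    return worst


print(max_abs_error(encode_percent, decode_percent))
```

The integer scheme has a uniform absolute error of at most 0.005 across the whole range, whereas a reduced-mantissa float has an error that scales with the value itself: very fine near zero, coarser near 1. Which of the two is preferable depends on where in [0, 1] the analysis needs resolution.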

Giovanni

(In reply by email to arizzi's comment of Wed 23 May 2018, 08:57.)

juska commented 6 years ago

I tried calculating the muon fraction from the matched muons (using Jet_muonIdx1,2 and Muon_pt[]), and the fraction comes out roughly half of what I get with the other possible method, i.e. subtracting all the other jet energy fractions from unity: muon_fraction_muonIdx.pdf (https://github.com/cms-nanoAOD/cmssw/files/2034315/muon_fraction_muonIdx.pdf), muon_fraction_subtraction.pdf (https://github.com/cms-nanoAOD/cmssw/files/2034318/muon_fraction_subtraction.pdf)
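The matched-muon method can be sketched in Python as follows. It assumes, for illustration, that a negative Jet_muonIdx value means "no matched muon" and uses the pT ratio as a proxy for the energy fraction, as done in the thread. Muons that were never stored in the Muon collection cannot contribute, so this estimate can only undercount.

```python
def muon_pt_fraction(jet_pt, muon_idx1, muon_idx2, muon_pt):
    """Approximate jet muon fraction from up to two matched muons.

    muon_idx1/muon_idx2 play the role of Jet_muonIdx1/Jet_muonIdx2, i.e.
    indices into the event's Muon_pt list; negative or out-of-range values
    are treated as "no match" (an assumption of this sketch).
    """
    total = 0.0
    for idx in (muon_idx1, muon_idx2):
        if 0 <= idx < len(muon_pt):
            total += muon_pt[idx]
    return total / jet_pt if jet_pt > 0 else 0.0


# Example event with two muons; the jet is matched to the first one only.
print(muon_pt_fraction(100.0, 0, -1, [20.0, 5.0]))
```

The denominator is left to the caller, so the same function serves for either the corrected or the raw jet pT.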

Unfortunately I do not have a similar AOD data sample at hand with which to check what the muon fraction should really look like in this case. What I can say is that the shape is what I would expect, but the magnitude is vastly different from, e.g., my (fairly old) non-CHS jet sample that I happen to have stored locally.

I like Giovanni's idea of storing the fractions as integers. I can help check that if you tell me how to do it. As you have noticed, I'm a complete novice when it comes to floating-point precision, storage efficiency, etc.

arizzi commented 6 years ago

Did you compute the fraction by dividing by the raw pt or the corrected pt?

(In reply by email to Juska Pekkanen's comment of Thu 24 May 2018, 10:04.)

juska commented 6 years ago

Oh, that's a good point. I made the previous plot with the corrected pt. Here's a version with the muon fraction calculated from MuonIdx1,2 using the raw pt (Jet_pt*(1-Jet_rawFactor)). The difference is very small: muon_fraction_muonIdx_raw.pdf
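The raw-pt formula above also shows why switching the denominator changes so little. A fraction computed with the corrected pt differs from the raw-pt one only by the factor 1/(1 - Jet_rawFactor). A minimal Python sketch with hypothetical helper names:

```python
def raw_pt(jet_pt, raw_factor):
    """Uncorrected jet pT from the corrected Jet_pt and Jet_rawFactor."""
    return jet_pt * (1.0 - raw_factor)


def corrected_to_raw_fraction(fraction_corrected, raw_factor):
    """Rescale a fraction computed with corrected pT to one w.r.t. raw pT.

    f_raw = X / (pt * (1 - rawFactor)) = f_corr / (1 - rawFactor)
    """
    return fraction_corrected / (1.0 - raw_factor)


print(raw_pt(100.0, 0.1), corrected_to_raw_fraction(0.18, 0.1))
```

When Jet_rawFactor is small, the rescaling factor is close to 1. The change of denominator therefore cannot account for a factor-of-two discrepancy between the two methods.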

A possible problem with obtaining the jet muon energy fraction from the stored muons is non-isolated muons: are they saved in the Muon collection?

rappoccio commented 6 years ago

Addressed in the referenced PRs. Juska, I think this should satisfy your needs (otherwise we can reopen the issue).