cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0
1.08k stars 4.31k forks source link

Empty grid for DeepTauId #44501

Open jfernan2 opened 7 months ago

jfernan2 commented 7 months ago

While implementing basic guards to solve the empty input problem in DeepTauId there were still empty grids which need to be investigated with Tau experts.

https://github.com/cms-sw/cmssw/issues/44333#issuecomment-2009137153_

jfernan2 commented 7 months ago

assign reconstruction

cmsbuild commented 7 months ago

New categories assigned: reconstruction

@jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild commented 7 months ago

cms-bot internal usage

cmsbuild commented 7 months ago

A new Issue was created by @jfernan2.

@makortel, @antoniovilela, @rappoccio, @Dr15Jones, @smuzaffar, @sextonkennedy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

jfernan2 commented 7 months ago

type tau

jfernan2 commented 7 months ago

@cms-sw/tau-pog-l2 FYI

jfernan2 commented 7 months ago

@valsdav could you please post here instructions to reproduce the issue in case they had changed from original issue ? Thank you

valsdav commented 7 months ago

Hi! Here is the reproducer to get the event with the empty grid. From https://github.com/cms-sw/cmssw/issues/44333#issuecomment-1989632992

You will see empty grids in the input preparation. If you merge on top my PR #44455 the inference will be skipped on the empty inputs.

jfernan2 commented 5 months ago

ping @cms-sw/tau-pog-l2

jfernan2 commented 5 months ago

ping @cms-sw/tau-pog-l2 @mbluj @Ksavva1021

mbluj commented 5 months ago

@kandrosov @brallmond could you also take a look, please?

kandrosov commented 5 months ago

@mbluj I have identified the tau that has the empty signal grid in the example event pointed out above:

tau pt=52.9444 eta=-0.0898154 phi=-0.68437
innerSignalConeRadius = 0.0566632
signalTauChargedHadronCandidates
  pt = 32.8809, eta = -0.177717, phi = -0.669344, mass = 0.13957, deltaR = 0.0892
    charge = 0 (pdgId = 130)
    charged PFCandidate (3:2826:1039): pt = 32.8809, eta = -0.177717, phi = -0.669344 (pdgId = 130)
    reco::Track: N/A
    neutral PFCandidates: N/A
    position@ECAL entrance: x = 163.066, y = -129.112, z = -44.1041 (eta = -0.21049, phi = -0.669706)
    algo = PFNeutralHadron
signalPiZeroCandidates
  pt = 11.7301, eta = 0.0901726, phi = -0.745689
    daughter #0 (PFGamma): pt = 3.92827, eta = 0.0625999, phi = -0.757536, deltaR = 0.169
    daughter #1 (PFGamma): pt = 2.98864, eta = 0.0810623, phi = -0.714655, deltaR = 0.174
    daughter #2 (PFGamma): pt = 3.44201, eta = 0.116514, phi = -0.774948, deltaR = 0.225
    daughter #3 (PFGamma): pt = 1.37547, eta = 0.122283, phi = -0.70606, deltaR = 0.213
  pt = 7.9762, eta = 0.0110623, phi = -0.656186
    daughter #0 (PFGamma): pt = 7.9762, eta = 0.0110623, phi = -0.656186, deltaR = 0.105

Indeed all signal PF candidates have deltaR > innerSignalConeRadius, where innerSignalConeRadius is computed with getInnerSignalConeRadius. This happens for HLT path HLT_VBF_DiPFJet45_Mjj550_MediumDeepTauPFTauHPS45_L2NN_eta2p1_v2, which uses as an input hltHpsPFTauProducer -> hltHpsPFTauProducerSansRefs -> hltHpsCombinatoricRecoTaus. In hltHpsCombinatoricRecoTaus signalConeSize = cms.string('max(min(0.1, 3.0/pt()), 0.05)'). So I don't understand how it is possible.

mbluj commented 5 months ago

@kandrosov thank you for checking. It is indeed very weird tau candidate with one charged hadron on top of neutral one (charged w/o track recovery) and two pi0 candidates (strips) - such candidates are not allowed offline. To be checked if it is allowed online. It is in addition to the fact that tau signal candidates are not in the signal cone of dR~0.0566 as you found.

kandrosov commented 5 months ago

@mbluj, thank you for the confirmation. Tau HPS DM finding selection is not applied online because it would require dedicated tuning. So tau with the worst quality is expected to arrive at the DeepTau step. I was just not expecting that it could be that bad. Given that we are moving to PNet-based tau tagging for online, I think it is not necessary to invest time in re-optimizing HPS for online. On the other hand, the fix on the code side was implemented some time ago, so DeepTau doesn't crash when processing taus with an empty signal cone. We also understood that the origin of the behavior is not related to any bugs but rather to a very loose HPS selection criterion at the tau building step. I think the GitHub issue can be closed.

mbluj commented 5 months ago

@mbluj, thank you for the confirmation. Tau HPS DM finding selection is not applied online because it would require dedicated tuning. So tau with the worst quality is expected to arrive at the DeepTau step. I was just not expecting that it could be that bad. Given that we are moving to PNet-based tau tagging for online, I think it is not necessary to invest time in re-optimizing HPS for online. On the other hand, the fix on the code side was implemented some time ago, so DeepTau doesn't crash when processing taus with an empty signal cone. We also understood that the origin of the behavior is not related to any bugs but rather to a very loose HPS selection criterion at the tau building step. I think the GitHub issue can be closed.

I agree that the issue can be closed. For records: at building step all possible decay modes are built and then the "best one" is selected for further processing. It could be that the "best one" is not passing HPS requirements as in this particular case, but it is strange that all tau products (signal candidates) are outside signal cone - it could be a "feature" built in the algorithm which was not expected to be used without decay mode discriminator, but it should be understood.

jfernan2 commented 5 months ago

+1 Closing since online is going to use PNet/based tau tagging

cmsbuild commented 5 months ago

This issue is fully signed and ready to be closed.

mmusich commented 5 months ago

Given that we are moving to PNet-based tau tagging for online, I think it is not necessary to invest time in re-optimizing HPS for online.

checking in the very latest development HLT table (/dev/CMSSW_14_0_0/HLT/V151) I see:

$ ./getListOfPathsContaining.py hltHpsPFTauDeepTauProducerForVBFIsoTau /dev/CMSSW_14_0_0/HLT/V151 
--------------------
Paths
--------------------
HLT_IsoMu24_eta2p1_MediumDeepTauPFTauHPS20_eta2p1_SingleL1_v9
HLT_IsoMu24_eta2p1_MediumDeepTauPFTauHPS45_L2NN_eta2p1_CrossL1_v9
HLT_VBF_DiPFJet45_Mjj650_MediumDeepTauPFTauHPS45_L2NN_eta2p1_v3
HLT_VBF_DiPFJet45_Mjj750_MediumDeepTauPFTauHPS45_L2NN_eta2p1_v3
HLT_VBF_DoubleMediumDeepTauPFTauHPS20_eta2p1_v10
$ ./getListOfPathsContaining.py hltHpsPFTauDeepTauProducer /dev/CMSSW_14_0_0/HLT/V151
--------------------
Paths
--------------------
HLT_DoubleMediumDeepTauPFTauHPS30_L2NN_eta2p1_OneProng_v5
HLT_DoubleMediumDeepTauPFTauHPS30_L2NN_eta2p1_PFJet60_v9
HLT_DoubleMediumDeepTauPFTauHPS30_L2NN_eta2p1_PFJet75_v9
HLT_DoubleMediumDeepTauPFTauHPS35_L2NN_eta2p1_v9
HLT_Ele24_eta2p1_WPTight_Gsf_LooseDeepTauPFTauHPS30_eta2p1_CrossL1_v10
HLT_IsoMu20_eta2p1_LooseDeepTauPFTauHPS27_eta2p1_CrossL1_v10
HLT_IsoMu24_eta2p1_LooseDeepTauPFTauHPS180_eta2p1_v10
HLT_IsoMu24_eta2p1_LooseDeepTauPFTauHPS30_eta2p1_CrossL1_v10
HLT_IsoMu24_eta2p1_MediumDeepTauPFTauHPS30_L2NN_eta2p1_CrossL1_v9
HLT_IsoMu24_eta2p1_MediumDeepTauPFTauHPS30_L2NN_eta2p1_OneProng_CrossL1_v5
HLT_IsoMu24_eta2p1_MediumDeepTauPFTauHPS30_L2NN_eta2p1_PFJet60_CrossL1_v9
HLT_IsoMu24_eta2p1_MediumDeepTauPFTauHPS30_L2NN_eta2p1_PFJet75_CrossL1_v9
HLT_IsoMu24_eta2p1_MediumDeepTauPFTauHPS35_L2NN_eta2p1_CrossL1_v10
HLT_LooseDeepTauPFTauHPS180_L2NN_eta2p1_v10

what I am missing?

brallmond commented 5 months ago

Hi Marco, Originally we expected the DeepTau paths to be prescaled to zero shortly after the first 2024 data. Following the first STEAM meeting using 2024C data, the decision was to collect more data and continue studies. These will be presented at the STEAM meeting on Friday this week. From the Tau Trigger side, we have asked for feedback regarding DeepTau and PNet comparisons and no one has raised any issues. What we see in our monitoring paths are the same or better performance from the PNet paths. So, my understanding is that TSG/STEAM will make a decision this Friday regarding the prescaling of the DeepTau paths.

mmusich commented 5 months ago

So, my understanding is that TSG/STEAM will make a decision this Friday regarding the prescaling of the DeepTau paths.

what about MC? Some of these triggers took data (and we normally include those in the frozen table for MC).

mmusich commented 5 months ago

@jfernan2 please re-open this issue that was closed too hastily. I think something still needs to be done for the HLT step of 2024 MC (or it is clarified it's not needed)

jfernan2 commented 5 months ago

assign hlt

cmsbuild commented 5 months ago

New categories assigned: hlt

@Martin-Grunewald,@mmusich you have been requested to review this Pull request/Issue and eventually sign? Thanks

jfernan2 commented 5 months ago

Done

jfernan2 commented 5 months ago

It seems reopening command is not working, let's iterate here anyways

brallmond commented 5 months ago

@jfernan2 please re-open this issue that was closed too hastily. I think something still needs to be done for the HLT step of 2024 MC (or it is clarified it's not needed)

Sorry, but what needs to be done still? This issue was opened to understand why crashes were occurring online related to this module, and now we have an explanation thanks to Konstantin and Michal. The handling of the corner cases which caused crashes is already taken care of by guards in the parent issue. As far as I understand, it should be no issue for the DeepTau paths to be prescaled in the online menu and remain in the frozen HLT table for 2024 MC since the original issue is handled and understood, and because the DeepTau paths have taken some data now in 2024.

makortel commented 5 months ago

@cmsbuild, please reopen

mmusich commented 5 months ago

Sorry, but what needs to be done still?

From the mother issue:

Basic guards to solve the empty input problem in DeepTauId are in place, but the reason of the empty grid needs to be investigated with Tau experts.

and from the protection PR:

This PR avoids calling the TF inference if empty inputs are detected. The grid should never be empty. The issue will be followed up with Tau experts (enter here issue number..)

from the discussion above it's not clear to me (sorry I am not a tau expert) if the workaround in place is discarding events that are potentially useful or not.

mbluj commented 5 months ago

The "workaround" discards weird, ill-defined, taus which anyway should be filtered out. At offline such taus are killed by basic filter, so-called new decay mode finder, which was omitted at HLT to boost efficiency (due to difference in online reco it would require special tuning) . What still needs be understood is why such ill-defined taus are produced (if it is expected or non expected feature of HPS reconstruction), but I think it is beyond scope of this issue.

mmusich commented 5 months ago

New episode of the saga at https://github.com/cms-sw/cmssw/issues/45136