cms-L1TK / cmssw

Fork of CMSSW where improvements to L1 tracking code are developed.
http://cms-sw.github.io/
Apache License 2.0
4 stars 5 forks source link

Ntuplemaker crashes with New KF #218

Closed Chriisbrown closed 1 year ago

Chriisbrown commented 1 year ago

When running the L1TrackNtupleMaker with the algo set to "HYBRID_NEWKF" over the recent Phase2Fall22DRMiniAOD large TTbar sample the ntuple maker occasionally throws an error on events in the sample, this happens frequently across the sample but here is a specific example taken from store/mc/Phase2Fall22DRMiniAOD/TT_TuneCP5_14TeV-powheg-pythia8/GEN-SIM-DIGI-RAW-MINIAOD/PU200_PUTP_125X_mcRun4_realistic_v2-v1/2540000/609bdf8d-43a5-435f-8726-105cf1d97243.root

Event number 2, track number 55

image

This issue happens before line 1087 of the track ntuple maker as when running with the debug as none of the track debug is being outputted from this track

I can't recreate this issue when setting the algo to "HYBRID" so is specifically in the NEWKF

This is using the current L1TK-dev-12_6_0_pre5 branch

tschuh commented 1 year ago

Without knowing which line in L1TrackNtupleMaker throws the error I can't help. Who wrote that Analyzer?

Chriisbrown commented 1 year ago

It is this line: https://github.com/cms-L1TK/cmssw/blob/fd9a5da928b0c633034f0ebba44df76eb09984dc/L1Trigger/TrackFindingTracklet/test/L1TrackNtupleMaker.cc#L918

I don't know how the hitpattern helper is different between the Old/New KFs but it is something in there

tschuh commented 1 year ago

I think that was written by @Jingyan95. Can you help Jack?

Jingyan95 commented 1 year ago

I think that was written by @Jingyan95. Can you help Jack?

Hello @tschuh , I saw that new KF is outputting -122.402 as the z0 (with an eta of 0.307698) for the track that is causing the crash, which sounds a bit strange. Is this expected ?

tschuh commented 1 year ago

Well the KF has no mechanism of killing tracks, if that is the result it will send it. The Track Quality should mark this track as garbage and downstream consumer should not take it into account.

tomalin commented 1 year ago

I ran CMSSW with the gdb debugger, which crashes with traceback below, so I guess in the original message reported by @Chriisbrown of "_n (which is 38)", it was the value of binCot which was 38, and so outside the expected range.

It's important to realize that when we run with HYBRID_NEWKF, the TTTrack collection is produced by the Tracklet pattern recognition. The new KF currently does not write a TTTrack collection (though Thomas tells me we should add this sometime). It only writes an EDProduct of its digitised output stream. Therefore it's plausible that the tracks lie outside the expected eta sectors etc., and this causes the crash. We should probably disable the running of L1TrackNtupleMaker.cc when using HYBRID_NEWKF to avoid confusion?

  1. trackerTFP::LayerEncoding::layerEncoding (binCot=38, binZT=, binEta=, this=0x7fffa1ffe4c0) at /net/home/ppd/tomalin/UK_TrackTrig/hybrid12/CMSSW_12_6_0_pre5/src/L1Trigger/TrackerTFP/interface/LayerEncoding.h:26
  2. hph::Setup::layerEncoding (binCot=38, binZT=, binEta=, this=0x7fffa1ffcc00) at /net/home/ppd/tomalin/UK_TrackTrig/hybrid12/CMSSW_12_6_0_pre5/src/L1Trigger/TrackFindingTracklet/interface/HitPatternHelper.h:63
  3. hph::HitPatternHelper::HitPatternHelper (this=0x7fffffff1390, setup=0x7fffa1ffcc00, hitpattern=51, cot=0.3125762939453125, z0=-122.40198516845703) at /net/home/ppd/tomalin/UK_TrackTrig/hybrid12/CMSSW_12_6_0_pre5/src/L1Trigger/TrackFindingTracklet/src/HitPatternHelper.cc:81
  4. 0x00007fffa4587b83 in L1TrackNtupleMaker::analyze (this=, iEvent=..., iSetup=...) at /net/home/ppd/tomalin/UK_TrackTrig/hybrid12/CMSSW_12_6_0_pre5/src/L1Trigger/TrackFindingTracklet/test/L1TrackNtupleMaker.cc:942
Chriisbrown commented 1 year ago

@tomalin This makes sense as the source of the issue. If we can't run the L1TrackNtupleMaker with the HYBRID_NEWKF how should standard track plots be made so the performance can be compared against the previous HYBRID approach as currently the workflow is to create the ntuples with the L1TrackNtupleMaker and then run them through the L1TrackNtuplePlot root macro?

Jingyan95 commented 1 year ago

Hello @tomalin and @Chriisbrown ,

Thank you for checking. I am working on a fix to avoid out of bound error like this. I will make a PR soon.

tomalin commented 1 year ago

Hello @tomalin and @Chriisbrown ,

Thank you for checking. I am working on a fix to avoid out of bound error like this. I will make a PR soon.

Thanks @Jingyan95 . Even when using the old HYBRID, people do sometimes choose to disable the DR & KF, so that the TTTrack collection is produced only using Tracklet pattern reco. This is useful to understand how DR & KF affect performance. I guess the same crash can occur then, so it would be good if HitPatternHelper could avoid crashing, even if it just disables itself in that cfg. (I'm not sure it's worth the complexity of adding lots of new code).

tomalin commented 1 year ago

@tomalin This makes sense as the source of the issue. If we can't run the L1TrackNtupleMaker with the HYBRID_NEWKF how should standard track plots be made so the performance can be compared against the previous HYBRID approach as currently the workflow is to create the ntuples with the L1TrackNtupleMaker and then run them through the L1TrackNtuplePlot root macro?

@Chriisbrown we can already compare L1TrackNtuplePlot results for old HYBRID with results from TrackFindingTracklet/test/AnalyzerKFout.cc for HYBRID_NEWKF, but you're right that it would be better to use L1TrackNtuplePlot for everything.

To achieve that, we'd have to modify TrackFindingTracklet/plugins.ProducerKFout.cc so it wrote a TTTrack collection. @tschuh could say if this would be easy enough that someone could do it under his guidance. Function "consume" https://github.com/cms-L1TK/cmssw/blob/L1TK-dev-12_6_0_pre5/L1Trigger/TrackerTFP/test/AnalyzerKF.cc#L234 illustrates how to convert the streams of stub & track digi data from the KF into TTTrack objects. (For some reason, TrackFindingTracklet/test/AnalyzerKFout.cc doesn't do this). I guess we'd have to add similar code to ProducerKFout. We could consider that after your KFout PR is merged.

tomalin commented 1 year ago

My original understanding was incorrect. @tschuh tells me that his New KF does actually write a TTTrack collection, which is written by the EDProducer named ProducerTT. It runs on the output of the KF rather than the KFout (which we should fix sometime).

tomalin commented 1 year ago

I've discovered that the track which causes the crash has a crazy value of z0 stored in the TTTrack. If one prints tmp_trk_z0 at https://github.com/cms-L1TK/cmssw/blob/L1TK-dev-12_6_0_pre5/L1Trigger/TrackFindingTracklet/test/L1TrackNtupleMaker.cc#L937 , it has a value of -122cm. The tanL = +0.33 is sensible, but because of the crazy z0, HitPatternHelper::Setup::etaRegion() assigns it to a KF eta sector in the forward region of the Tracker, which makes a very big angle to the track, causing the digitisation of HitPatternHelper::cotBin_ to overflow.

@tschuh says the Hybrid KF applies no cuts at all to the output fitted tracks, so they can be crazy. (Though he thinks this track implies a KF bug). He can see the crash on the RAL linux machine in event 2 of /opt/ppd/data/cms/L1TrkMC/609bdf8d-43a5-435f-8726-105cf1d97243.root .

Jingyan95 commented 1 year ago

Well the KF has no mechanism of killing tracks, if that is the result it will send it. The Track Quality should mark this track as garbage and downstream consumer should not take it into account.

@tomalin I think according to Thomas' comment above, new KF keeps every track regardless of the track quality.

tomalin commented 1 year ago

@Jingyan95 I think I've spotted a bug. HitPatternHelper::useNewKF_ is "false", despite L1TrackNtupleMaker_cfg.py being run with HYBRID_NEWKF. Therefore the HPH thinks the NewKF is not being used. If you agree, I'll make PR to set process.HitPatternHelperSetup.useNewKF = True in L1TrackNtupleMaker_cfg.py if NewKF is being used.

tomalin commented 1 year ago

So fixing above bug via PR https://github.com/cms-L1TK/cmssw/pull/223, and rejecting tracks with abs(z0) > 30cm in L1TrackNtupleMaker.cc stops crash. In ttbar+200PU, I observe that 0.2% of tracks have abs(z0) > 30cm. These tracks all have chi2/dof > 100 and only 4 stubs. Their Pt & eta are normal.

tomalin commented 1 year ago

I'm closing this, but I think cuts to reject crazy tracks really should be applied at the output of the KF.