cms-L1TK / cmssw

Fork of CMSSW where improvements to L1 tracking code are developed.
http://cms-sw.github.io/
Apache License 2.0

Duplicate Removal with overlapping rinv bins #181

Closed. dally96 closed this 1 year ago.

dally96 commented 2 years ago

Changed DR so that tracks are only compared to each other if they're in the same overlapping rinv bin.


tomalin commented 2 years ago

This fails CI because it fails the code-format checks: https://gitlab.cern.ch/cms-l1tk/cmssw_CI/-/pipelines/4310037 . You need to run `scram b -j8 code-format`, and then check that it hasn't introduced any stupid line breaks.

tomalin commented 2 years ago

General comment: Please separate functions in the .cc files by a blank line.

tomalin commented 1 year ago

@trholmes could you check this PR please?

tomalin commented 1 year ago

@dally96 have you addressed all of @trholmes 's comments?

tomalin commented 1 year ago

@dally96 This needs rebasing to resolve the git conflicts mentioned above.

tomalin commented 1 year ago

@dally96 please let us know here if you believe you have addressed all the comments.

dally96 commented 1 year ago

@tomalin @trholmes I believe that I have addressed all comments.

trholmes commented 1 year ago

All the updates on my requests look good.

tomalin commented 1 year ago

Am I correct that with this PR, use of 12 bins in the DR will become the default? And that to try out the original method, people would have to change varRInvBins in Settings.h to specify just a single rinv bin?

tomalin commented 1 year ago

What is the effect of this PR on tracking performance at the end of the L1 tracking chain? E.g. tracking efficiency, number of tracks, percentage of duplicates. As it will become the default for new MC production for the L1 trigger group, we must be sure it's not too horrible.

dally96 commented 1 year ago

Yes, that is correct. I could add the edges for 1 bin to varRInvBins, making it a vector of vectors again, and then make a variable that would allow others to switch between the two without having to edit the bin edges, if you think that would be better.

dally96 commented 1 year ago

@tomalin What do you mean by number of tracks? Also, how do I calculate tracking efficiency? Is that just the number of tracks that remain after we take into account the cuts on eta, dxy, etc.?

tomalin commented 1 year ago

In TrackFindingTracklet/test/, just run L1TrackNtupleMaker_cfg.py, followed by the ROOT macro L1TrackNtuplePlot.cc (e.g. with makeHists.csh), which prints out these numbers.

dally96 commented 1 year ago

Is this all that's needed?

  efficiency for 1.0 < |eta| < 1.75 = 95.8378 +- 0.248434
  efficiency for 1.75 < |eta| < 2.4 = 95.4862 +- 0.346541
  combined efficiency for |eta| < 2.4 = 95.5602 +- 0.138079 = 21265/22253

  efficiency for pt > 2 = 95.5602 +- 0.138079
  efficiency for 2 < pt < 8.0 = 95.6691 +- 0.157433
  efficiency for pt > 8.0 = 95.2312 +- 0.286415
  efficiency for pt > 40.0 = 94.2643 +- 1.16116

  TP/event (pt > 2) = 152.109
  TP/event (pt > 3.0) = 50.629
  TP/event (pt > 10.0) = 4.642
  tracks/event (no pt cut) = 626.768
  tracks/event (pt > 2) = 585.748
  tracks/event (pt > 3.0) = 205.066
  tracks/event (pt > 10.0) = 19.475

tomalin commented 1 year ago

Yes, but we need these numbers with and without your PR, so we know the change in performance it causes. Use at least 1k ttbarPU200 events to have enough stats. Also, can you give us the percentage of duplicate tracks with and without your PR?

dally96 commented 1 year ago

These results are for 1k events.

Without the PR, the percentage of duplicate tracks is 1.042%. With my PR, the percentage of duplicate tracks is 1.044%.

results_withoutPR.txt results_withPR.txt

tomalin commented 1 year ago

On 28th Oct. you posted to Skype a plot of "Duplicate Fraction vs No. of Bins" for various scenarios. For your default (12 bins & 32 CM), the fraction was about 1%, consistent with your new results. But for "1 bin w/o cut", which I believe corresponds to what we had before your PR, it showed a duplicate fraction of about 0.7%. So why is this now 1%?

dally96 commented 1 year ago

My mistake. I did not compile the code without the PR before running the events.

Without the PR, the percentage of duplicate tracks is 0.703%. With my PR, the percentage of duplicate tracks is 1.044%.

results_withoutPR.txt results_withPR.txt

These numbers and files should be correct.

tomalin commented 1 year ago

OK, so we're losing 0.2% of tracking efficiency and increasing the duplicate rate by half. As we're still optimising this algo, we should not expose the trigger group to the degraded performance. Can you find a trick, such as doubling the number of CM, which would allow us to temporarily recover the original performance, and so merge this PR, whilst you explore new ideas?

dally96 commented 1 year ago

With 64 CM, the percentage of duplicate tracks reduces to 1.029%.

results_withPR_64CR.txt

tomalin commented 1 year ago

Interesting. The fact that this is so much worse than the current code suggests that something other than truncation is causing the loss of performance. I guess it must be the overlaps being too small. Try making them bigger.

trholmes commented 1 year ago

Daniel, you have all those nice plots of what changes had which effects on the duplicate rate -- can you post one of them and tell us which constraint caused most of the duplicate rate increase?

dally96 commented 1 year ago

If I make them bigger, we'll have to use fewer bins, and that might also worsen performance.

[plot: Duplicate Fraction vs No. of Bins, for various overlap and CM scenarios]

At higher numbers of bins, having more than 32 CMs isn't really a significant factor. Without any of the truncation cuts, a bigger overlap gives us a lower duplicate fraction. Increasing the overlap size will require us to go to a smaller number of bins and include more tracks in each bin, which, with the 16 CM cut, did worse. However, from the plot of average tracks per bin,

[plot: average tracks per bin vs No. of Bins]

it looks like it would have improved if we increased the number of CMs. I think that if we ran with a larger overlap size at 6 bins with 64 CM, the duplicate fraction would decrease further, but I couldn't confidently say it would be a significant improvement over what we have now.

tomalin commented 1 year ago

The purple line in your plot retains the 0.7% duplicate rate. This uses a 4e-4 overlap and no CM cut (which can perhaps be approximated by using 64 CM?). Your plot shows this works for 1 to 6 bins. So perhaps it would work for 12 bins too, allowing you to commit your preferred choice of 12 bins?

dally96 commented 1 year ago

The problem is that at 12 bins, the outer bin sizes are smaller than twice the 4e-4 overlap. With 6 bins at 4e-4 overlap and 64 CM, the duplicate fraction becomes 0.741%. Maybe we can use this for now to merge the PR while I investigate using phi bins?

results_1111_6Bins_4E-4Overlap_108TrackCut_64CompCut.txt

tomalin commented 1 year ago

OK

tomalin commented 1 year ago

To be consistent with the style of other modules, `numTracksPerBin_` should be deleted at https://github.com/cms-L1TK/cmssw/blob/dally_DRoverlapbins/L1Trigger/TrackFindingTracklet/interface/Settings.h#L1049 and instead a new entry `{"DR", 108}` added to the map at https://github.com/cms-L1TK/cmssw/blob/dally_DRoverlapbins/L1Trigger/TrackFindingTracklet/interface/Settings.h#L888 , later accessed via `maxStep("DR")`.

dally96 commented 1 year ago

Done.