Open tschuh opened 4 months ago
As you're changing 160 files, the text describing this PR needs to be longer and more detailed. What are the changes that have been implemented? And how do they affect tracking performance? Have you given a talk which describes in detail the changes and the effect on performance. If so, please link it from here.
This fails CI quite spectacularly https://gitlab.cern.ch/cms-l1tk/cmssw_CI/-/pipelines/7858190 . e.g. For HYBRID_NEWKF, the tracking efficiency is up by 0.3% (good), but the number of reconstructed tracks is down by 80% (very weird, given that no efficiency is lost!), and the z0 resolution is a factor 5 worse (bad).
For comparison, this is the git CI for a dummy PR that changed no code https://gitlab.cern.ch/cms-l1tk/cmssw_CI/-/pipelines/7924082 .
This fails CI quite spectacularly https://gitlab.cern.ch/cms-l1tk/cmssw_CI/-/pipelines/7858190 . e.g. For HYBRID_NEWKF, the tracking efficiency is up by 0.3% (good), but the number of reconstructed tracks is down by 80% (very weird, given that no efficiency is lost!), and the z0 resolution is a factor 5 worse (bad).
For comparison, this is the git CI for a dummy PR that changed no code https://gitlab.cern.ch/cms-l1tk/cmssw_CI/-/pipelines/7924082 .
==== Thomas says CI failure explained by poor z0 resolution. He's retuning the digitisation ranges to fix this. But he believes that the tracklet helix parameters are wrong (inconsistent with seed positions by ~1cm) for a small fraction of the tracklets, which is also contributing to poor z0 resolution.
new KF maths are fixed now, also passing CI using new KF. However, CI using old KF fails due to slightly lower z0 resolution. My PR should not impact those...
I understand that the ProducerTM usually orders tracks according to their seed type. This is important for the functioning of the DR, so should be described in the comment at the top of the class header file, which explains what the class does. And at the line of the code that causes this ordering to happen, an additional comment should appear to highlight it. A comment mentioning this inside ProducerDR would also be wise.
I understand that the ProducerTM usually orders tracks according to their seed type. This is important for the functioning of the DR, so should be described in the comment at the top of the class header file, which explains what the class does. And at the line of the code that causes this ordering to happen, an additional comment should appear to highlight it. A comment mentioning this inside ProducerDR would also be wise.
done
@Chriisbrown & @cgsavard -- Thomas's overview description of this PR says "L1TrackQuality renamed in TrackQuality and turned into ESProduct". Are you both and Andreas happy with this change?
At the top of the header of the KF class where you do the maths (KalmanFilter.h?), add the comments copied from L7-31 of https://github.com/cms-L1TK/cmssw/blob/tschuh/L1Trigger/TrackFindingTMTT/interface/KFParamsComb.h#L7 , so the maths is documented. Although as you treat r-phi and r-z planes independently, you may need to modify a few lines of this.
Overview description of this PR should mention that new class TrackFindingTracklet/ProducerKF has been added.
All your code uses a few classes a lot, such as TTBV, Frame, FrameStub & FrameTrack, StreamStub & StreamTrack, DataFormats & DataFormat, Setup. But they're only documented in a few sentences in /L1Trigger/TrackerTFP/README.md . I suggest adding a dedicated section to this README, explaining these things and giving examples of common manipulations, such as extracting a float from the digi data.
Please improve the comments at the top of the various .h and _cfg.py files with "Demonstrator" in the title, so that anyone wishing to use this code to check SW vs FW would understand what the various modules and doing and how they need to set up Vivado etc. to work with them. e.g. TrackerTFP/test/AnalyzerDemonstrator.cc & TrackerTFP/plugins/ProducerDemonstrator.cc both have the same very short comment, despite the fact that one is an ESProducer and the other is an EDAnalyzer. Please make clear what the difference between them is. e.g. What is the ESProducer for? What is the EDAnalyzer doing?
Expand existing comment at start of LayerEncoding.h, to explain what the encoding actually is. i.e. How are the layers encoded?
Some points in PR description are too short to understand. e.g. "code to create clean TP and association maps added". Please explain what the purpose of this is. And provide evidence (e.g. link to talk) that it is useful. I guess it's goal is to save CPU?
If it's true that this PR contains a 5-param KF fit, please so say in the description, and give link to a talk where you explain how to use it. e.g. Just one just need to set ParameterFit5 = True in Setup_cfi.py?
I added the comments I could in the parts that I am familiar with. Unfortunately there are a lot of files I am unfamiliar with and so I couldn't be of much help there. I am not opposed to changing tracker quality to its own EDProducer if you feel that it is a better setup. Although right now the HYBRID version still uses it as a function and not EDProducer. Can we make that consistent?
There are three main concerns I brought up:
1. The change in naming scheme from chi2rz and chi2rphi to chi20 and chi21 only adds confusion to any others who look in. I think it's better to change these to the naming scheme that is generally accepted. 2. The MVA1 variable is not being calculated correctly as many of the BDT inputs are wrong. The correct variables that should be input can be seen further down in the TrackQuality.cc file and I made notes on changes need to each variable. 3. The tttracks being created in TrackFindingProcessor.cc have incorrect chi2 and mva variables. The chi2rz and chi2rphi should not be pdof and the mva should be in the 0-1 range and not binned upon tttrack initialization.
Thanks for your review. I also spotted that there are two mva calculations, one used in the emulator one in the simulator. And those are different... As soon as the new KF replaces the old one the simulation function call will become obsolete and can be removed. to 1) sorry my bad no problem to stick whatever convention you have. to 2. I do not now which emulation reflects the f/w, therefore I do not want to change the behavior. Best person to debug this is @Chriisbrown. to 3. my bad again, I fixed it.
I made a forced push squashing all previous commits to make it less painful when I have to rebase in future. I also used the opportunity to remove files which shouldn't be in the PR to begin with.
@tomalin the scram b code-format on the ci machine seems to be different as on my machine. I can't fix the CI failure. Could you please have a look?
@tomalin another I realized is that ci complains about L1Trigger/TrackFindingTracklet/interface/Settings.h failing format, however L1Trigger/TrackFindingTracklet/interface/Settings.h gets not touched in this PR ...
I am seeing some pretty big differences in the tracks being produced before and after this big restructuring. For example, the tracks have less nstub, as we can see in this plot where I test 100 evts from ttbar 200 PU: And very weirdly, the tracks seem to have a pT cutoff at 15 GeV as seen here: 56e22-5331-4e8b-9833-91d9c8825ee5)
Other differences involve much smaller chi2 variables and mostly all tracks coming from seed 0 after these changes. Is this all understood?
For the pt I found a bug (factor 4) in the TTTrack conversion. The seed 0 dominance originates most likely from the DR, tending to keep lower seeds when duplicates are found. The nStub thing likely originates from the move to a single KF worker. You might want to double check by setting this to False. For the chi2s I must confess that I might have changed the behavior. I did not fully understand the original code and replaced it with something which emulates what I would write in f/w. I will look deeper into that.
The Chi2s between the old calculation and floating point look similar enough, i don't think that the original code was buggy. Do you look at chi2 or chi2 / ndf?
For the pt I found a bug (factor 4) in the TTTrack conversion.
This indeed looks like it's fixed.
The seed 0 dominance originates most likely from the DR, tending to keep lower seeds when duplicates are found.
Does this mean the DR behavior has been changed? Because I am seeing that before this PR the seed distribution looks very different.
The nStub thing likely originates from the move to a single KF worker. You might want to double check by setting this to False.
I do see that the nstub distribution looks more similar before and after the PR when setting this variable to false. Based on the variable name and description though, it sounds like this is disabling truncation and not changing the number of KF workers. Is this the same thing? And was this truncation not enabled before and that's why we see differences?
The Chi2s between the old calculation and floating point look similar enough, i don't think that the original code was buggy. Do you look at chi2 or chi2 / ndf?
Here are the distributions of chi2 and chi2/dof before and after this PR. We can see that it has significantly changed so something does still seem to be off.
Does this mean the DR behavior has been changed?
Yes.
I do see that the nstub distribution looks more similar before and after the PR when setting this variable to false. Based on the variable name and description though, it sounds like this is disabling truncation and not changing the number of KF workers. Is this the same thing? And was this truncation not enabled before and that's why we see differences?
less KF workers means more truncation, therefore is disabling truncation a good test to simulate a many worker scenario
Here are the distributions of chi2 and chi2/dof before and after this PR. We can see that it has significantly changed so something does still seem to be off.
Well the KF got debugged. The old code got sometimes the parameter very wrong leading to bad resolutions and high chi2s. I guess we just see now the debugged chi2 plots. A good cross check would be to compare against oldKF.
Well the KF got debugged. The old code got sometimes the parameter very wrong leading to bad resolutions and high chi2s. I guess we just see now the debugged chi2 plots. A good cross check would be to compare against oldKF.
Interestingly, I see different behaviors comparing chi2rz and chi2rphi with the old and new kf before these changes were implemented. For chi2rz/dof, I see that the current distribution looks like the old KF before changes. For chi2rphi/dof, I see that the current distribution looks like the new KF before changes. If we assume the oldKF distributions are correct, then it seems the issue might be in the chi2rphi calculation specifically. Here are both distributions below:
The nStub thing likely originates from the move to a single KF worker. You might want to double check by setting this to False.
Is the fact that we now have mostly 4 stub tracks acceptable? Don't we want to be producing more 5-6 stub tracks as they tend to be better tracks? I understand you said this is probably because there is only 1 KF worker, but does this indicate that we should be using more than one worker to get better tracks?
If we assume the oldKF distributions are correct, then it seems the issue might be in the chi2rphi calculation specifically
Well the newKF did show improved r-phi parameter resolution. So it is not unexpected to see lower chi2rphi. The difference is quite big though. I wonder if the oldKF is scaling the chi2rphi up, I think it does that when calculating overall chi2. @tomalin do you know what the oldKF stores in the TTTracks?
Is the fact that we now have mostly 4 stub tracks acceptable? Don't we want to be producing more 5-6 stub tracks as they tend to be better tracks? I understand you said this is probably because there is only 1 KF worker, but does this indicate that we should be using more than one worker to get better tracks?
That is a thing wee need to decide. I am not sure if more worker is sensible as long as we are not able to build f/w. It would be good to understand what we are actually gaining by the additional stubs.
git CI reports that the HYBRID_NEWKF performance is significantly better than it used to be. If you've given a talk on this, please link it from the description of this PR. If not, please give such as talk. (However, at eta = 0.05, git CI says HYBRID has z0 resolution of 0.10 cm, whereas HYBRID_NEWKF has one of 0.14cm, so performance should still be improved further, even if not required for this PR).
I understand that the ProducerTM usually orders tracks according to their seed type. This is important for the functioning of the DR, so should be described in the comment at the top of the class header file, which explains what the class does. And at the line of the code that causes this ordering to happen, an additional comment should appear to highlight it. A comment mentioning this inside ProducerDR would also be wise.
done
The comment at the top of ProducerTM "Transforms format of TBout into that expected by DR input and muxes all channels to 1. Since DR keeps first tracks the mux ordering is important." still gives no info about what the ordering actually is.
All your code uses a few classes a lot, such as TTBV, Frame, FrameStub & FrameTrack, StreamStub & StreamTrack, DataFormats & DataFormat, Setup. But they're only documented in a few sentences in /L1Trigger/TrackerTFP/README.md . I suggest adding a dedicated section to this README, explaining these things and giving examples of common manipulations, such as extracting a float from the digi data.
Any progress on this comment?
Please improve the comments at the top of the various .h and _cfg.py files with "Demonstrator" in the title, so that anyone wishing to use this code to check SW vs FW would understand what the various modules and doing and how they need to set up Vivado etc. to work with them. e.g. TrackerTFP/test/AnalyzerDemonstrator.cc & TrackerTFP/plugins/ProducerDemonstrator.cc both have the same very short comment, despite the fact that one is an ESProducer and the other is an EDAnalyzer. Please make clear what the difference between them is. e.g. What is the ESProducer for? What is the EDAnalyzer doing?
Any progress on this?
Expand existing comment at start of LayerEncoding.h, to explain what the encoding actually is. i.e. How are the layers encoded?
Any progress on this?
Hi @tschuh & @cgsavard . I've checked the status of the comments I made on this PR. Those that are still to do, I've either added a "Any progress on this?" comment, which causes them to appear again at the end of this PR conversation or I've marked them with a "thumbs down" icon. Those which are resolved, but where this web page provided no "resolved" button, I've marked with a "thumbs up" icon. Those which are not resolved, but where I caused a copy of the comment to appear later on this page by adding "Any progress on this?", I've marked with a "rocket" icon. All my comments are trivial styles ones (plus one request to give a talk on performance), so Thomas should be able to answer these easily. Claire's comments look more substantial, so Thomas should please post a message here when he believes that he's addressed all of her comments.
I did a couple of more studies looking at the old KF vs the new with these changes and I've got a few more concerns.
It looks like a lot more high eta duplicate tracks are being produced which is a consequence of the changes in this PR. In general, the duplicate rate for the new KF is higher than the old KF but this PR does help reduce the duplicates in low eta. Do es a fix need to be made for the high eta duplicates or is this understood? Plots that show this:
I think the increase in z0 resolution issue has been known and I do see that this PR helps fix the resolution a bit, but it's still quite a bit higher from the old KF. If I look at the difference between the tp and match trk z0, I actually see a shift off of 0 for the new KF, both before and after this PR. This is most obvious in the tracking particles with eta > 0. Is there a higher order correction to z0 or another way to fix this? It does seem like the track z0 is being poorly reconstructed.
git CI reports that the HYBRID_NEWKF performance is significantly better than it used to be. If you've given a talk on this, please link it from the description of this PR. If not, please give such as talk. (However, at eta = 0.05, git CI says HYBRID has z0 resolution of 0.10 cm, whereas HYBRID_NEWKF has one of 0.14cm, so performance should still be improved further, even if not required for this PR).
done
All your code uses a few classes a lot, such as TTBV, Frame, FrameStub & FrameTrack, StreamStub & StreamTrack, DataFormats & DataFormat, Setup. But they're only documented in a few sentences in /L1Trigger/TrackerTFP/README.md . I suggest adding a dedicated section to this README, explaining these things and giving examples of common manipulations, such as extracting a float from the digi data.
Any progress on this comment?
done
Please improve the comments at the top of the various .h and _cfg.py files with "Demonstrator" in the title, so that anyone wishing to use this code to check SW vs FW would understand what the various modules and doing and how they need to set up Vivado etc. to work with them. e.g. TrackerTFP/test/AnalyzerDemonstrator.cc & TrackerTFP/plugins/ProducerDemonstrator.cc both have the same very short comment, despite the fact that one is an ESProducer and the other is an EDAnalyzer. Please make clear what the difference between them is. e.g. What is the ESProducer for? What is the EDAnalyzer doing?
Any progress on this?
done
Expand existing comment at start of LayerEncoding.h, to explain what the encoding actually is. i.e. How are the layers encoded?
Any progress on this?
done
I did a couple of more studies looking at the old KF vs the new with these changes and I've got a few more concerns.
Many thanks for having a deeper look. Both points (high eta duplicates and resolution bias) are likely worth an issue. Would you mind creating them? For point 1) I am wondering if one can solves this simply by increasing the number of comparison modules here For the bias I am wondering if we see that in all 4 track parameter or only in z0.
Claire's comments look more substantial, so Thomas should please post a message here when he believes that he's addressed all of her comments.
I think I answered them all. We discovered are a lot of bugs in Chris's TQ emulation code and I am not sure if we want to fix them in this PR.
git CI reports that the HYBRID_NEWKF performance is significantly better than it used to be. If you've given a talk on this, please link it from the description of this PR. If not, please give such as talk. (However, at eta = 0.05, git CI says HYBRID has z0 resolution of 0.10 cm, whereas HYBRID_NEWKF has one of 0.14cm, so performance should still be improved further, even if not required for this PR).
done
I understand that the ProducerTM usually orders tracks according to their seed type. This is important for the functioning of the DR, so should be described in the comment at the top of the class header file, which explains what the class does. And at the line of the code that causes this ordering to happen, an additional comment should appear to highlight it. A comment mentioning this inside ProducerDR would also be wise.
done
The comment at the top of ProducerTM "Transforms format of TBout into that expected by DR input and muxes all channels to 1. Since DR keeps first tracks the mux ordering is important." still gives no info about what the ordering actually is.
done
Many thanks for having a deeper look. Both points (high eta duplicates and resolution bias) are likely worth an issue. Would you mind creating them?
If they are not addressed in this PR then we can create the issues, but I think the high eta duplicates should be addressed here since it is only showing up as a result of the changes made in this PR.
For the bias I am wondering if we see that in all 4 track parameter or only in z0.
I don't see any obvious shift in phi and pt but I do see a shift in eta as well where the tttrack tends to have a larger eta than the tracking particle it is matched to. Both the eta and z0 shifts are present before and after this PR in the newKF .
I am wondering if one can solves this simply by increasing the number of comparison modules
The increase in duplicate high eta tracks does not go away when doubling the number of DR comparison modules to 64, which I just tried.
Your code uses MessageLogger via edm::LogPrint("L1Trigger/TrackerTFP"). I believe the category has to be a simple string, so a forward slash is not allowed. e.g. https://gitlab.cern.ch/ejclemen/cmssw_trackfinding_hlsframework/-/blob/master/TrackFindingTrackletHLS/test/L1tracking_cfg.py?ref_type=heads#L34 gives an example of how the level of printout from the "IRProducer" category is controlled. You can see that the category corresponds to the python variable name, and python variable names cant include forward slash. (Also note that whenever the MessageLogger prints the category name, it automatically also prints the name of the EDProducer that called it). Also LogPrint is intended for "Warnings" https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideMessageLogger , so should only be used when something has gone wrong.
This PR updates mainly the modules TrackerTFP, TrackFindingTracklet and TrackTrigger. It brings the Kalman Filter, Track Multiplexer, Duplicate Removal and Track Quality steps up to date (https://indico.cern.ch/event/1449862/contributions/6116147/attachments/2924526/5133676/thomas.pdf)
Detailed list for Ian: