cms-L1TK / cmssw

Fork of CMSSW where improvements to L1 tracking code are developed.
http://cms-sw.github.io/
Apache License 2.0
4 stars 5 forks source link

Track Quality Emulation #209

Closed Chriisbrown closed 1 year ago

Chriisbrown commented 1 year ago

This PR supersedes #201 with a corrected name for the branch

tomalin commented 1 year ago

I remain mystified as to why git CI is not providing a link to the gitlab job that ran the CI. This works OK for all our other PRs. However, I assume it is this one https://gitlab.cern.ch/cms-l1tk/cmssw_CI/-/pipelines/5207906 , where the cmsRun jobs crash because L1Trigger/TrackTrigger/data/clf_GBDT_emulation_newKF_digitized.json is not found. I'd suggest adding this file to TrackTrigger/data/ in cms-L1TK , and in addition, making a PR of this .json file to https://github.com/cms-data/L1Trigger-TrackTrigger . (Incidentally, do we still need the GBDT_default.onnx file there?)

Chriisbrown commented 1 year ago

I remain mystified as to why git CI is not providing a link to the gitlab job that ran the CI. This works OK for all our other PRs. However, I assume it is this one https://gitlab.cern.ch/cms-l1tk/cmssw_CI/-/pipelines/5207906 , where the cmsRun jobs crash because L1Trigger/TrackTrigger/data/clf_GBDT_emulation_newKF_digitized.json is not found. I'd suggest adding this file to TrackTrigger/data/ in cms-L1TK , and in addition, making a PR of this .json file to https://github.com/cms-data/L1Trigger-TrackTrigger . (Incidentally, do we still need the GBDT_default.onnx file there?)

I don't know if we want the code to point to the floating point or digitized version of the BDT as currently the default for track finding is the floating point KF?

cgsavard commented 1 year ago

I worry about having different versions of the bdt in different places. This means that whenever it is updated, it needs to be changed in multiple places. And it isn't clear to users what is the current version. I would suggest either having 2 versions, one for simulated and one for emulation that's digitized, or we use the digitized json version also for the simulation and then we only have one model at all times. Using the digitized json version will produce very similar results for the floating point version and so I think this switch is fine, but it would be good to double check if not done so already. This will allow us to get rid of the .onnx version. But I don't think including it in the L1Tk repo and cms-data is a good idea because then it means it has to be updated in both every time. @tomalin thoughts?

tomalin commented 1 year ago

I worry about having different versions of the bdt in different places. This means that whenever it is updated, it needs to be changed in multiple places. And it isn't clear to users what is the current version. I would suggest either having 2 versions, one for simulated and one for emulation that's digitized, or we use the digitized json version also for the simulation and then we only have one model at all times. Using the digitized json version will produce very similar results for the floating point version and so I think this switch is fine, but it would be good to double check if not done so already. This will allow us to get rid of the .onnx version. But I don't think including it in the L1Tk repo and cms-data is a good idea because then it means it has to be updated in both every time. @tomalin thoughts?

You & Chris are the experts on the BDT, so I rely on your judgement. In general, I agree that having two versions of the code is a bad idea. What is the justification for keeping the simulated version? Is it easier to play with if one wants to try out new BDT ideas? Or is the simulated version the only one that works with the old KF?

Chriisbrown commented 1 year ago

I worry about having different versions of the bdt in different places. This means that whenever it is updated, it needs to be changed in multiple places. And it isn't clear to users what is the current version. I would suggest either having 2 versions, one for simulated and one for emulation that's digitized, or we use the digitized json version also for the simulation and then we only have one model at all times. Using the digitized json version will produce very similar results for the floating point version and so I think this switch is fine, but it would be good to double check if not done so already. This will allow us to get rid of the .onnx version. But I don't think including it in the L1Tk repo and cms-data is a good idea because then it means it has to be updated in both every time. @tomalin thoughts?

You & Chris are the experts on the BDT, so I rely on your judgement. In general, I agree that having two versions of the code is a bad idea. What is the justification for keeping the simulated version? Is it easier to play with if one wants to try out new BDT ideas? Or is the simulated version the only one that works with the old KF?

Our conclusion is to keep only one version of the BDT and just keep the emulated BDT as this can be used by the simulation, emulation and hardware. I can make a PR to the cms-data repo with this, though do we also want to delete the previous one there or wait until this PR has been pushed to central cmssw as we would break the current cmssw by deleting that old BDT

skinnari commented 1 year ago

@Chriisbrown i wasn't following this closely. are we good to merge this?

Chriisbrown commented 1 year ago

@Chriisbrown i wasn't following this closely. are we good to merge this?

@skinnari I think we were happy to merge this, but when these changes get merged into central cmssw we will also need to update cms-data with the new BDT

tomalin commented 1 year ago

Our conclusion is to keep only one version of the BDT and just keep the emulated BDT

Hi @Chriisbrown , in your earlier comment, you & Claire said "Our conclusion is to keep only one version of the BDT and just keep the emulated BDT". Doesn't this mean that this PR needs to delete the BDT C++ simulation, and just keep the BDT C++ emulation? At the moment, both are still present.

Chriisbrown commented 1 year ago

Our conclusion is to keep only one version of the BDT and just keep the emulated BDT

Hi @Chriisbrown , in your earlier comment, you & Claire said "Our conclusion is to keep only one version of the BDT and just keep the emulated BDT". Doesn't this mean that this PR needs to delete the BDT C++ simulation, and just keep the BDT C++ emulation? At the moment, both are still present.

@tomalin L1Trigger/TrackTrigger/data doesn't currently exist in the repo as in the current setup this is pulled in from cms-data that contains the C++ simulated BDT. This PR adds the L1Trigger/TrackTrigger/data to our dev repo so that we can use the emulated BDT but I can't delete the simulation cms-data BDT, as anyone using our last merge with central cmssw will need this. So when we merge our dev branch into central cmssw in the future we will also need to make a PR to cms-data with the new emulated BDT and at that point delete the old simulated BDT and also delete the L1Trigger/TrackTrigger/data folder which is what we had to do when we added the original simulation BDT. Let me know if this doesn't make sense or you think we should do this another way, as far as I can tell there isn't a methodology for updating ML models in CMSSW in a neat way

tomalin commented 1 year ago

data

@Chriisbrown, I understand your point about the data/ directory. But I was asking about the L1-track-specific C++. Do we have C++ code that is only needed for the BDT simulation in our repo, which can be deleted now we have a BDT emulation? Or is there really no C++ code, and the only difference between the simulation and emulation is in the data/ files?

Chriisbrown commented 1 year ago

data

@Chriisbrown, I understand your point about the data/ directory. But I was asking about the L1-track-specific C++. Do we have C++ code that is only needed for the BDT simulation in our repo, which can be deleted now we have a BDT emulation? Or is there really no C++ code, and the only difference between the simulation and emulation is in the data/ files?

@tomalin there is the emulated kfout that runs the emulated BDT with bit accurate inputs as expected in firmware, this is not flexible and will run the BDT that is expected in firmware and matches that firmware exactly, this only outputs into the pattern files when running the KFout emulation. There is also the track quality package that is more generic and will run any BDT or NN and is kept generic to allow people to develop different classifiers, this will output into the ntuples when running the l1 track ntuple maker. This code can also run the emulated BDT but it will not have bit accurate inputs as it runs within the L1FPGATrackproducer. We wanted there to only be a single reference model in all our repos, the one that corresponds to the emulated BDT, which will match firmware but can also be run in the track quality package. Ultimately, it would be better to have a single track quality package that can be run within the KFout emulation but due to the way the KF and KFout emulators have been set up this is not currently possible

tomalin commented 1 year ago

The comment at the top of ProducerKFout.cc should make clear that it runs the Track Quality BDT.

tomalin commented 1 year ago

@Chriisbrown & @cgsavard as Chris's PR has always crashed git CI for some unknown reason, I just rebased his branch cbrown_to_emu to the latest version of cms-L1TK:L1TK-dev-12_6_0_pre5 and push the result to a branch of my own cms-L1TK:cbrown_tq_emu_rebased . I then made a Draft PR from it https://github.com/cms-L1TK/cmssw/pull/221 . As you can see there, this new PR does run git CI without errors. I therefore propose that we close PR 209, and complete the development in PR 221 (including the two comments above that I added today). Do you agree?

Chriisbrown commented 1 year ago

@Chriisbrown & @cgsavard as Chris's PR has always crashed git CI for some unknown reason, I just rebased his branch cbrown_to_emu to the latest version of cms-L1TK:L1TK-dev-12_6_0_pre5 and push the result to a branch of my own cms-L1TK:cbrown_tq_emu_rebased . I then made a Draft PR from it #221 . As you can see there, this new PR does run git CI without errors. I therefore propose that we close PR 209, and complete the development in PR 221 (including the two comments above that I added today). Do you agree?

This is fine by me, I don't understand how I managed to break the CI. I will redo my changes in response to today's comments in your PR 221

tomalin commented 1 year ago

I'm closing this PR, after copying it to https://github.com/cms-L1TK/cmssw/pull/221 (where it is rebased to latest code and CI runs OK).