Bin TQ MVA output - Githubissues

cgsavard commented 6 months ago

PR description:

This implements the binning scheme for the output of the TQ MVA1 variable. This binning scheme was presented at the offline software general meeting, here. In order to avoid applying a logistic sigmoid function (which puts the variable within a 0-1 range) to the BDT output in the firmware, the MVA variable is binned before the transformation is applied and the normal calling of trkMVA1() in the TTTrack class will apply the transformation. An additional call to a pre-transformed variable was also included for convenience and extra checks.

PR validation:

All checks were run and this was tested with both the HYBRID and NEW_KF configurations of the tracking code. Here are some outputs showing the conversion between the different MVA variables in order to check that everything is working as expected (bins are [-480, -2.197, -1.386, -0.405, 0.405, 1.386, 2.197, 2.944] for pre-logistic sigmoid, corresponding to [0, 0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 0.95] after the transform):

HYBRID tracking example 1: TTTrack->trkMVA1Pre(): 2.31981
TTTrack->trkMVA1(): 0.910505
TTTrack_Word->getMVAQualityBits(): 6
TTTrack_Word->getMVAQualityPre(): 2.197 TTTrack_Word->getMVAQuality(): 0.9

HYBRID tracking example 2: TTTrack->trkMVA1Pre(): 2.01361
TTTrack->trkMVA1(): 0.882219
TTTrack_Word->getMVAQualityBits(): 5
TTTrack_Word->getMVAQualityPre(): 1.386 TTTrack_Word->getMVAQuality(): 0.8

NEWKF tracking example (default set to 0, needs fix in separate PR): TTTrack->trkMVA1Pre(): -480 TTTrack->trkMVA1(): 3.4566e-209 TTTrack_Word->getMVAQualityBits(): 0
TTTrack_Word->getMVAQualityPre(): -480
TTTrack_Word->getMVAQuality(): 0

tomalin commented 6 months ago

Hi @cgsavard, my only worry here is what happens if the L1trigger group has already produced a MC containing all the L1 trigger objects, such as L1tracks, and they try reading it with a newer version of CMSSW that contains this PR. I don't know if CMSSW would warn them of their mistake by throwing a fatal error because of the name change of the data member TTTrack::theTrkMVA1? Or would no error be thrown, and the old theTrkMVA1 value would be loaded into the theTrkMVA1pre data member, giving nonsense. So do we have to release this with a major CMSSW version change (e.g. CMSSW 15), accompanied by warnings to the trigger group?

tomalin commented 6 months ago

Minor comment. adding "pre" to variable name "theMVA1pre" is a little obscure. But I don't have a better suggestion.

tomalin commented 6 months ago

If you think it's likely that MVA2 & MVA3 will need a similar approach, then I suggest you make that change now, at least for the data members, so that the data members of this TTTrack EDProduct are not changing too often, causing bcackwards incompatbility.

cgsavard commented 6 months ago

@tomalin I'm not sure I follow entirely the TTTrack::theTrkMVA1 discussion. Are you talking about the switch here? I'm not sure this switch would affect anything since they are private variables so they are only recognized internally in the class. Unless for some reason the L1Trigger group can access this? I imagine their samples use the TTTrack::trkMVA1() call and this function represents the same value so no mistmatch will show up there. Perhaps I am mistunderstanding how the L1Trigger MC samples can use the private theTrkMVA1 variables?

I imagine we might want to use MVA2 for a displaced TQ BDT, which I've been getting a lot of inquiries about people wanting to try out. I imagine that that BDT would require the same set up, so I can make that change. As for MVA3, there hasn't really been any discussion on what this may be set with. If it's not a BDT, then it may require a slightly different set up. If in the above paragraph we decide the change from theTrkMVA1->theTrkMVAPre1 won't change anything then I am in favor of keeping both of these as is and I will change them when we decide how to fill MVA2 and MVA3. Otherwise, I can go ahead and make the change for MVA2 now but am unsure about MVA3. What do you think is best?

tomalin commented 6 months ago

@tomalin I'm not sure I follow entirely the TTTrack::theTrkMVA1 discussion. Are you talking about the switch here? I'm not sure this switch would affect anything since they are private variables so they are only recognized internally in the class. Unless for some reason the L1Trigger group can access this? I imagine their samples use the TTTrack::trkMVA1() call and this function represents the same value so no mistmatch will show up there. Perhaps I am mistunderstanding how the L1Trigger MC samples can use the private theTrkMVA1 variables?

I imagine we might want to use MVA2 for a displaced TQ BDT, which I've been getting a lot of inquiries about people wanting to try out. I imagine that that BDT would require the same set up, so I can make that change. As for MVA3, there hasn't really been any discussion on what this may be set with. If it's not a BDT, then it may require a slightly different set up. If in the above paragraph we decide the change from theTrkMVA1->theTrkMVAPre1 won't change anything then I am in favor of keeping both of these as is and I will change them when we decide how to fill MVA2 and MVA3. Otherwise, I can go ahead and make the change for MVA2 now but am unsure about MVA3. What do you think is best?

Hi Claire, if an existing MC dataset contains TTTrack objects made prior to this PR, then the data members of this class, which is what is stored in the .root file, include theTrkMVA1_ . If this file is read into a CMSSW version that includes the code changes in this PR, then and someone's C++ requests (via Handle) to load TTTrack from disk into memory, then ROOT will be confused the variable theTrkMVA1 no longer exists in the TTTrack objects that it' being asked to fill in memory. It might guess that it can load the value from theTrkMVA1 into theTrkMVA1pre, but there's a good chance it will mess up. This is nothing to do with private or public functions or data members.

cgsavard commented 6 months ago

@tomalin Do you have a solution in mind that I can implement?

tomalin commented 6 months ago

@tomalin Do you have a solution in mind that I can implement?

Hi Claire, one thing is to minimise how often we change the data members of EDProducts. Hence the suggestion to change MVA2 & MVA3 at the same time. The other is time time the PR for a major CMSSW release, but CMSSW 15 not foreseen until late 2024 https://twiki.cern.ch/twiki/bin/viewauth/CMS/ReleaseSchedule. I suggest you ask CMS SW group for advice.

Chriisbrown commented 6 months ago

Hi having had a close look at this I realise that changes have already been made to the track word to include the track quality binning and associated functions in a similar manner to the chi2 variables. This has been the case in central cmssw since CMSSW14_1_0_pre2 (released march 27) (https://github.com/cms-sw/cmssw/blob/master/DataFormats/L1TrackTrigger/interface/TTTrack_TrackWord.h#L106). So once we rebase our dev branch here to that we'll bring in those changes, so Claire's changes to the track word will cause merge conflicts and then possibly break any downstream L1 trigger code.

cgsavard commented 6 months ago

@tomalin so the SW group said that the default behavior is that trk_MVA_ will be ignored and trk_MVA1Pre_ will be set to 0 (which means that the trk_MVA1() function will give 0.5). And I think this is fine for the transition as the only people that use the MVA variables that I know so far is the GTT group and I plan to make these same changes in their repo and will make them aware of the change. I suggest making the same change to the other two MVA variables, and then opening the same PR in l1t-offline.

For the point @Chriisbrown made, I suggest also opening a PR into central CMSSW making the proper changes there, that way everything will match. I see that this MVA variable is used in the NNVtx code so I'll also make sure that that variable called is the proper one.

How does this all sound?

tomalin commented 6 months ago

@tomalin so the SW group said that the default behavior is that trk_MVA_ will be ignored and trk_MVA1Pre_ will be set to 0 (which means that the trk_MVA1() function will give 0.5). And I think this is fine for the transition as the only people that use the MVA variables that I know so far is the GTT group and I plan to make these same changes in their repo and will make them aware of the change. I suggest making the same change to the other two MVA variables, and then opening the same PR in l1t-offline.

For the point @Chriisbrown made, I suggest also opening a PR into central CMSSW making the proper changes there, that way everything will match. I see that this MVA variable is used in the NNVtx code so I'll also make sure that that variable called is the proper one.

How does this all sound?

@cgsavard This sounds like a good plan. Though please tell Keith Ulmer and the L1 trigger group conveners about this (when your PR is ready), so that it doesn't catch anyone by surprise.

Chriisbrown commented 6 months ago

@cgsavard from the point of view of the NNvtx any changes to the track word including a rebinning of the MVA output would require us to do a full retrain and given the NNvtx has only just gone into cmssw and any update to the model then takes a separate PR to cms-data this would take time. Do we (cms-l1tk) need to do this change to the track word now or can we bring in the changes from central cmssw and use that implementation of the MVA in the track word? And then at some point in the future make this update to the track word after testing any update to the binning scheme with the downstream users such as the NNvtx. Either way Benjamin requests that it is discussed in a GTT meeting

tomalin commented 6 months ago

@cgsavard from the point of view of the NNvtx any changes to the track word including a rebinning of the MVA output would require us to do a full retrain and given the NNvtx has only just gone into cmssw and any update to the model then takes a separate PR to cms-data this would take time. Do we (cms-l1tk) need to do this change to the track word now or can we bring in the changes from central cmssw and use that implementation of the MVA in the track word? And then at some point in the future make this update to the track word after testing any update to the binning scheme with the downstream users such as the NNvtx. Either way Benjamin requests that it is discussed in a GTT meeting

Hi Chris, the code you put in central CMSSW used different MVA binning https://github.com/cms-sw/cmssw/blob/master/DataFormats/L1TrackTrigger/interface/TTTrack_TrackWord.h#L106 to Claire's new code https://github.com/cms-L1TK/cmssw/blob/tq_binning_reorganize/DataFormats/L1TrackTrigger/interface/TTTrack_TrackWord.h#L107 . If she makes her PR, but retains your binning (until some future agreed time when it's retuned), would that solve your problem? (I assume the change to pre-sigmoid binning is no problem?).

Chriisbrown commented 6 months ago

Hi Ian, This would be fine so long as the getMVA1() functions and the MVA bits in the track word give the same answer between the PRs. Though thinking about this when we tune the BDT output it'll be done in the 0-1 space, it isn't clear how we would create the "presigmoid" bins that the trackword now wants and ensure that we are doing is consistent every time

cgsavard commented 6 months ago

@Chriisbrown The getMVA1() function and bits will be exactly the same as what you are using if you just change the bins to what you are using. But we should definitely get together and finalize the bins as there are other GTT objects that also use the MVA and I partially tuned the bins with those objects in mind (track jets, track HT, track MET). Regardless of the binning scheme, this is the proper way we should be binning everything since we do not want to do a logistic sigmoid on the firmware. And again, it will not affect your NNvtx as long as you change the binning to what you want.

Chriisbrown commented 6 months ago

@cgsavard I don't know if just changing the binning is a viable solution when using the NNvtx as the upcoming MC campaign in CMSSW 14 will use the previous binning meaning the tracks created in that campaign will have that binning. If we were to change the binning now anyone rerunning the track finding will then have to adapt the binning if they want to see the same MVA track word which has the potential to confuse users of the NNvtx or the BDT in general, if we are going to change the binning we need to time it with a major CMSSW release or a major MC campaign with an accompanying talk in a L1 trigger meeting of some sort. I think we should have understandable bins in software (between 0 and 1) and derive the presigmoid bins for the FW, any changes to the binning will be done by a software developer where the 0 to 1 scheme makes sense not a firmware developer where the bins are just some constants in a file (https://github.com/cms-L1TK/l1tk-for-emp/blob/47e69eb20fd7573644a3c80c74dee96b49f8ab9e/kfout/firmware/hdl/kfout_config.vhd#L61C10-L61C22)

cgsavard commented 6 months ago

@Chriisbrown So are you suggesting we have two sets of bins in software, one with the 0-1 binning scheme and one with the pre-sigmoid? Right now we've got two different places that are being binned and each of these places needs a different binning scheme. We can certainly bin the track word value post-sigmoid, but you also set up the KFOut module to bin the tq variable pre-sigmoid here to match the firmware and that means we also need pre-sigmoid bins. I don't see any issue with the next person who studies and changes the bins to just apply an inverse sigmoid when setting them, and the MVA users will still get the 0-1 binning scheme when using it downstream because I set up the track word class to apply this transformation here.

Once again, nothing has changed from the user's side. trkMVA1() still gives the track float on a 0-1 range and getMVAQuality() still gives the digitized 0-1 values from the track word. The only reason the bins are set pre-sigmoid is because in the KFout module we are binning the pre-sigmoid value. If we want to bin the post-sigmoid value in the KFout then I think we can switch it all back and make that change, but I thought you set it up that way specifically to match the firmware. What we definitely shouldn't do is have two different places where bins are set like what is done now here and here because that's messy and bound to cause mistakes in the future.

We can definitely change the bins I have set here to what you guys want for the NNVtx for now, but we will have to discuss another update to this further as many of the other GTT objects are affected by this.

Chriisbrown commented 6 months ago

No I think we should have one sets of bins in the track word only and those bins should be between 0 and 1 so that when a downstream user tunes their threshold between 0 and 1 they can know if this makes sense in the binning. The KFout can calculate what bins it needs itself and either have a way of printing them for the firmware or the firmware can derive them, but this can be put in place in the large rewrite of the ProducerKFout and ProducerTT that will be done. The reason the KFout does it the way it does now is because there was no binning in the track word and I wanted to avoid any unnecessary updates to the dataformats, having the bins as a python config also allowed for less disruptive testing of different binning but now the BDT is in more widespread use it makes sense to have a binning in place in the track word and for the KFout to pull the binning from there.

cgsavard commented 6 months ago

So am I ok to add a logistic sigmoid to the KFout so that I can use the same bins? Or is there a specific reason that it is not already done and instead I should use an inverse sigmoid on the 0-1 bin value? I want to avoid using an inverse sigmoid since it's not defined at 0.

Chriisbrown commented 6 months ago

I would not add a sigmoid to the output of the MVA in KFout as this isn't what happens in FW, instead I would replace tqBins_ with an array that is derived from the (0-1) bins defined in the track word, so performing an inverse sigmoid on those bins. You can set the highest and lowest bins (the 0 and 1 that are undefined in the inverse sigmoid) as the maximum and minimum possible values the BDT can produce which is the maximum number that can be represented by an ap_fixed<10,5> and -1*that (this is +/- 16 for the ap_fixed<10,5> case but you may want to make this generic) . In the KFout there is an ap_fixed rescale that takes you from ap_fixed to integer hence the multiplication before the binning, this is to match the FW so whatever bins you create will need also to be multiplied by that ap_fixed_rescale before binning. As the KFout currently only produces tracks for the demonstrator tools this can be done now or when the KFout is updated to also produce the TTTrackwords

cgsavard commented 6 months ago

This will work now with how the KFout is set up, but eventually when the KFout sets the track word this becomes tricky. Do you imagine the track word being set with the tq bins or with the output tq variable? If it's with the latter then we will have to do an inverse transform on the MVA output and that can give us something undefined.

Perhaps there is another solution that makes us all happy. What if everything is done as I have set it up, but make the binning set up on the 0-1 range and there is an automatic conversion that happens to the pre-sigmoid binning within the track word class. This way the track and track word are still initialized pre-sigmoid, the functions to get the MVA variables are still on the 0-1 range, we avoid any sort of inverse transform in the future, our software matches what is done in firmware, and the user sees and sets a 0-1 binning which will make more sense to them.

Chriisbrown commented 6 months ago

I don't know if we want to restrict ourselves to a sigmoid activation function on the end of whatever track quality algorithm we train or design, the only reason there is this situation with the presigmoided bins is a optimisation conifer makes on the output of the BDT in firmware, we could include the sigmoid in the firmware but it would be a waste of resources. If we were to replace the BDT with an NN or BDT not trained on binary cross entropy we would have to rewrite the track word class.

A solution that I think removes this is putting a sigmoid in the KFout to only set the floating point trkMVA1. We run the emulated BDT in the KFout, this produces an “unsigmoided” value, this is then binned in the “unsigmoided” bins that we derive from the 0-1 bins in the trackword definition to give an integer between 0 and 7. That is put in the 3-bits of the TTTrackword. In fact we can set the entire TTTrack word here with the bits that the KFout produces, which is exactly what happens in firmware.

We can also sigmoid the output of the BDT to produce a value between 0 and 1 and this is then set as the trkMVA1 for the floating point tttrack. If we want we can check to see if the same bits are produced by the binning of the “presigmoid BDT” in its “presigmoid” bins and the binning of the “sigmoid BDT” and its 0-1 bins are the same, we know every possible value the BDT can produce (between -16 and 16 in 1024 steps) so can quickly scan across these.

What this does open up is the possibility that someone runs a setttrackwordbits after the TrackWord has been set by us in the KFout and the TrackWord changes as a result. Though anyone downstream changing bits in the Track Word that we set should know that this may give them incorrect results so I don’t think this is a problem. This way our firmware and emulation produce exactly the same tttrack words that downstream can use and we can tune the BDT binning based off studies in the 0-1 range.

cgsavard commented 6 months ago

So I changed the bins to the 0-1 range and changes the naming scheme back to reflect the post-sigmoid values. And the binning scheme is what is implemented in the CMSSW. @tomalin @Chriisbrown Does this look ok?

cgsavard commented 6 months ago

All comments have been resolved and this should be ready to merge.

cgsavard commented 6 months ago

All comments have been resolved.

Chriisbrown commented 5 months ago

I'm happy with everything so can go ahead and merge. We should also put in a PR here to remove the depreciated ONNX model

cms-L1TK / cmssw

Bin TQ MVA output #272

PR description:

PR validation: