Open tomeichlersmith opened 3 years ago
+1 on this and in particular, MIP tracking should be pulled out and not run as part of every event (it's a relatively time consuming algorithm and is intended to be used on a subset consisting of tricky events). If it's a separate processor people are free to use it on any set of events they want.
@tomeichlersmith @bryngemark @vdutta The new version of the BDT will include the number of straight tracks from MIP tracking in the feature list. It might be tricky to separate the MIP tracking out. The only way we could imagine is to train two versions of the BDT, one with the number of straight tracks and one without. But then the usual way of referring to the number of events after the BDT cut will be ambiguous. The time to evaluate two BDTs might be on the same scale as running MIP tracking in the first place.
I'm not saying folks who want to run the BDT should not be required to run MIP tracking, what I'm saying is there are a lot of folks who just use the shower features (like me for instance).
Perhaps the new BDT would require MIP tracking. In this way, the BDT processor would require the MIP Tracking processor to have been run before it. Perhaps another BDT does not require the MIP tracking and as such that other BDT processor would not require the MIP Tracking processor to run before it. Factorizing the code in this way allows for people to opt in for certain requirements if they want. Does that make sense?
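To make the opt-in idea concrete, here is a minimal sketch of processors that declare what they need and what they produce, plus a check that a sequence satisfies those requirements. The `Processor` class, `check_sequence`, and all processor and object names are hypothetical illustrations, not the ldmx-sw API.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    """Hypothetical stand-in for a framework processor (not the ldmx-sw API)."""
    name: str
    requires: tuple = ()  # event bus objects that must already exist
    produces: tuple = ()  # event bus objects this processor adds


def check_sequence(sequence):
    """Return (processor name, missing object) pairs for an ordered sequence."""
    available = set()
    problems = []
    for proc in sequence:
        for needed in proc.requires:
            if needed not in available:
                problems.append((proc.name, needed))
        available.update(proc.produces)
    return problems


# Hypothetical processors: one BDT needs MIP tracks, the other does not.
shower = Processor("ShowerFeatures", produces=("EcalShowerFeatures",))
mip = Processor("MipTracking", produces=("EcalMipTracks",))
fast_bdt = Processor("FastBdt", requires=("EcalShowerFeatures",),
                     produces=("FastBdtScore",))
full_bdt = Processor("FullBdt",
                     requires=("EcalShowerFeatures", "EcalMipTracks"),
                     produces=("FullBdtScore",))
```

With this layout, `check_sequence([shower, fast_bdt])` reports no problems, while `check_sequence([shower, full_bdt])` flags the missing MIP tracks, so only users of the full BDT pay for MIP tracking.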
as a side note, how is it that the BDT requires MIP tracking nowadays? when it was developed, MIP tracking rejected the last 10 events that the BDT didn't already reject. is it entirely unthinkable to have a separate MIP-tracking based veto step (perhaps in the context of a BDT if that's needed) that is only applied after a first BDT fails to reject an event?
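A rough sketch of that two-stage idea, where the expensive MIP-tracking step only runs on events that survive a cheap first cut. The function name, the callables, and the convention that a score at or above the threshold passes are all assumptions made for illustration.

```python
def cascade_veto(events, fast_score, mip_veto, threshold):
    """Apply a cheap score cut first, then an expensive veto on survivors.

    fast_score: callable returning a score per event (assumed: higher passes).
    mip_veto:   callable returning True if the event should be rejected.
    Returns (kept events, number of times mip_veto was called).
    """
    kept = []
    n_mip_calls = 0
    for event in events:
        if fast_score(event) < threshold:
            continue  # rejected by the cheap stage; mip_veto never runs
        n_mip_calls += 1
        if mip_veto(event):
            continue  # rejected by the expensive second stage
        kept.append(event)
    return kept, n_mip_calls


# Toy events with made-up values, only to exercise the control flow.
toy_events = [
    {"score": 0.10, "n_straight": 0},
    {"score": 0.995, "n_straight": 2},
    {"score": 0.999, "n_straight": 0},
]
kept, n_calls = cascade_veto(
    toy_events,
    fast_score=lambda e: e["score"],
    mip_veto=lambda e: e["n_straight"] > 0,
    threshold=0.99,
)
```

In the toy example the second stage runs on two of the three events and one event is kept, which is the kind of saving a separate veto step would be after.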
The number of straight and the number of linear-regression MIP tracks are included as feature inputs to the newer BDTs. I think I agree with you that a "fast" BDT would be nice to have since the MIP tracking is time consuming especially on events with a lot of hits that are going to be rejected anyways for other easier reasons.
Another reason (from my point of view) to factorize: different BDTs and selections can then be more explicit about their requirements.
The BDT will use the number of straight tracks from MIP tracking (not linear regression tracks).
We are considering breaking up EcalVeto into three processors:
I understand it could be helpful to have a simple BDT without MIP tracks. We can try to compare the performance with and without the straight tracks + additional MIP track selections.
> how is it that the BDT requires MIP tracking nowadays?
that's the "mip" in "segmip" :P
I think our plan is to have:
(sorry Danyi, I think we typed at the same time! [but at least we are saying the same thing :D ])
thanks all for your patience, I think I was a little sloppy -- I understand and am aware that it is included, I was mostly wondering why it was considered necessary.
but anyways! it sounds like we have a good path ahead, I think the suggested split sounds really great and will cover the different needs really well.
I ran a version where MIP tracking is set to -1 for the BDT input in both signal and bkg: https://github.com/LDMX-Software/ldmx-sw/actions/runs/11320865028 The results are not terrible, but clearly I need a bigger stat study.
Signal: EcalVetoResults_EcalVetoResults_bdt_disc.pdf
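For anyone wanting to repeat this kind of test, a minimal sketch of overwriting the MIP-tracking inputs with a sentinel before BDT evaluation. The feature names and the helper are hypothetical and do not reflect the actual feature list or the code used in the run above.

```python
MIP_SENTINEL = -1  # value written in place of the MIP-tracking features


def mask_mip_features(features, mip_names=("n_straight_tracks",)):
    """Return a copy of the feature dict with MIP-tracking entries masked.

    The default feature name is a hypothetical placeholder.
    """
    return {
        name: (MIP_SENTINEL if name in mip_names else value)
        for name, value in features.items()
    }


# Made-up feature values, only to show the transformation.
example = {"n_readout_hits": 42, "summed_det": 1234.5, "n_straight_tracks": 3}
masked = mask_mip_features(example)
```

The same masking has to be applied to both signal and background so the comparison stays consistent.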
Edit: I've given this more thought as I chatted with @tvami on slack.
Currently, the EcalVetoProcessor is very hefty. Moreover, a lot of the variables calculated by the veto processor can be used by other analyses. With this in mind, my proposal is to break up the current veto processor into different processors that create different event bus objects. Hopefully, breaking up the ecal veto in this way will make it more maintainable. This will also be cause for some additions to the event model, but I can hold off from removing the EcalVetoResult object until we are comfortable with breaking on-disk backwards compatibility.
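As a rough illustration of the proposal, the sketch below uses a plain dict as the event bus: each small processor adds its own object, and a compatibility step assembles a combined result from whatever is present. Apart from EcalVetoResult, every object name and field here is a hypothetical placeholder, and the MIP-tracking body is a stub rather than a real algorithm.

```python
def shower_features(event):
    """Add shower-level features computed from the rec hits (toy version)."""
    hits = event["EcalRecHits"]
    event["ShowerFeatures"] = {
        "n_hits": len(hits),
        "summed_energy": sum(hits),
    }


def mip_tracking(event):
    """Stub for the expensive MIP-tracking step; always reports zero tracks."""
    event["MipTracks"] = {"n_straight_tracks": 0}


def legacy_result(event):
    """Assemble a combined object so existing readers keep working."""
    combined = dict(event["ShowerFeatures"])
    combined.update(event.get("MipTracks", {}))  # optional input
    event["EcalVetoResult"] = combined


# Users who only want shower features skip mip_tracking entirely.
event = {"EcalRecHits": [1.0, 2.5]}
shower_features(event)
legacy_result(event)
```

Keeping the combined object as a thin adapter is one way to delay the on-disk break: new analyses read the small objects directly, and the adapter can be dropped once nothing depends on the old layout.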