s6pepaul opened this issue 5 years ago
To keep the discussion going, here is an answer to the "Question" in section 3, page 2: No, there is no new track ID. The track ID is unique to the measured track, but the combo ID is different for the two different combos even if the proton track input is the same. Here is an example of two different combos that share the same proton track ID (2); the measured quantities are obviously the same, but the kinfitter quantities are not:
track ID:  2       / 2
Pmeas:     1.23507 / 1.23507
Pkinfit:   1.23923 / 1.18369
... to be continued.
On 5/3/19 5:20 AM, Peter Pauli wrote:
Hello,
As I am getting towards the final stages of parts of my analysis, I am wondering how to properly take care of multiple combinations in one event. By that I do not mean multiple combos from random tagger hits; that was well explained by Richard in a tutorial given at a Physics WG meeting. I mean the additional combos arising from allowing additional tracks or showers in the event. I have brought this topic up a couple of times with various people, and so far the consensus has always been: this is not an easy topic and we should discuss it more.
I wrote a short document stating what the current recommendations are for uniqueness tracking and where I see problems with that. You find it here: https://halldweb.jlab.org/doc-private/DocDB/ShowDocument?docid=3993
Short summary of the problems (not complete but meant as a starting point to think about uniqueness tracking):
- histogram the phi angle of the proton to measure a beam asymmetry (e.g. for pi0): what happens if two possible protons survive all cuts? Which one should we take? Going by the current recommendations we should take both, but this inflates the signal (and messes up the errors).
- histogram variables built from kinematic-fit four-vectors: if there are additional tracks/showers, all particles will have slightly different values depending on which subset (i.e. combo) of tracks/showers was used in the fit.
- histogram the inv. mass in bins of momentum transfer (to measure dsigma/dt): stay with gp -> pi0 p plus an additional track that could be the proton as an example. The inv. mass only depends on the photons from the pi0 decay, but the momentum transfer can be calculated from the proton. So depending on which track is used as the proton, the event could even end up in different bins.
A possible solution (not tested, just an idea): don't do uniqueness tracking anymore. Instead, always take all combinations. If, after applying all cuts (i.e. using all the discriminatory information you have on a given event), n combos survive in an event that do not originate from different beam photons, then weight each combo with 1/n.
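(As a trivial example: if two proton candidates both survive all cuts, each of the two resulting combos would be filled with weight 1/2, so the event contributes exactly one count in total.)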
I admit this might not be a huge effect for some analyses but given that we really push the statistical uncertainties to less than 1% in some channels it might be something that biases the results and/or their errors. In any case it is surely something we want to get right.
I hope that this triggers a discussion that yields some sort of definite prescription of how to properly account for multiple combinations.
The track ID is unique to the measured track. But the combo ID is different for the two different combos even if the proton track input is the same.
Ok, that means that if you use a set to keep track of the unique track IDs already used when histogramming the kin. fitted values, you would only get the 'first' combo, whichever that may be. So this is not what you want to do when plotting kin. fitted values. Do you use both then? Or just one, based on a figure of merit (e.g. missing mass squared)?
I am not sure I follow. I look at it like this: all combos are "unique" because at least one particle is different in each combo. If this particle is an out-of-time beam photon, you give the combo a certain negative weight and you do not worry any more (it seems). However, if the beam photon is in time and there is a second in-time photon, which of course also leads to a different combo, you do worry (it seems), and these combos all get a weight of 1. Why should that be different from the case where, instead of a second beam photon, another charged or neutral track makes a good combo? (By "good combo" I mean in time and with all kinfit quantities making you happy.) Note that in this case the combinations with out-of-time beam photons can also survive and contribute with negative weight.
It also obviously depends on what quantities you use: if you use kinfitter quantities, these will always be unique to the combo, whereas measured quantities will not necessarily be.
I still like your proposal in section 4 on page 3. But wouldn't it mean that there are actually two weights? 1/N1, where N1 is the number of good (in-time) combos, and -1/N2, where N2 is the number of out-of-time beam bunches to be considered, so that the combined weight is -1/N1 * 1/N2 for out-of-time combos and 1/N1 for in-time combos. As a consequence, the resulting histogram would be closer to being proportional to the number of events rather than the number of combos, and the comparison with MC may be a better match, because the combinatorial background in the data is most likely different than in the MC; this way the event number, not the combinatorial number, is the proportionality factor.
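In code, the weight I have in mind would be something like this (just a sketch, untested, and the variable names are made up, not taken from the DSelector):
double Get_ComboWeight(Int_t locNumGoodInTimeCombos /*N1*/, Int_t locNumOutOfTimeBunches /*N2*/, bool locIsInTime)
{
  double locWeight = 1.0 / locNumGoodInTimeCombos;   // 1/N1 for every surviving combo
  if(!locIsInTime)
    locWeight *= -1.0 / locNumOutOfTimeBunches;      // -1/(N1*N2) for out-of-time combos
  return locWeight;
}
Summed over one trigger event this would then be (close to) proportional to the number of events rather than the number of combos.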
.... to be continued....
Hello Peter,
I don't know how exactly you would implement this procedure in your analysis code but I gave it a try following your suggestion and Beni's comments. I am attaching the results from the eta pi+ pi- and the eta pi0 pi0 analyses. The blue histograms were filled using Uniqueness Tracking and the red ones were filled without Uniqueness Tracking but assigning weights to each combo instead.
Can you put your DSelector code somewhere where one could look at it? To me it is not clear what you mean by "using Uniqueness tracking" and what "using Weights" means. Not only do you lose events in the red histogram, but the loss is not the same everywhere, which changes the shapes of some of the peaks!
Even more to the point: the eta pi0 pi0 final state, I assume, is constructed from 6 gammas in the final state. Taking only the first combination and filling it with weight 1 is, I think, wrong. All working combinations should be used and filled with weight W = 1/N, where N is the number of surviving candidates. The combinations with out-of-time beam photons that survive in such an event then carry a weight of W = -1/N * 1/N_beam_bunches. This is of course only a proposal; others may disagree. But it will make sure that the integral over one trigger event gives you only one good event.
Hi George, very interesting to see. Thank you for trying it! I plan to do a similar test on MC in the next week. There it is also possible to compare to the actual number of events. I have a few questions regarding your test:
I was thinking about doing something along the lines of the following, in the combo loop:
vector<UInt_t> locCombosPassedCuts;
set<map<Particle_t, set<Int_t> > > AdditionalCombos; // similar to uniqueness tracking, used to make sure an additional combo doesn't just come from a different beam photon; we want to keep those combos and subtract them with the side peaks
Int_t MultComboWeight = 0; // start at 0 because we will add at least 1 to it -> weight 1
//Loop over combos
for(UInt_t loc_i = 0; loc_i < Get_NumCombos(); ++loc_i)
{
dComboWrapper->Set_ComboIndex(loc_i);
bool locComboFailsCuts = false; // placeholder: set to true if the combo fails any of your cuts
if( locComboFailsCuts ){
dComboWrapper->Set_IsComboCut(true);
}
else{
locCombosPassedCuts.push_back(loc_i);
map<Particle_t, set<Int_t> > AdditionalComboMap; // only for non-beam particles
AdditionalComboMap[KPlus].insert(locKPlusTrackID);
AdditionalComboMap[KMinus].insert(locKMinusTrackID);
AdditionalComboMap[Proton].insert(locProtonTrackID);
if(AdditionalCombos.find(AdditionalComboMap) == AdditionalCombos.end()){
MultComboWeight++;
AdditionalCombos.insert(AdditionalComboMap);
}
}
}
This gives you a vector with the combo indices that passed all your cuts. Then, in a second loop, go over all the elements in the vector and fill with weight 1/MultComboWeight * timingWeight, where timingWeight is 1 if the beam photon is in the main RF peak and -1/N_sidepeaks if not. This should be the same as Beni's suggestion, I think. I haven't tried this implementation yet; I will do it next week.
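A rough sketch of what I mean by the second loop (untested; locIsInMainRFPeak, locNumSidePeaks, dHist_MyVariable and locMyVariable are placeholders for whatever you cut on and histogram):
// Second loop: only over the combos that passed all cuts
for(UInt_t loc_j = 0; loc_j < locCombosPassedCuts.size(); ++loc_j)
{
  dComboWrapper->Set_ComboIndex(locCombosPassedCuts[loc_j]);
  // timingWeight: 1 for the main RF peak, -1/N_sidepeaks otherwise
  double locTimingWeight = locIsInMainRFPeak ? 1.0 : -1.0 / locNumSidePeaks;
  // combined weight: 1/MultComboWeight * timingWeight
  double locWeight = locTimingWeight / MultComboWeight;
  dHist_MyVariable->Fill(locMyVariable, locWeight);
}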
Hi,
Below you can find the part of the eta pi+ pi- code that I used to get these histograms, omitting every line that is irrelevant to this study. The code is similar for the eta pi0 pi0 final state.
As for your questions, Peter:
In the Uniqueness tracking I included all the final state particles as well as the beam photon and the proton
The quantities that I plotted are the Kin. Fitted ones.
As for the implementation of the weights for each combo, I followed the suggestions in this discussion. You can find this implementation in the last 15 lines of my code below
Kind regards, George
// the number of combos that survive after applying the cuts
UInt_t locNumCombosAfterAllTheCuts = 0;
//Loop over combos
for(UInt_t loc_i = 0; loc_i < Get_NumCombos(); ++loc_i)
{
// Set branch array indices for combo and all combo particles
dComboWrapper->Set_ComboIndex(loc_i);
// flag the combos that fail the cuts
if(locAllCutsExcludingOneCut[8]) // apply all the cuts but exclude the RF beam bunch cut
{
dComboWrapper->Set_IsComboCut(true);
}
// if the current combo survives all the cuts, increment the relevant variable
if(!locAllCuts)
{
locNumCombosAfterAllTheCuts += 1;
}
// Weight for drawing accidental subtracted histograms
double locAcciWeight = 0.0;
if(fabs(locDeltaTBeamMinusRF) < (0.5 * 4.008)) { // prompt signal receives a weight of 1
locAcciWeight = 1.0;
}
else { // accidentals receive a weight of 1/# RF bunches included in TTree (8 in this case)
locAcciWeight = -1.0 / 8.0;
}
// Fill the MEtaPiPlusPiMinus mass histogram that uses Uniqueness tracking
if(!locAllCutsExcludingOneCut[8]) // include the beam photon in the uniqueness tracking
{
map<Particle_t, set<Int_t> > locUsedThisCombo_EtaPiPlusPiMinusAllCuts3;
locUsedThisCombo_EtaPiPlusPiMinusAllCuts3[Unknown].insert(locBeamID); //beam
locUsedThisCombo_EtaPiPlusPiMinusAllCuts3[PiPlus].insert(locPiPlusTrackID);
locUsedThisCombo_EtaPiPlusPiMinusAllCuts3[PiMinus].insert(locPiMinusTrackID);
locUsedThisCombo_EtaPiPlusPiMinusAllCuts3[Proton].insert(locProtonTrackID);
locUsedThisCombo_EtaPiPlusPiMinusAllCuts3[Gamma].insert(locPhoton1NeutralID);
locUsedThisCombo_EtaPiPlusPiMinusAllCuts3[Gamma].insert(locPhoton2NeutralID);
if(locUsedSoFar_EtaPiPlusPiMinusAllCuts3.find(locUsedThisCombo_EtaPiPlusPiMinusAllCuts3) == locUsedSoFar_EtaPiPlusPiMinusAllCuts3.end())
{
dHist_MEtaPiPlusPiMinusPostCutsHighEnergies->Fill(locEtaPiPlusPiMinusMassKinFit, locAcciWeight);
locUsedSoFar_EtaPiPlusPiMinusAllCuts3.insert(locUsedThisCombo_EtaPiPlusPiMinusAllCuts3);
}
}
} // end of 1st combo loop
// loop again over the combos that survived, in order to plot the weighted histogram
if(locNumCombosAfterAllTheCuts != 0)
{
for(UInt_t loc_i = 0; loc_i < Get_NumCombos(); ++loc_i)
{
//Set branch array indices for combo and all combo particles
dComboWrapper->Set_ComboIndex(loc_i);
// Is used to indicate when combos have been cut
if(dComboWrapper->Get_IsComboCut()) // Is false when tree originally created
continue; // Combo has been cut previously
// define the weights for the accidentals and for the combos that survived the cuts
// Weight for drawing accidental subtracted histograms
double locAcciWeight = 0.0;
if(fabs(locDeltaTBeamMinusRF) < (0.5 * 4.008)) { // prompt signal receives a weight of 1
locAcciWeight = 1.0;
}
else { // accidentals receive a weight of 1/# RF bunches included in TTree (8 in this case)
locAcciWeight = -1.0 / 8.0;
}
double locAcciAndComboWeight = 1.0; // weight for accidental-subtracted and combo-weighted histograms
if(locNumCombosAfterAllTheCuts != 0)
{
locAcciAndComboWeight = locAcciWeight / locNumCombosAfterAllTheCuts;
}
// fill the mass histo using weights for each combo (i.e. without Uniqueness tracking)
dHist_MEtaPiPlusPiMinusWithoutUniqueness->Fill(locEtaPiPlusPiMinusMassKinFit, locAcciAndComboWeight);
} // end of second combo loop
}
Hi George, sorry it took me so long to get back to you. So if I understand correctly, then
// if the current combo survives all the cuts, increment the relevant variable
if(!locAllCuts)
{
locNumCombosAfterAllTheCuts += 1;
}
also includes cases where the additional combination arises from having two beam photons that could have triggered the reaction. I don't think you want to include these; they should be taken care of by your locAcciWeight. You really just want to include additional combinations from final-state permutations.
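In your code, one way to do this (just a sketch, untested, reusing your variable names) would be to count only distinct final-state permutations, i.e. key the counter on a map that contains no beam photon:
set<map<Particle_t, set<Int_t> > > locFinalStateCombosSeen; // before the combo loop; note: no beam photon in here
// ... then, inside the combo loop, for a combo that survives all the cuts:
map<Particle_t, set<Int_t> > locFinalStateThisCombo;
locFinalStateThisCombo[PiPlus].insert(locPiPlusTrackID);
locFinalStateThisCombo[PiMinus].insert(locPiMinusTrackID);
locFinalStateThisCombo[Proton].insert(locProtonTrackID);
locFinalStateThisCombo[Gamma].insert(locPhoton1NeutralID);
locFinalStateThisCombo[Gamma].insert(locPhoton2NeutralID);
if(locFinalStateCombosSeen.find(locFinalStateThisCombo) == locFinalStateCombosSeen.end())
{
  locNumCombosAfterAllTheCuts += 1; // count each final-state permutation only once, independent of the beam photon
  locFinalStateCombosSeen.insert(locFinalStateThisCombo);
}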
I am currently testing the idea on MC and hope to have something ready to show soon.
Hi Peter,
Apologies for my cryptic cut flow. I didn't include the relevant lines where the cuts are defined. The if-statement:
if(!locAllCuts)
also applies a cut on the RF beam bunches, allowing only the central RF peak. Therefore, I do not include beam photons from the other bunches. If, however, an event has two photons that belong to the central RF peak for the same reaction, then these have indeed been included in the permutations (even though they shouldn't be).
When I tried running the same code over some MC data for both eta pi0 pi0 and eta pi+ pi- final states, the mass histograms "using Uniqueness tracking" and "using Weights" were almost identical. I am curious to see what will happen in your case.
I finally got around to testing my idea on a bit of MC data. First let me explain the aim and what I did: I wanted to test the idea of using weights to handle multiple combinations arising from the final state, NOT from beam photons. Therefore I created a sample of Lambda(1520) decaying into the pKK final state. I used MCwrapper and set "BKG=None" so that there are no additional beam photons in the sample; the deltaTRF histogram just shows a peak at 0 without any side peaks. I didn't apply any cuts beyond what ReactionFilter does, in order to have a sample with a relatively high combinatorial background.
- The inv. mass constructed from the thrown information in the tree: integral 32739.
- The inv. mass constructed from the measured information using uniqueness tracking: integral 35098. One can also clearly see additional entries above 2 GeV from combinatorics.
- The measured beam energy using uniqueness tracking: integral 32739. Since there is always only one beam photon per event, this agrees with the thrown integral.
- The inv. mass constructed from the measured information using weights: integral 32739, which agrees with the number of thrown events. Of course there are still entries above 2 GeV, but the weights get the overall number of events right again.
So assuming one would measure a (very bad) cross section by counting all the events in the pKm mass spectrum, the weights would give the right answer and uniqueness tracking would be 7% out.
I looked at the inv. mass, missing mass squared and beam energy histograms with uniqueness tracking and with the weights method (both kinfit and measured, but they agreed since I didn't place any additional cuts). Here is a summary of the integrals I got in each histogram:
                 uniq.    weight   thrown
beam energy      32739    32739    32739
inv. mass        35098    32739    32739
MM squared       35259    32739    32739
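(In numbers: 35098 / 32739 ≈ 1.072 for the inv. mass and 35259 / 32739 ≈ 1.077 for the MM squared, i.e. uniqueness tracking overcounts by roughly 7-8% in this test, which is where the 7% quoted above comes from.)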
In a real analysis one would place cuts to reduce the number of wrong combinations, and the effect in this final state would be much smaller than 7%. I just wanted to keep many wrong combinations in this test to have a visual confirmation (the entries > 2 GeV). It is really hard to guess how big the effect is in general; I think some final states will be more sensitive than others. I can imagine that especially final states with many particles of the same type (e.g. multiple photons) can have multiple combinations surviving all cuts.
I think that this test has shown that uniqueness tracking in its current form is not a good way of doing things. Using the weights might be a much better solution. How big the effect is for any analysis has to be assessed individually of course but in any case weights should give the "more correct" solution.
Let me know what you think of this. Cheers, Peter
PS: The root file containing all the histograms and the DSelector code I use to make them is here. Please have a look. As a comment: I produced the tree with the B4 option. Hence the part in the code for the accidental subtraction, so I can try it on real data as well. In this example it doesn't make a difference since there are no side peaks from accidentals.
PPS: It is a bit harder to repeat this test with background beam photons since the comparison to thrown data isn't that easy anymore. The IsCorrectCombo flag seems to be a bit broken. When I looked at the histograms produced with it (also in the root file) I see many events in the L1520 peak region that are marked as wrong combinations. If someone has an idea of how to do this test please let me know.
Hi Peter,
Sorry for the delayed reply. Your results are very interesting and made me wonder whether this could be the reason for the differences I see between the cross sections I get for the eta pi+ pi- and the eta pi0 pi0 channels. The eta pi0 pi0 channel has 6 photons in the final state, which give rise to a huge number of combinations, in contrast to the eta pi+ pi- channel.
I have modified my code to match your code as much as possible (i.e. excluding the beam photons when calculating the weight for each combo). Below I attach two histograms with the dsigma/dt results for both channels. The first one is what I get when using Uniqueness Tracking, the second one is what I get when I follow the "weights" procedure:
As you can see, the results are encouraging, given that the agreement between the two different decay modes improves.
The only thing that bothers me is that we are still partially using uniqueness tracking when we create the map of particles that stores the particles of the combo that passed the cuts. I am referring to these lines in your code:
map<Particle_t, set<Int_t> > AdditionalComboMap; // only for non-beam particles
I plan to show these results and continue this discussion tomorrow, in the Cross Section Workshop.
Kind regards, George
Thank you for trying it again. Very interesting results, and I am sure this will spark a discussion tomorrow! It is interesting to see that the red points don't just "drop down" onto the black points; both seem to move (e.g. in bin 2). I slightly disagree that we would still be using uniqueness tracking when building the combo map you referred to. Of course it looks very similar and uses the same idea, but with a very different intention: it is used not to avoid double counting, but to make sure we do not count additional combos that arise only from different beam photons. So I get your point, but I think it is not really what people have called uniqueness tracking so far.
PS. I updated the note on docdb and included the results from the MC study (https://halldweb.jlab.org/doc-private/DocDB/ShowDocument?docid=3993).