Study binning effect on selection of Hough tracks

noemina commented 3 years ago

- [x] Develop a method to study the effect of noise due to binning (also useful for understanding the material and magnetic field effects)
  - [x] Evaluate the phi (or q/pT) bin of the true particle and pick the bin center
  - [x] Use the phi (or q/pT) bin center from the true particle as the hit values and make the Hough space accumulation plot. This shows how the bins spread in one variable when the other one is fixed.
- [x] Plot the residuals for phi and q/pT (comparing the values of the reconstructed track (nhits>8, 9, 10) with those of the true particle)
- [x] Compare with the bin width used
- [x] For q/p_T the bin size in the Hough space is not optimal. Let's try changing it and see how the counts vs q/pT residual plot changes.
- [x] Same as above, shown using nhits>8, 9, 10 when selecting the reconstructed tracks
- [x] Once we have defined a better q/p_T bin size and observed how the residuals change as a function of the number of hits in the bin used to select the reco track, we need to look at efficiency and duplicate rate as a function of the number of hits.
- [x] --> Look at the other issue on combining the longitudinal and transverse HTs ;)
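
A minimal sketch of the "evaluate the bin and pick the bin center" step, assuming uniform bins over a fixed axis range. The helper names `bin_index` and `bin_center`, the axis range and the number of bins are hypothetical and not taken from the repository:

```python
def bin_index(value, lo, hi, n_bins):
    """Index of the uniform bin in [lo, hi) that contains value."""
    if not lo <= value < hi:
        raise ValueError("value outside the histogram range")
    width = (hi - lo) / n_bins
    # min() guards against floating-point round-up at the upper edge
    return min(int((value - lo) / width), n_bins - 1)


def bin_center(index, lo, hi, n_bins):
    """Center of the bin with the given index."""
    width = (hi - lo) / n_bins
    return lo + (index + 0.5) * width


# Hypothetical q/pT axis: range [-1, 1) split into 20 bins (width 0.1)
true_qpt = 0.237
idx = bin_index(true_qpt, -1.0, 1.0, 20)
print(idx, bin_center(idx, -1.0, 1.0, 20))
```

The returned bin center can then replace the true value when filling the accumulator, so that only the other variable is left free to spread.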

AndrewSpano commented 3 years ago

Plotted the true bin inside the heatmap for a single-particle file:

true_bin

I thought of doing it for many-particle files, but the result would be the same, except that I would have to "zoom in" on different regions to see the highly populated bins. In the above heatmap we can see a mismatch in the q/p_T range (the crowded region should be 1 bin lower, so that it coincides with the yellow bin). Let's fix the values of the axes to the truth values and see how the hits distribute around them:

fixed_phi

fixed_qpt

To gain more insight, it could be a good idea to plot the residuals (truth parameter values minus estimated parameter values) for every particle found. I did this for both the precise and the approximated transformation:

residuals

Fewer tracks were found by the approximated transformation, but for the tracks found by both, the approximations are quite close. The same holds for the q/p_T values.
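
A small sketch of the residual comparison described above (truth minus estimate, restricted to the particles found by both transformations). The dictionaries and their values are made up for illustration, and keying tracks by a truth particle ID is an assumption about how the matching is stored:

```python
def residuals(truth, estimates):
    """Truth minus estimate for each particle ID present in both dicts."""
    return {pid: truth[pid] - estimates[pid] for pid in truth if pid in estimates}


# Hypothetical phi values keyed by particle ID
truth_phi = {1: 0.50, 2: -1.20, 3: 2.10}
precise_phi = {1: 0.52, 2: -1.19, 3: 2.08}
approx_phi = {1: 0.53, 3: 2.07}  # particle 2 was not found by this transform

res_precise = residuals(truth_phi, precise_phi)
res_approx = residuals(truth_phi, approx_phi)

# Compare only the particles found by both transformations
common = sorted(res_precise.keys() & res_approx.keys())
for pid in common:
    print(pid, round(res_precise[pid], 3), round(res_approx[pid], 3))
```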

ToDo: Plot the average residuals per p_T and eta values.

AndrewSpano commented 3 years ago

The residuals for 1 event are:

1event_counts

and the bin size is:

1event_bs

We can see that almost half of the q/p_T residuals are bigger than the bin size, which suggests that maybe we should increase it to also "catch" those particles. It might also be due to the following issue: we have many duplicate tracks:

sketch

Suppose that this is the Hough Space binned. Also suppose that I know how to draw. We pick the bins that have at least 3 hits (the ones with the blue spray). They all correspond to the same track in the original space. But which one of them should I consider the truth? During my "picking", I will pick just one of them, but this may result in picking a bin that is a-few-bins-away from the optimal bin. This might as well be the case here. This will be solved once we get rid of duplicates and start defining a cluster of bins in order to group similar ones.

I also tried to plot the residuals for the whole dataset:

all_phi_res

all_qpt_res

all_event_bs

noemina commented 3 years ago

Great! Can you make the counts vs q/pT plot with smaller ranges? Since the bin size is 0.1, it would make sense to see what % of tracks have residuals smaller than this. You can then try to change the binning in the HT space (e.g. doubling the number of bins): the best binning should be the one that keeps the residual smaller than the bin size.
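
The percentage asked for here could be computed along these lines. This is only a sketch: the residual values are invented, the function names are hypothetical, and in a real study the residuals would have to be recomputed after re-running the transform with each new binning.

```python
def bin_width(lo, hi, n_bins):
    """Width of one bin for a uniform axis over [lo, hi)."""
    return (hi - lo) / n_bins


def fraction_within(residuals, width):
    """Fraction of residuals whose absolute value is below the bin width."""
    if not residuals:
        return 0.0
    return sum(abs(r) < width for r in residuals) / len(residuals)


# Invented q/pT residuals, for illustration only
qpt_residuals = [0.01, -0.04, 0.12, 0.07, -0.15, 0.03, 0.09, -0.02]

# Hypothetical axis: [-1, 1) with 20 bins gives a bin width of 0.1
width = bin_width(-1.0, 1.0, 20)
pct = 100 * fraction_within(qpt_residuals, width)
print(f"{pct:.1f}% of residuals are within a bin width of {width:.2f}")
```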

noemina commented 3 years ago

> During my "picking", I will pick just one of them, but this may result in picking a bin that is a-few-bins-away from the optimal bin. This might as well be the case here. This will be solved once we get rid of duplicates and start defining a cluster of bins in order to group similar ones.

This is correct indeed! That's why clustering algorithms would be helpful.
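
One possible way to group neighbouring bins, sketched under stated assumptions: bins whose count reaches the threshold are merged when they touch (including diagonally), and the most populated bin of each group is kept as its representative. This is a generic connected-component approach, not the clustering used in the repository, and `cluster_bins` is a hypothetical name.

```python
from collections import deque


def cluster_bins(acc, threshold):
    """Group 8-connected bins with count >= threshold; return one peak bin per cluster."""
    n_rows, n_cols = len(acc), len(acc[0])
    seen = set()
    peaks = []
    for r in range(n_rows):
        for c in range(n_cols):
            if acc[r][c] < threshold or (r, c) in seen:
                continue
            # Breadth-first flood fill over neighbouring bins above threshold
            queue = deque([(r, c)])
            seen.add((r, c))
            cluster = []
            while queue:
                i, j = queue.popleft()
                cluster.append((i, j))
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < n_rows and 0 <= nj < n_cols
                                and (ni, nj) not in seen
                                and acc[ni][nj] >= threshold):
                            seen.add((ni, nj))
                            queue.append((ni, nj))
            # Keep the most populated bin as the cluster representative
            peaks.append(max(cluster, key=lambda b: acc[b[0]][b[1]]))
    return peaks


# Toy accumulator: two separate groups of bins with at least 3 hits
acc = [
    [0, 1, 0, 0, 0],
    [3, 4, 3, 0, 0],
    [0, 3, 0, 0, 5],
    [0, 0, 0, 0, 3],
]
print(cluster_bins(acc, threshold=3))
```

With this toy accumulator, six bins pass the threshold but only two representatives are returned, one per group.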

AndrewSpano commented 3 years ago

Fixed the plots so that the first bin always contains the residuals smaller than the bin size for that component. Now they look like this (for 1 event):
Fixed the q/p_T bin size for that one event: increased it from 0.05 to 0.1. This is the result I got:

We notice that the "most crowded" bin is indeed the truth bin.
Plotted the different residual counts for the whole dataset for nhits = 8, 9, 10:
- nhits = 10
- nhits = 9
- nhits = 8
We notice that most phi residuals are indeed very close to the first bin, though not quite inside it. We can also notice the gradual increase in accuracy of the q/p_T parameter. Most of the time it's reconstructed perfectly, though there are still some errors. Fixing this will need clustering.

Also, we can see that the number of reconstructed tracks gets higher as nhits gets smaller, which makes total sense since the bins selected with nhits = 9 or 10 are a subset of the bins selected with nhits = 8.
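
The subset argument can be checked directly, assuming the selection keeps every bin with at least nhits hits. The bin keys and counts below are invented:

```python
def select_bins(counts, min_hits):
    """Set of bins whose hit count is at least min_hits."""
    return {b for b, n in counts.items() if n >= min_hits}


# Invented hit counts per (phi_bin, qpt_bin)
counts = {(3, 7): 10, (3, 8): 9, (4, 7): 8, (12, 2): 9, (20, 5): 7}

sel = {n: select_bins(counts, n) for n in (8, 9, 10)}

# A stricter threshold can only remove bins, never add them
assert sel[10] <= sel[9] <= sel[8]
print({n: len(s) for n, s in sel.items()})
```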
The plots of the duplicates are almost finished; I need to add a few touches. The duplicate rate is quite high, which suggests that the previous point (the need for clustering) is valid.
I have a few bugs. I will fix them and upload the code shortly.

AndrewSpano commented 3 years ago

Efficiency-Duplicate rate per nhits:
Combination (first one where the found hits are removed)

I'm currently working on a second version of this where I look for bins that share the same hits in both transforms. This needs a bit of tuning in order to pick a suitable threshold.
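
For reference, a sketch of how the two quantities could be computed. The definitions are assumptions and may differ from the ones used for the plots: efficiency is taken as the fraction of truth particles matched by at least one reconstructed track, and the duplicate rate as the fraction of matched tracks beyond the first one per particle. The function name and the input layout are hypothetical.

```python
from collections import Counter


def efficiency_and_duplicate_rate(truth_ids, track_matches):
    """track_matches holds the matched truth particle ID per reco track, or None."""
    per_particle = Counter(m for m in track_matches if m is not None)
    found = sum(1 for pid in truth_ids if per_particle[pid] > 0)
    efficiency = found / len(truth_ids) if truth_ids else 0.0
    n_matched = sum(per_particle.values())
    # Every matched track beyond the first one for a particle counts as a duplicate
    n_duplicates = n_matched - len(per_particle)
    duplicate_rate = n_duplicates / n_matched if n_matched else 0.0
    return efficiency, duplicate_rate


# Invented example: 4 truth particles, 7 reco tracks (one unmatched)
truth_ids = [1, 2, 3, 4]
track_matches = [1, 1, 1, 2, 3, 3, None]
eff, dup = efficiency_and_duplicate_rate(truth_ids, track_matches)
print(eff, dup)
```

Evaluating this once per nhits threshold gives the efficiency and duplicate rate curves as a function of the number of hits.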

AndrewSpano commented 3 years ago

Also implemented the other 2 combination algorithms:

combs

The efficiency drops a bit, but so does the duplicate rate. It looks like the first transform is the best.

asalzburger / sms2021-tra-tra

Study binning effect on selection of Hough tracks #16