asalzburger / sms2021-tra-tra

Repository for SummerStudent 2021 project to learn a (conformal) TRAnsform for TRAcks

Study binning effect on selection of Hough tracks #16

Open noemina opened 3 years ago

noemina commented 3 years ago
AndrewSpano commented 3 years ago

Plotted the true bin inside the heatmap for a single-particle file:

true_bin
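To make "the true bin" concrete, here is a minimal sketch of how a truth (phi, q/p_T) pair could be mapped to a cell of the binned Hough space. It assumes equal-width bins over a known range on each axis; the function names, ranges, and bin counts are illustrative and not taken from the repository.

```python
def bin_index(value, lo, hi, n_bins):
    """Index of the equal-width bin in [lo, hi) that contains value.

    Returns None when value lies outside the histogram range.
    """
    if not (lo <= value < hi):
        return None
    width = (hi - lo) / n_bins
    # min() guards against a float rounding that would land on n_bins.
    return min(int((value - lo) / width), n_bins - 1)


def true_bin(phi_true, qpt_true, phi_range, qpt_range, n_phi, n_qpt):
    """(phi bin, q/p_T bin) of a truth particle; ranges are (lo, hi) tuples."""
    return (
        bin_index(phi_true, phi_range[0], phi_range[1], n_phi),
        bin_index(qpt_true, qpt_range[0], qpt_range[1], n_qpt),
    )


if __name__ == "__main__":
    # Hypothetical ranges and bin counts, for illustration only.
    print(true_bin(0.0, -0.5, (-2.0, 2.0), (-1.0, 1.0), 4, 4))
```

Comparing this index with the index of the most populated bin gives the offset in bins directly, which is the "1 bin below" kind of mismatch discussed here.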

I considered doing the same for many-particle files, but the result would be the same, except that I would have to "zoom in" on different regions to see the highly populated bins. In the heatmap above we can see a mismatch along the q/p_T axis: the crowded region should sit 1 bin lower, so that it falls on the yellow bin. Let's fix each axis in turn to its truth value and see how the hits distribute around it:

fixed_phi

fixed_qpt

To gain more insight, it could be a good idea to plot the residuals (truth-parameter-values minus estimated-parameter-values) for every particle found. I did this for both the precise and the approximated transformation:

residuals

Fewer tracks were found by the approximated transformation, but we can see that for the common ones, the approximations are quite close. The same holds for q/p_T values.
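For reference, the residual definition used here (truth minus estimate, only for particles that were actually found) can be sketched as follows. The dictionary layout keyed by particle id is an assumption made for the example, not the data structure used in the repository.

```python
def residuals(truth, estimates):
    """Truth-minus-estimate residuals for every particle that was found.

    truth, estimates: dict mapping particle id -> (phi, qpt).
    Returns a dict mapping particle id -> (phi residual, qpt residual).
    Estimated tracks with no matching truth particle are skipped.
    """
    out = {}
    for pid, (phi_est, qpt_est) in estimates.items():
        if pid not in truth:
            continue
        phi_true, qpt_true = truth[pid]
        out[pid] = (phi_true - phi_est, qpt_true - qpt_est)
    return out


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    truth = {1: (0.5, 0.25), 2: (1.0, -0.5)}
    found = {1: (0.25, 0.75)}
    print(residuals(truth, found))
```

Running it once per transformation and intersecting the keys of the two results gives the "common" tracks on which the precise and approximated versions can be compared.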

ToDo: Plot the average residuals per p_T and eta values.

AndrewSpano commented 3 years ago

The residuals for 1 event are:

1event_counts

and the bin size is:

1event_bs

We can see that almost half of the q/p_T residuals are bigger than the bin size, which suggests that maybe we should increase it to also "catch" those particles. It might also be due to the following issue: we have many duplicate tracks:

sketch

Suppose that this is the Hough Space binned. Also suppose that I know how to draw. We pick the bins that have at least 3 hits (the ones with the blue spray). They all correspond to the same track in the original space. But which one of them should I consider the truth? During my "picking", I will pick just one of them, but this may result in picking a bin that is a-few-bins-away from the optimal bin. This might as well be the case here. This will be solved once we get rid of duplicates and start defining a cluster of bins in order to group similar ones.
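One possible way to group such neighbouring bins is a simple flood fill over the accumulator: bins that pass the hit threshold and touch each other (including diagonally) form one cluster, and the most populated bin of each cluster is kept as its representative. This is only a sketch of the idea, not the clustering that was later implemented; the threshold of 3 hits comes from the description above, while the 8-neighbour connectivity and the choice of representative are assumptions. It also ignores the fact that phi is periodic.

```python
from collections import deque


def cluster_bins(counts, min_hits=3):
    """Group adjacent bins with at least min_hits hits into clusters.

    counts: 2D list (rows x cols) of hit counts per Hough-space bin.
    Returns a list of clusters, each a list of (row, col) tuples.
    """
    n_rows, n_cols = len(counts), len(counts[0])
    seen = set()
    clusters = []
    for r in range(n_rows):
        for c in range(n_cols):
            if counts[r][c] < min_hits or (r, c) in seen:
                continue
            # Breadth-first flood fill starting from this bin.
            queue = deque([(r, c)])
            seen.add((r, c))
            cluster = []
            while queue:
                cr, cc = queue.popleft()
                cluster.append((cr, cc))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < n_rows and 0 <= nc < n_cols
                                and (nr, nc) not in seen
                                and counts[nr][nc] >= min_hits):
                            seen.add((nr, nc))
                            queue.append((nr, nc))
            clusters.append(cluster)
    return clusters


def representative(counts, cluster):
    """Bin with the highest hit count inside a cluster."""
    return max(cluster, key=lambda rc: counts[rc[0]][rc[1]])


if __name__ == "__main__":
    grid = [
        [0, 3, 0, 0, 0],
        [0, 4, 5, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 3],
    ]
    for cl in cluster_bins(grid):
        print(sorted(cl), "->", representative(grid, cl))
```

With one representative per cluster, each track candidate is reported once, so the residual is measured against a single bin rather than an arbitrary member of the group.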

I also tried to plot the residuals for the whole dataset:

all_phi_res

all_qpt_res

all_event_bs

noemina commented 3 years ago

Great! Can you make the counts vs q/pT plot with smaller ranges? Since the bin size is 0.1, it would make sense to see what % of tracks have residuals smaller than this. You can then try changing the binning in the HT space (e.g. doubling the number of bins): the best binning should be one that allows the residual to be smaller than the bin size.
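The suggested check can be sketched like this: given a list of q/p_T residuals and a bin size, compute the fraction whose absolute value is below the bin size. The helper names and the sample residuals are made up for illustration; only the bin size of 0.1 comes from the thread.

```python
def bin_size(lo, hi, n_bins):
    """Width of one equal-width bin over the range [lo, hi)."""
    return (hi - lo) / n_bins


def fraction_within(residuals, size):
    """Fraction of residuals with |residual| strictly below the bin size."""
    if not residuals:
        return 0.0
    inside = sum(1 for r in residuals if abs(r) < size)
    return inside / len(residuals)


if __name__ == "__main__":
    # Made-up residuals, for illustration only.
    qpt_residuals = [0.05, -0.2, 0.09, 0.3]
    print(f"{100 * fraction_within(qpt_residuals, 0.1):.0f}% within one bin")
```

Note that the residuals depend on the binning: after changing the number of bins, the Hough transform has to be rerun and the residuals recomputed before the fraction is compared.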

noemina commented 3 years ago

During my "picking", I will pick just one of them, but this may result in picking a bin that is a-few-bins-away from the optimal bin. This might as well be the case here. This will be solved once we get rid of duplicates and start defining a cluster of bins in order to group similar ones.

This is correct indeed! That's why clustering algorithms would be helpful.

AndrewSpano commented 3 years ago

Implemented also the other 2 combination algorithms:

combs

The efficiency drops a bit, but so does the duplicate rate. Looks like the first transform is the best.
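The thread does not spell out how the two metrics are computed, so the sketch below uses one common convention, stated here as an assumption: efficiency is the fraction of truth particles matched by at least one reconstructed track, and duplicate rate is the fraction of matched tracks that repeat an already-found particle. The function name and input format are hypothetical.

```python
def efficiency_and_duplicate_rate(matched_ids, n_truth):
    """Compute (efficiency, duplicate rate) under the assumed definitions.

    matched_ids: one entry per reconstructed track, holding the id of the
                 truth particle it was matched to, or None if unmatched.
    n_truth:     total number of truth particles.
    """
    matched = [pid for pid in matched_ids if pid is not None]
    distinct = set(matched)
    efficiency = len(distinct) / n_truth if n_truth else 0.0
    duplicate_rate = (
        (len(matched) - len(distinct)) / len(matched) if matched else 0.0
    )
    return efficiency, duplicate_rate


if __name__ == "__main__":
    # Made-up matching result, for illustration only.
    eff, dup = efficiency_and_duplicate_rate([1, 1, 2, None, 3, 3], n_truth=4)
    print(f"efficiency={eff:.2f}, duplicate rate={dup:.2f}")
```

Under these definitions a stricter combination algorithm can lower both numbers at once, which matches the trade-off described above.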