AndrewSpano opened 3 years ago
After doing some testing, I found the following patterns:
> Do you have some plots for the purity?
Purity vs count (how many tracks had purity falling in the ranges 0 - 0.1, 0.1 - 0.2, etc.)
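For reference, a minimal sketch (not the actual plotting code of this repository) of how such a purity histogram can be binned and drawn with numpy/matplotlib, assuming the per-track purities have already been computed:

```python
import numpy as np
import matplotlib.pyplot as plt

# per-track purities, i.e. the fraction of each reconstructed track's hits
# that come from its majority truth particle (dummy values for illustration)
purities = np.random.default_rng(0).uniform(0.0, 1.0, size=500)

# count how many tracks fall in each purity range 0-0.1, 0.1-0.2, ..., 0.9-1.0
edges = np.linspace(0.0, 1.0, 11)
counts, _ = np.histogram(purities, bins=edges)

plt.bar(edges[:-1], counts, width=0.1, align="edge", edgecolor="black")
plt.xlabel("purity")
plt.ylabel("number of tracks")
plt.title("Purity vs count")
plt.show()
```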
Regarding the "deterministic" approach, while I was on the plane to Greece I had a very stupid idea: for every x-y bin selected, run the r-z Hough Transform on *only* the hits inside that bin. This will help purify the hits. The idea was inspired by this plot:
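A minimal sketch of this per-bin purification step, assuming a straight-line r-z model z = z0 + r·cot(θ); the function name, bin counts and parameter ranges below are illustrative and not taken from the actual implementation:

```python
import numpy as np

def rz_hough_purify(r, z, n_theta_bins=100, n_z0_bins=100,
                    cot_theta_range=(-10.0, 10.0), z0_range=(-200.0, 200.0)):
    """Keep only the hits (of one selected x-y bin) that vote for the dominant
    straight line z = z0 + r * cot(theta) in the r-z Hough space.

    r, z: 1-D arrays with the radial / longitudinal coordinates of the hits
    that fell inside the x-y bin."""
    r, z = np.asarray(r, dtype=float), np.asarray(z, dtype=float)
    cot_thetas = np.linspace(*cot_theta_range, n_theta_bins)
    z0_edges = np.linspace(*z0_range, n_z0_bins + 1)

    # accumulator[i, j] counts hits voting for (cot_theta_i, z0 bin j)
    accumulator = np.zeros((n_theta_bins, n_z0_bins), dtype=int)
    # z0 bin each hit votes for at each cot(theta) value (-1 = out of range)
    votes = np.full((len(r), n_theta_bins), -1, dtype=int)

    for i, ct in enumerate(cot_thetas):
        z0 = z - r * ct                       # intercept implied by this slope
        bins = np.digitize(z0, z0_edges) - 1  # map intercepts to z0 bins
        valid = (bins >= 0) & (bins < n_z0_bins)
        votes[valid, i] = bins[valid]
        np.add.at(accumulator[i], bins[valid], 1)

    # the most-voted cell is the most plausible r-z track for this x-y bin
    best_theta, best_z0 = np.unravel_index(accumulator.argmax(), accumulator.shape)

    # keep only the hits that voted for that cell
    return votes[:, best_theta] == best_z0
```

The point is simply that the r-z accumulator is filled with the hits of one x-y bin at a time, so hits incompatible with that bin's dominant r-z line get dropped.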
The result was good:
The Purity vs count plot now looks like this:
The performance (for this one event) can be assessed by the metrics for the following configurations:
- Just purification
- Purification + duplicate-removal-1 algorithm
- Purification + duplicate-removal-2 algorithm
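The metric tables themselves are not reproduced here; the quantities involved (efficiency, duplicate rate, fake rate) follow the usual tracking-style definitions, sketched below under the assumption that each reconstructed track is matched to its majority truth particle when its purity passes some threshold (the exact matching rule used in this repository may differ):

```python
def track_metrics(track_matches, truth_particles):
    """track_matches: one entry per reconstructed track, holding the id of the
    truth particle it is matched to (purity above threshold) or None for a
    fake track. truth_particles: set of reconstructable truth-particle ids."""
    matched = set()
    n_duplicates = n_fakes = 0
    for pid in track_matches:
        if pid is None:
            n_fakes += 1          # no truth particle dominates this track
        elif pid in matched:
            n_duplicates += 1     # particle already claimed by another track
        else:
            matched.add(pid)
    efficiency = len(matched) / len(truth_particles)
    duplicate_rate = n_duplicates / len(track_matches)
    fake_rate = n_fakes / len(track_matches)
    return efficiency, duplicate_rate, fake_rate
```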
So I tried doing it for all the events in the with-material and non-homogenous-magnetic-field dataset. The results I got were pretty surprising:
By analyzing the results later, I saw that for every event at most 1 or 2 particles are not identified. This could be due to approximation error. Either way, for more than half of the events the efficiency is 1.0, which should be good enough. I will postpone the Neural Network development, as this approach is already yielding very good results.
[x] Implement baseline methods for removing duplicate tracks from the Hough Transform output.
For both baseline approaches implemented, the efficiency drops along with the duplicate and fake rates. This happens because some tracks that are not duplicates are mistakenly flagged as duplicates and therefore removed. To solve this, we must fine-tune those baseline algorithms a bit further:
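For illustration, a plausible baseline of this kind (not necessarily either of the two implemented here) keeps tracks in order of quality and drops a candidate when it shares more than a tunable fraction of its hits with an already-kept track; that fraction is exactly the kind of knob that trades duplicate/fake rate against efficiency:

```python
def remove_duplicates(tracks, shared_hit_fraction=0.5):
    """tracks: list of hit-id sets, assumed sorted by decreasing track quality.
    A candidate is dropped when it shares more than `shared_hit_fraction` of
    its hits with a track that has already been kept; raising the threshold
    removes fewer genuine tracks (better efficiency) but leaves more duplicates."""
    kept = []
    for hits in tracks:
        is_duplicate = any(
            len(hits & other) / len(hits) > shared_hit_fraction
            for other in kept
        )
        if not is_duplicate:
            kept.append(hits)
    return kept
```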
[x] Implement a more sophisticated (yet somewhat deterministic) method of filtering out duplicate tracks. Maybe build on top of the baseline and also use geometry information?
[ ] Implement a Machine Learning approach to duplicate removal. For this: