ewengillies / track-finding-yandex

COMET Tracking : Machine Learning Approaches

Hough Development #4

Open ewengillies opened 9 years ago

ewengillies commented 9 years ago

I have uploaded a new notebook for examining how the Hough transform works. So far I am only using it on naked signal tracks. It works pretty well, but the parameters need to be tuned.

I think there is some truth information about the location of hits on wires in actual space, not projected onto the endplate (i.e. the x, y, z of the physical hit). Maybe we can project that back to get some feel for the radius and smear. Recall that the tracks are actually helices, so the projections themselves might be pretty difficult to untangle. Any thoughts? Maybe new data will give us better truth. I'll ask Imperial tomorrow.

Also, I tried the fit separately on even and odd layers. I only looked at one event, but already the average of the centres from the even-layer and odd-layer fits is not the same as the centre from the fit over all layers. Perhaps not unexpected, but interesting. It may imply we are recovering more information this way.
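A minimal sketch of this kind of comparison, for anyone who wants to reproduce it on toy data. It swaps in a plain algebraic least-squares circle fit rather than the Hough-based fit in the notebook, and the hit positions, noise level, and layer assignment are all made up for illustration.

```python
import numpy as np

def fit_circle(x, y):
    """Algebraic least-squares circle fit; returns (cx, cy, radius)."""
    # A circle satisfies x^2 + y^2 = 2*cx*x + 2*cy*y + c, with c = r^2 - cx^2 - cy^2,
    # which is linear in the unknowns (cx, cy, c).
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    b = x**2 + y**2
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    return cx, cy, np.sqrt(c + cx**2 + cy**2)

# Toy hits (assumed values): a circle of radius 30 centred at (45, 0), with noise.
rng = np.random.default_rng(1)
theta = rng.uniform(-1.5, 1.5, size=40)
x = 45.0 + 30.0 * np.cos(theta) + rng.normal(0, 0.5, size=40)
y = 30.0 * np.sin(theta) + rng.normal(0, 0.5, size=40)

# Hypothetical layer id from radial position; a real detector supplies this directly.
layer = (np.hypot(x, y) // 3).astype(int)
even = layer % 2 == 0

cx_e, cy_e, _ = fit_circle(x[even], y[even])
cx_o, cy_o, _ = fit_circle(x[~even], y[~even])
cx_a, cy_a, _ = fit_circle(x, y)

print("mean of even/odd centres:", (cx_e + cx_o) / 2, (cy_e + cy_o) / 2)
print("centre from all layers:  ", cx_a, cy_a)
```

On this toy data any difference between the two printed centres comes only from noise and the uneven split of hits, so it says nothing about why the real detector shows a difference.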

arogozhnikov commented 9 years ago

Idea for finding optimal values of n_phi_tracks, n_rho_tracks, the smearing radius, and the exponent parameter in the Hough transform: take the original values as 0s and 1s (1 for signal wires, so using only the labels from the data). Apply the direct and inverse Hough transforms, then subtract each wire's contribution to itself (= diag(Hough_inverse dot Hough_direct) .* original_values).

Use the final value as the prediction for each wire and look at the ROC curve. Choose the parameters with maximal ROC AUC.
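For the purely linear case, the self-contribution term can be written out directly. The sketch below is illustrative only: it assumes the inverse transform is simply the transpose of the direct one, and `H` is a random stand-in matrix, not the notebook's actual transform.

```python
import numpy as np

rng = np.random.default_rng(0)
n_centres, n_wires = 4, 6

# Hypothetical correspondence matrix: H[c, w] is the vote of wire w for track centre c.
H = rng.random((n_centres, n_wires))
labels = np.array([1, 0, 1, 1, 0, 0], dtype=float)  # 1 = signal wire

hough_image = H @ labels        # direct transform: wires -> Hough space
back = H.T @ hough_image        # inverse transform (assumed to be the transpose)

# diag(H^T H) .* labels: what each wire feeds back onto itself.
self_term = np.einsum("cw,cw->w", H, H) * labels
score = back - self_term        # support each wire gets from the *other* wires

print(score)
```

The subtraction is exact only while both transforms are linear; any nonlinearity applied in Hough space mixes a wire's own vote with the others.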

upd. That's wrong, since we are using a nonlinearity in the transformation (the exponent). OK, then we have to 'hide', say, 10% of the signal wires (set their values to zero) and look at how well Hough 'guesses' them.

Maybe there is a faster way; I shall think more.
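A rough sketch of the hide-and-guess procedure on toy data. Everything here is an assumption made for illustration: the ring geometry, the fixed track radius of 30, the Gaussian smearing, the form `exp(alpha * image / max)` for the exponent, and all function names. It is not the notebook's implementation.

```python
import numpy as np

def wire_positions(n_layers=10, wires_per_layer=200, r_min=50.0, r_max=80.0):
    """Toy end-plate geometry: wires on concentric rings (not the real layout)."""
    radii = np.repeat(np.linspace(r_min, r_max, n_layers), wires_per_layer)
    phis = np.tile(np.linspace(0, 2 * np.pi, wires_per_layer, endpoint=False), n_layers)
    return np.column_stack([radii * np.cos(phis), radii * np.sin(phis)])

def hough_matrix(wires, n_phi, n_rho, smear, track_radius=30.0):
    """H[c, w]: Gaussian-smeared vote of wire w for candidate track centre c."""
    rho = np.linspace(30.0, 60.0, n_rho)
    phi = np.linspace(0, 2 * np.pi, n_phi, endpoint=False)
    rr, pp = np.meshgrid(rho, phi, indexing="ij")
    centres = np.column_stack([(rr * np.cos(pp)).ravel(), (rr * np.sin(pp)).ravel()])
    dist = np.linalg.norm(centres[:, None, :] - wires[None, :, :], axis=2)
    return np.exp(-0.5 * ((dist - track_radius) / smear) ** 2)

def roc_auc(pos, neg):
    """ROC AUC as the fraction of (pos, neg) pairs ranked correctly (ties count half)."""
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def hidden_wire_auc(labels, wires, n_phi, n_rho, smear, alpha, hide_frac=0.1, seed=0):
    rng = np.random.default_rng(seed)
    signal = np.flatnonzero(labels == 1)
    hidden = rng.choice(signal, size=max(1, int(hide_frac * signal.size)), replace=False)
    visible = labels.astype(float)          # astype returns a copy
    visible[hidden] = 0.0                   # hide some signal wires
    H = hough_matrix(wires, n_phi, n_rho, smear)
    image = H @ visible                     # direct transform
    weights = np.exp(alpha * image / image.max())   # nonlinearity in Hough space
    scores = H.T @ weights                  # inverse transform
    # Hidden signal wires and background wires both had input 0, so the
    # comparison is not biased by self-contribution.
    return roc_auc(scores[hidden], scores[labels == 0])

wires = wire_positions()
true_centre = np.array([45.0, 0.0])
labels = (np.abs(np.linalg.norm(wires - true_centre, axis=1) - 30.0) < 1.0).astype(int)

for smear in (0.5, 1.0, 2.0):
    for alpha in (1.0, 5.0):
        auc = hidden_wire_auc(labels, wires, n_phi=24, n_rho=7, smear=smear, alpha=alpha)
        print(f"smear={smear} alpha={alpha} AUC={auc:.3f}")
```

With real events one would average the AUC over many events and several random choices of hidden wires before comparing parameter settings, since a single 10% sample is small.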

ewengillies commented 9 years ago

This gets further complicated in the presence of multi-turn data. I should be able to use physics to figure out the signal radius. I will try to do so today.
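As a starting point for the radius estimate: in a uniform axial field a charged particle follows a helix whose end-plate projection is a circle of radius R[m] = p_T[GeV/c] / (0.2998 B[T]) for unit charge. The helper below just evaluates that standard relation. The function name and the input values (105 MeV/c transverse momentum, 1 T field) are illustrative assumptions, not numbers taken from this thread.

```python
def helix_radius_cm(pt_mev: float, b_tesla: float) -> float:
    """Radius of curvature in cm for a unit-charge particle.

    Uses R[m] = p_T[GeV/c] / (0.2998 * B[T]); only the momentum component
    transverse to the field bends the track.
    """
    return 100.0 * (pt_mev / 1000.0) / (0.299792458 * b_tesla)

# Illustrative inputs only (assumed, not from the thread).
print(helix_radius_cm(105.0, 1.0))
```

Since only the transverse momentum enters, a track with the same total momentum but a steeper pitch projects to a smaller circle, so this gives an upper bound on the radius for a given total momentum.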

arogozhnikov commented 9 years ago

this gets further complicated in the presence of multi-turn data

The good thing is that these samples will be taken into account automatically, since we are not trying to guess only one track, but are using an exponent in Hough space to give priority to those with high probability.
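A small numerical illustration of that point, assuming the exponent is applied as `exp(alpha * votes / max(votes))` (the exact form used in the notebook may differ) and using made-up vote counts with two strong candidates:

```python
import numpy as np

def sharpen(votes, alpha):
    """Exponentially reweight Hough-space votes and normalise to sum to 1."""
    votes = np.asarray(votes, dtype=float)
    weights = np.exp(alpha * votes / votes.max())
    return weights / weights.sum()

# Hypothetical Hough bins: two strong candidates (e.g. two turns) plus weak background.
votes = [2.0, 10.0, 3.0, 9.0, 1.0]

for alpha in (0.0, 5.0, 20.0):
    print(alpha, np.round(sharpen(votes, alpha), 3))
```

A moderate exponent keeps both strong candidates while suppressing the weak bins; a very large one approaches picking only the single best bin, which would lose the second turn.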