ewengillies / track-finding-yandex

COMET Tracking : Machine Learning Approaches

Current problems #3

Open arogozhnikov opened 9 years ago

arogozhnikov commented 9 years ago
  1. feature engineering for local filtering
  2. generating background data
  3. visualization of probabilities to see how local filtering works, and visualization of how the Hough transform works on the results of local filtering.

Forgot something?

arogozhnikov commented 9 years ago

A few more.

  1. Use an adequate metric for recognition on wires. This is roughly binned median significance: $\sum_{i \in \text{bins}} s_i^2 / b_i$. Though it is not precise, it's the metric we can construct for now, and it is much more stable than fpr/tpr/cuts, especially if using sklearn.isotonic.IsotonicRegression to estimate the s/b quantity. [TODO discuss]
  2. For online filtering: we can construct the decision rule for a wire as an analytical formula and optimize its parameters (this can be done using hep_ml).
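A minimal sketch of the binned metric from item 1, assuming classifier scores in [0, 1] and equal-width score bins. The function name `binned_significance`, the bin count, and the `b_reg` regularisation term (which keeps empty background bins from blowing up the sum) are assumptions for illustration, not something agreed in this thread. The IsotonicRegression-based s/b estimate is not shown here.

```python
import numpy as np

def binned_significance(scores, labels, n_bins=10, b_reg=1.0):
    """Sum over score bins of s_i^2 / (b_i + b_reg).

    scores: classifier outputs in [0, 1], one per wire.
    labels: 1 for signal wires, 0 for background wires.
    b_reg:  hypothetical regularisation added to each background count.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # s_i and b_i: signal and background counts in each score bin
    s, _ = np.histogram(scores[labels == 1], bins=edges)
    b, _ = np.histogram(scores[labels == 0], bins=edges)
    return float(np.sum(s ** 2 / (b + b_reg)))
```

A classifier that separates signal from background concentrates signal in bins with little background, so it gets a larger value than one that mixes the two.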

arogozhnikov commented 9 years ago

Ok, one more idea based on what you said about 'longest sequence' within a row for online triggering.

It's three-stage recognition, very fast

  1. based on deposits from left and right neighbors (and self) + timings, trying to predict if wire is signal
  2. collect predictions for each row, getting 20 numbers
    Based on the same idea of neighbors (but for layers), we are predicting the actual number of signal wires within a layer. It seems that the distribution over layers is very different for signal and background. (Note: here we'll need background events)
  3. sum predictions for layers, put a threshold on result.

Anyway, we can hardly use information about the interaction of wires from different layers directly, due to significant unpredictable rotations, so this seems a fine intermediate step.
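The three stages above can be sketched roughly as follows. This is only an illustration under stated assumptions: each layer is a 1-D array of wires ordered around the ring, stage 1 is a hand-set logistic formula rather than a trained model, and stage 2 is a simple smoothing over adjacent layers. All function names, weights, and thresholds are hypothetical.

```python
import numpy as np

def stage1_wire_probs(deposit, rel_time, w=(0.5, 1.0, 0.5), w_time=-1.0, bias=-1.0):
    """Per-wire signal probability from own + left/right neighbour deposits and timing.

    Wires in a layer form a ring, so np.roll wraps the neighbours around.
    The weights are placeholders; in practice they would be fitted.
    """
    z = (w[0] * np.roll(deposit, 1)      # left neighbour
         + w[1] * deposit                # the wire itself
         + w[2] * np.roll(deposit, -1)   # right neighbour
         + w_time * rel_time + bias)
    return 1.0 / (1.0 + np.exp(-z))

def stage2_layer_counts(probs_per_layer, w=(0.25, 0.5, 0.25)):
    """One number per layer: expected signal wires, smoothed over adjacent layers."""
    raw = np.array([p.sum() for p in probs_per_layer])
    padded = np.pad(raw, 1, mode="edge")  # innermost/outermost layers reuse themselves
    return w[0] * padded[:-2] + w[1] * padded[1:-1] + w[2] * padded[2:]

def stage3_trigger(layer_counts, threshold):
    """Sum the per-layer numbers and apply a single threshold."""
    return bool(np.sum(layer_counts) > threshold)
```

Every stage is a handful of additions and one comparison per wire or layer, which is what would make it fast enough for online use.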

ewengillies commented 9 years ago

Yeah, agreed. I am presenting tomorrow to my supervisors, so I'm focusing on offline until the weekend. I am thinking that "sig_like_neighbours" can perform better.

Can we train on all the wires using just the non-neighbour features (time, energy deposit, layer id), then use this output to define whether a wire is signal-like, but only when it's considered as a neighbour? Does this make sense?

Also, maybe we can try training on neighbours defined via even layers vs. odd layers? I.e. the vertical neighbours of layer 3 are layer 1 and layer 2...

arogozhnikov commented 9 years ago

> Also, maybe we can try training on neighbours defined via even layers vs. odd layers? I.e. the vertical neighbours of layer 3 are layer 1 and layer 2...

The shifts are too large; that's the problem. I hope you mean layer 1 and layer 5 (not 2)?

One more reason why I don't want to rely significantly on far points: we will get this information via the Hough transform.

ewengillies commented 9 years ago

Yes, I did mean layer 1 and 5; point taken about the Hough transform. What about:

> Can we train on all the wires using just the non-neighbour features (time, energy deposit, layer id), then use this output to define whether a wire is signal-like, but only when it's considered as a neighbour? Does this make sense?

arogozhnikov commented 9 years ago

This makes sense, and it is a very good model as a first-order approximation. Something like a 'baseline' model.

However, time by itself is meaningless, for instance. So you'll lose this information in the model. All information that can be taken from pairwise feature interactions will be lost.

ewengillies commented 9 years ago

Sorry, I should clarify: time as a variable = time of hit - time of trigger. Let's call it relative_time from now on (this will be reflected in my next push). This variable is much earlier for signal hits, and flat for background.

The only place the output of this 'baseline' model would be used is in the current 'signal like neighbours' feature, which underperforms (if you ask me). Check the latest LocalBasedFiltering; it shows the performance of the features.

arogozhnikov commented 9 years ago

Together with other things this should be fine. Better to use an undertrained classifier there (n_trees=10, min_samples_leaf=100).
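A rough sketch of that 'baseline' idea, assuming scikit-learn's GradientBoostingClassifier as the undertrained model (`n_estimators=10` stands in for n_trees=10; the thread does not say which classifier is meant). The function names and the choice to average the two neighbours' scores are hypothetical. Each wire's own baseline score is used only as a feature for its neighbours, never for itself.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def fit_baseline(relative_time, energy_deposit, layer_id, is_signal):
    """Fit a deliberately weak per-wire model on non-neighbour features only."""
    X = np.column_stack([relative_time, energy_deposit, layer_id])
    clf = GradientBoostingClassifier(
        n_estimators=10,        # few trees: undertrained on purpose
        min_samples_leaf=100,   # large leaves: coarse, stable scores
        random_state=0,
    )
    return clf.fit(X, is_signal)

def sig_like_neighbours(clf, relative_time, energy_deposit, layer_id):
    """Mean baseline score of the left and right neighbours of each wire.

    Inputs are 1-D arrays for one layer of one event, ordered around the ring.
    """
    X = np.column_stack([relative_time, energy_deposit, layer_id])
    own = clf.predict_proba(X)[:, 1]  # P(signal) per wire; column 1 is class 1
    return 0.5 * (np.roll(own, 1) + np.roll(own, -1))
```

The returned array can then be added as one more column next to the per-wire features of the main classifier.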