This issue is part of a project described in issue #24.
The following is a "real-time" list of the differences found between the two pipelines during the comparison.
Not all features are critical to recovering the missing performance, but all should be implemented (as similarly as possible) so that they can optionally be used when comparing different algorithms.
[x] Add missing features
The only one missing should be
- Concentration (the ratio of the highest charge contained in two contiguous pixels to Intensity),
which has the same definition problem as in #92.
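As a starting point for discussion, here is a minimal sketch of one literal reading of that definition: the largest summed charge of any pair of neighbouring pixels, divided by the total image charge. The function name, the plain-list inputs and the explicit list of neighbour pairs are all hypothetical, and given the open definition problem this is not necessarily what CTAMARS computes.

```python
import math


def concentration(charges, neighbor_pairs):
    """Highest charge summed over two contiguous pixels, divided by Intensity.

    charges: per-pixel charges of the (assumed already cleaned) image.
    neighbor_pairs: (i, j) index pairs of contiguous pixels (hypothetical input).
    """
    intensity = sum(charges)  # Intensity taken here as the total image charge
    if intensity <= 0 or not neighbor_pairs:
        return math.nan  # undefined for an empty image or a missing geometry
    brightest_pair = max(charges[i] + charges[j] for i, j in neighbor_pairs)
    return brightest_pair / intensity


# Toy 4-pixel "camera" laid out in a row: 0-1-2-3
charges = [1.0, 5.0, 3.0, 1.0]
neighbor_pairs = [(0, 1), (1, 2), (2, 3)]
value = concentration(charges, neighbor_pairs)
```

In this toy case the brightest contiguous pair is pixels 1 and 2, so the result is their summed charge over the total.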
[x] Modify (better: allow for configurable) weight (#125)
This issue is shared with #92.
Right now we weight with Intensity, while CTAMARS uses Intensity^0.54.
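A configurable weight could be as small as an exponent exposed as a parameter. The sketch below is only an illustration: the function and parameter names are hypothetical and not taken from protopipe or CTAMARS, and Intensity is assumed to be strictly positive.

```python
def intensity_weights(intensities, exponent=1.0):
    """Return Intensity**exponent for each image.

    exponent=1.0 corresponds to weighting with plain Intensity;
    exponent=0.54 corresponds to the CTAMARS choice quoted above.
    Assumes every intensity is positive.
    """
    return [intensity ** exponent for intensity in intensities]


intensities = [100.0, 1000.0]
current = intensity_weights(intensities)                     # plain Intensity
ctamars_like = intensity_weights(intensities, exponent=0.54)  # Intensity^0.54
```

With an exponent below 1 the brightest images still weigh more, but the ratio between bright and faint images is compressed.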
[x] Modify usage/training (configurable, to check)
CTAMARS uses the whole gamma-2 and proton-1 samples to train the classification model, whereas protopipe splits the original TRAINING data into train/test sub-samples.
The split allows intermediate benchmarking before the models are applied to the rest of the analysis data sample (DL2 production takes more time, and it can be convenient to study the models without producing DL2 data every time).
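Both behaviours could be covered by a single configurable train fraction, as in this minimal sketch. The helper name, its parameters and the default of 0.8 are illustrative assumptions, not protopipe's actual interface or settings.

```python
import random


def split_training_sample(events, train_fraction=0.8, seed=0):
    """Shuffle the TRAINING events and split them into train/test sub-samples.

    train_fraction=1.0 uses the whole sample for training (CTAMARS-like)
    and leaves the test sub-sample empty.
    """
    if not 0.0 < train_fraction <= 1.0:
        raise ValueError("train_fraction must be in (0, 1]")
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


events = list(range(10))
train, test = split_training_sample(events, train_fraction=0.8)
full_train, no_test = split_training_sample(events, train_fraction=1.0)
```

Keeping the fraction in the configuration would allow the CTAMARS-like setup for the comparison while preserving the intermediate benchmarks for other studies.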