cta-observatory / protopipe

Prototype data analysis pipeline for the Cherenkov Telescope Array Observatory
https://protopipe.readthedocs.io/en/latest/

Reconstructed energy is written in logscale while true energy in linear scale #139

Closed HealthyPear closed 3 years ago

HealthyPear commented 3 years ago

This happens because we write to file the direct estimate from the energy regressor, whose scale is determined by the target variable of the model (log10_true_energy by default).

This creates 2 problems:

log10_reco_energy: reco_energy # Averaged-estimated energy of the shower

which is horrible

kosack commented 3 years ago

I think the solution is to allow a transformation that normalizes/re-scales the predicted variable. E.g. the predicted value should always be "energy" (not log10_energy), but you should have an option such as:

transform: np.log10
inverse_transform: lambda p: 10**p

Then during training you apply the transform so that all computations are in log10_energy, and after predict you apply the inverse transform to get back to energy. You could also include the scaling to TeV there, unless it is simply assumed that energies are in TeV.
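
A minimal sketch of this idea, assuming a hypothetical wrapper class (`TransformedTargetModel` is not part of protopipe) and a toy one-feature least-squares fit standing in for the real regressor. The data and the TeV units are assumptions made only for illustration:

```python
import math


class TransformedTargetModel:
    """Hypothetical wrapper: fit on transform(target), predict in the original scale."""

    def __init__(self, transform=math.log10, inverse_transform=lambda p: 10.0 ** p):
        self.transform = transform
        self.inverse_transform = inverse_transform
        self.slope = 0.0
        self.intercept = 0.0

    def fit(self, feature, energy):
        # All training computations happen on the transformed target (log10 energy).
        target = [self.transform(e) for e in energy]
        n = len(feature)
        mean_x = sum(feature) / n
        mean_t = sum(target) / n
        # Ordinary least squares for a single feature (stand-in for the real model).
        sxx = sum((x - mean_x) ** 2 for x in feature)
        sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(feature, target))
        self.slope = sxt / sxx
        self.intercept = mean_t - self.slope * mean_x
        return self

    def predict(self, feature):
        # The raw model output is in log10 scale; invert it before returning.
        return [self.inverse_transform(self.slope * x + self.intercept) for x in feature]


log10_intensity = [2.0, 3.0, 4.0, 5.0]
true_energy = [10.0 ** (0.5 * x - 1.0) for x in log10_intensity]  # toy data, TeV assumed

model = TransformedTargetModel().fit(log10_intensity, true_energy)
reco_energy = model.predict([3.0])  # linear scale, same units as true_energy
```

With this layout, whatever is written to file is always in linear scale, regardless of the scale used internally for training. For scikit-learn estimators, `sklearn.compose.TransformedTargetRegressor` (with its `func` and `inverse_func` arguments) follows the same pattern.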

So the sequence of steps is:

The same could even be used for the input data to the training, if you really want to be general. I.e. you could allow a column name + transform + inverse_transform for all variables (e.g. intensity → log10(intensity) → training). However, I guess the user-defined features already solve that problem, so it's probably only needed for the input/output parameter.
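
The more general per-column variant could look roughly like the sketch below. The `COLUMN_TRANSFORMS` mapping and the two helper functions are hypothetical names chosen for illustration, not an existing protopipe configuration:

```python
import math

# Hypothetical configuration: column name -> (transform, inverse_transform)
COLUMN_TRANSFORMS = {
    "intensity": (math.log10, lambda p: 10.0 ** p),
    "true_energy": (math.log10, lambda p: 10.0 ** p),
}


def apply_transforms(row, transforms=COLUMN_TRANSFORMS):
    """Return a copy of row with the forward transform applied to configured columns."""
    out = dict(row)
    for name, (forward, _inverse) in transforms.items():
        if name in out:
            out[name] = forward(out[name])
    return out


def invert_transforms(row, transforms=COLUMN_TRANSFORMS):
    """Return a copy of row with configured columns mapped back to the original scale."""
    out = dict(row)
    for name, (_forward, inverse) in transforms.items():
        if name in out:
            out[name] = inverse(out[name])
    return out


event = {"intensity": 1000.0, "true_energy": 10.0, "width": 0.1}
training_row = apply_transforms(event)       # columns in log10 scale for training
restored = invert_transforms(training_row)   # back to linear scale for writing
```

Columns without an entry in the mapping (here `width`) pass through unchanged.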

HealthyPear commented 3 years ago

Just a small clarification: I only found this problem now because in the previous AdaBoost config the target was true_energy and not log10_true_energy, so the estimated value was always in linear scale. (I am not sure whether this was also one of the reasons the resolution was bad before.)