Closed by orelgueta 2 months ago
Yes, and this is related to #11. It might very well be that the performance is heavily influenced by the training statistics of each bin (and not only by the different "physics" associated with the different energy bins).
I don't really understand why it is necessary to have an energy binning in the training, given that the energy is itself one of the variables we're training the models with. Shouldn't it be possible for the model to take into account the effect of the energy when predicting the angular difference? Just like it does with the fov offset. We're not binning in fov offset for the training (because that would cost a lot of training statistics), we only bin in offset (and energy) when calculating the event type thresholds.
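To make the distinction between binning in the training and binning for the thresholds concrete, here is a minimal sketch of per-energy-bin thresholds. It assumes the thresholds are quantiles of the predicted angular difference that split each bin into equally populated event types; that quantile rule, the function name `event_type_thresholds` and all of its parameters are hypothetical and not taken from the actual code.

```python
import numpy as np

def event_type_thresholds(energy, score, energy_edges, n_types=3):
    """Per energy bin, return the n_types - 1 score values that split
    the events of that bin into equally populated event types.

    Assumption: thresholds are plain quantiles of the score in each bin.
    """
    energy = np.asarray(energy)
    score = np.asarray(score)
    energy_edges = np.asarray(energy_edges)
    n_bins = len(energy_edges) - 1
    # Compare against the inner edges only, so every event lands in 0..n_bins-1
    bin_idx = np.searchsorted(energy_edges[1:-1], energy, side="right")
    # Inner quantiles, e.g. 1/3 and 2/3 for three event types
    quantiles = np.linspace(0.0, 1.0, n_types + 1)[1:-1]
    thresholds = np.full((n_bins, n_types - 1), np.nan)
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.any():  # empty bins keep NaN thresholds
            thresholds[b] = np.quantile(score[in_bin], quantiles)
    return thresholds
```

Here the model is trained once on all events and the energy only enters the thresholds afterwards, so no training statistics are lost to the binning.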
I'm testing this in the branch test_energy_binning. It looks like we can safely reduce the number of energy bins in the training, but I'm still not sure if we can reduce it to just one. The branch is still a little buggy, so I'll make the PR after holidays.
OK, no problem!
I tested with 1, 3, 5, 10 and 20 (current) equally filled energy bins using the test_energy_binning branch, which is now ready for the PR. I think the confusion matrix is the best way to compare these, because even if the binning in the training is different, we can test them with the same binning (the one we use later to compute sensitivity).
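For reference, "equally filled" bins can be built from quantiles of the energy distribution. The following is only a sketch of that idea, not the code in the branch: the helper names and the toy energy sample are hypothetical.

```python
import numpy as np

def equally_filled_bin_edges(energies, n_bins):
    """Return n_bins + 1 edges such that each bin holds roughly the same number of events."""
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)
    return np.quantile(energies, quantiles)

def counts_per_bin(energies, edges):
    """Count events per bin; the largest energy is assigned to the last bin."""
    # Only the inner edges are used, so indices run from 0 to n_bins - 1
    idx = np.searchsorted(edges[1:-1], energies, side="right")
    return np.bincount(idx, minlength=len(edges) - 1)

rng = np.random.default_rng(0)
# Hypothetical toy sample: energies in TeV drawn from a steep, Pareto-like distribution
energies = 0.01 * (1.0 + rng.pareto(1.5, size=100_000))

for n_bins in (1, 3, 5, 10, 20):
    edges = equally_filled_bin_edges(energies, n_bins)
    counts = counts_per_bin(energies, edges)
    print(n_bins, counts.min(), counts.max())
```

With a steep spectrum the resulting bins are narrow at low energies and very wide at high energies, which is why the highest-energy bin covers a broad range regardless of the number of bins.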
I know there are too many plots and the differences are hard to see because they all look very similar. My conclusion is that the binning does not affect the result much, although the best performance is achieved with a reduced number of bins (3 or 5), but not with 1. So I would stop using 20 bins by default and probably reduce it to 3. What do you think?
I focused on the case of 3 event types for the following (because I hope it is the case that will actually be used).
It seems to me that the difference is only seen at the highest energy bins, above 50 TeV. For all the bins below that, I think the difference is small enough to be considered within statistical fluctuations. For those very-high-energy bins, we do benefit from a larger number of energy bins in the training. This is expected considering the steep energy distribution: the NN will naturally focus on the more abundant low-energy examples it sees. The difference is not large though and probably not worth the added complication. One could also argue that above 50 TeV the differences we see are inconsequential, considering the number of events we have at those energies and their inherently good quality to begin with.
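To give a feeling for how few training examples a steep distribution leaves at the highest energies, here is a small worked example. It assumes a pure power law dN/dE ∝ E^(-index) between two energy limits; the index and the limits are purely illustrative assumptions (the real training sample after cuts is not a pure power law), and only the 50 TeV value comes from the discussion above.

```python
import math

def fraction_above(e_thr, e_min, e_max, index):
    """Fraction of events with E > e_thr for dN/dE ∝ E**(-index) on [e_min, e_max]."""
    if not e_min <= e_thr <= e_max:
        raise ValueError("e_thr must lie within [e_min, e_max]")
    if index == 1.0:
        # Special case: the integral of 1/E is logarithmic
        return math.log(e_max / e_thr) / math.log(e_max / e_min)
    p = 1.0 - index
    # Ratio of the integral over [e_thr, e_max] to the integral over [e_min, e_max]
    return (e_max**p - e_thr**p) / (e_max**p - e_min**p)

# Hypothetical numbers: index 2.0, energies from 0.02 TeV to 200 TeV
print(fraction_above(50.0, 0.02, 200.0, 2.0))
```

Under these assumed numbers only a tiny fraction of the events lies above 50 TeV, so a single model trained on everything sees almost exclusively low-energy examples.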
So, to make a long story short, I think it won't make a difference if we reduce the number of energy bins. The best compromise between performance and simplicity is if we use 10 bins, but going down to three wouldn't be too bad either. If you think using three bins is simpler/easier than 10, let's use that. Otherwise, perhaps using 10 bins is best.
Yes, you're right, the highest energy bins work better with 10 bins, so let's use that.
OK!
We can probably improve performance if we optimise the number and distribution of energy bins. Worth a try at least.