Test if score breaks after adding more offsets

TarekHC commented 2 years ago

Repeat training with several algorithms and all offsets, but use as test sample only the same test sample we have been using until now: those events close to the center of the field of view.

That way we check if we can do a single training for all offsets, and be sure performance at a given offset is not harmed. If performance breaks, then we will need to do offset-dependent training (1 deg bins?).

orelgueta commented 2 years ago

Good news @TarekHC, @JBernete. Not only do we not break the performance, we actually get a better performance when training with all offset angles but applying only to the events in the centre of the camera (< 1 deg). See the plot of the scores below, where red is trained and applied on all offset bins, blue is trained with and applied on events in the centre of the camera (< 1 deg), and yellow was trained with all offset angles but applied only on events in the centre of the camera (same events as the blue curve).

scores_features_1

I could convince myself that it makes sense. Events in different offset angles are not that different from each other and having more events to train with improves performance, even of those events are in a different offset. What do you think?

TarekHC commented 2 years ago

Wow, this looks fantastic! (and I'm sick of training the ERC presentation...).

Could you please test repeating this plot by testing several % of training statistics? e.g. 20%, 35% and 50% or something like that.

That would help us understand if the improvement is clearly coming from the increased training statistics. If we see a similar improvement over the 3 train/test samples, then we would need to investigate things a bit farther.

orelgueta commented 2 years ago

Yes, this is what I planned to do next. Hopefully I will manage to work on it soon (expect the rest of the week to be busy).

orelgueta commented 2 years ago

Finally got a chance to run this. The plot below shows the score as a function of energy for increasing fraction of training sample using only the inner 1 deg of the camera.

scores_features_1

We can see that indeed we can gain performance with a larger sample as predicted from the plot above which uses events from the entire camera.

Putting the curves from the plot above into this plot for comparison is a bit confusing. The energy bins are not exactly the same so I am not sure if the scores are exactly comparable. If they are, then at energies below 100 GeV we actually lose performance (compare the MLP_tanh_inner_offset_bin curve with the train_size_25p one). I don't think it is very clear though or would worry about it too much. What's interesting is that with 75% of the events in the training sample we reach basically the performance over the entire camera except above 1 TeV (yellow and red curves). The perhaps point to the fact that if we still had more events to train with we could improve performance.

scores_comparison_1

Either way, I think we could say that we can continue training with the full camera and not worry about breaking performance at the centre.

TarekHC commented 2 years ago

Hi Orel,

Just for me to make sure I understand the new results:

In the first plot, when you say "using only the inner 1 deg of the camera" is using it both for the training and score calculation, right? So the 100% statistics would be the central 1 deg of the full gamma-diffuse sample (no events beyond 1 deg).
In the second plot, all scores are calculated only for the first 1-deg offset bin, but with differences in the training statistics (sometimes including events beyond the central 1-deg, that do not enter into the score calculation).

Are these statements correct?

orelgueta commented 2 years ago

Hi Tarek,

In the first plot, when you say "using only the inner 1 deg of the camera" is using it both for the training and score calculation, right? So the 100% statistics would be the central 1 deg of the full gamma-diffuse sample (no events beyond 1 deg).

There are two curves in the first plot which are "using only the inner 1 deg of the camera":

The blue one, "MLP_tanh_trained_inner_offset_bin", for which the events in the inner 1 deg were used both for training and for evaluation of the score (with the usual fractions of 25% and 75% respectively).
The yellow one, "MLP_tanh_inner_offset_bin", for which events all over the camera are used in the training but for the score evaluation only the inner 1 deg of the camera is used.

Is that clearer?

In the second plot, all scores are calculated only for the first 1-deg offset bin, but with differences in the training statistics (sometimes including events beyond the central 1-deg, that do not enter into the score calculation).

Which second plot are you referring to? The one only with "trainsize*p" labels? In that case your statement is almost correct. The only difference between the samples is the training statistics, but we never include events beyond the central 1 deg of the camera (cause that is essentially what is done in the first plot so I didn't see a reason to repeat this.

If you are referring to the third plot, then there it's a combination of the first two. I hope that with the clarifications above you can understand which samples are used for each curve.

JBernete commented 2 years ago

I've made this test for the score, not only for the inner offset bin, but for all the rings 1 deg wide up to 4 deg. (Code in PR #52) These are the results for 25% train:

From 0 to 1 deg: scores_features_1_ring1

From 1 to 2 deg: scores_features_1_ring2

From 2 to 3 deg: scores_features_1_ring3

From 3 to 4 deg: scores_features_1_ring4

JBernete commented 2 years ago

The latter offset range shows a valley shape that @TarekHC and I can't explain yet.

orelgueta commented 2 years ago

It could be due to the switch from LSTs to MSTs. Perhaps at low enough energies, the image is still contained within the LST FoV, despite being at large offset. As you go up you start losing too much of the image (and even cutting out the LSTs entirely). Once the energy raises you start collecting more MSTs and the LSTs are not important anymore.

orelgueta commented 2 years ago

BTW, it would be good to put these plots all in one figure. You can probably modify plot_scores.py a bit to do it with a nice legend.

JBernete commented 2 years ago

My plan now is to plot different train statistics for each ring, so I wouldn't mix different rings in one plot. But yes, that can be done too.

JBernete commented 2 years ago

Here are the plots with different train statistics: scores_features_1_ring1 (1) scores_features_1_ring2 (1) scores_features_1_ring3 (1) scores_features_1_ring4 (1)

The conclussion is the same: 25% is a good choice for the train sample.

orelgueta commented 2 years ago

Very nice!

TarekHC commented 2 years ago

Guys, I feel we can close this issue (I like closing issues!!). It really looks like, irrespective of what is the performance, things do not break up (they may even improve).

Just re-open if you feel there is anything missing that should be tested.

cta-observatory / iact_event_types

Test if score breaks after adding more offsets #40