TarekHC opened this issue 2 years ago.
I just realised I didn't update this. We now understand where the NaNs come from, but this issue is not completely fixed.
Most of the NaNs appeared because of bad indexing: we were exporting the event types (ETs) of different events into the same row of the table, so they overwrote each other. This was fixed before PR #43, but there are still a number of events that don't get an ET assigned when exporting event types.
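The overwriting failure can be reproduced in toy form. This is a minimal sketch, not the actual export_event_types.py code: the functions `write_types_buggy` / `write_types_fixed` and the `(event_ids, types)` bin layout are hypothetical. Using the position inside a bin as the table row makes bins overwrite each other, and rows that are never written stay NaN.

```python
import math


def write_types_buggy(n_events, bins):
    """Fill a per-event type column; bins is a list of (event_ids, types) pairs."""
    column = [math.nan] * n_events  # NaN marks rows that were never written
    for _event_ids, types in bins:
        # BUG: the position inside the bin is used as the row index, so every
        # bin writes into rows 0..len(bin)-1 and overwrites the previous bin.
        for row, event_type in enumerate(types):
            column[row] = event_type
    return column


def write_types_fixed(n_events, bins):
    """Same as above, but each event is written to its own global row."""
    column = [math.nan] * n_events
    for event_ids, types in bins:
        # Use each event's global index, so no two events share a row.
        for event_id, event_type in zip(event_ids, types):
            column[event_id] = event_type
    return column


bins = [([0, 1, 2], [1, 2, 3]), ([3, 4, 5], [3, 1, 2])]
print(write_types_buggy(6, bins))  # [3, 1, 2, nan, nan, nan]
print(write_types_fixed(6, bins))  # [1, 2, 3, 3, 1, 2]
```

In the buggy version the second bin silently replaces the first one's types and half of the column is left as NaN, which matches the symptom described above.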
For this reason we added 2 new negative event types, explained in the commented lines 167-175 of export_event_types.py:
```python
# Types 1, 2 and 3 are actual reconstructed event types, where 1 is the best type and
# all three should have similar statistics.
# Type -1 is assigned to the train sample.
# Type -2 is the default value when assigning event types to the test sample in
# partition_event_types. As that function works for each bin of energy and offset,
# events outside these bins will have type -2.
# Type -3 is the default value for the complete tables of each particle,
# i.e. events with no other type assigned.
```
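The type convention described in these comments can be illustrated with a small sketch. It assumes, hypothetically, that each test event carries a per-event quality score (lower is better) and that each energy/offset bin is split into three equal-sized groups by that score; the real partition_event_types may use a different criterion, and `assign_event_types`, the dict fields and the bin edges are illustrative names only.

```python
from bisect import bisect_right

TYPE_NO_ASSIGNMENT = -3  # default for the complete table of each particle
TYPE_TRAIN = -1          # events used to train the model
TYPE_OUTSIDE_BINS = -2   # default for test events before per-bin partitioning


def assign_event_types(events, train_ids, energy_edges, offset_edges, n_types=3):
    """events: dicts with 'id', 'energy', 'offset' and a hypothetical 'quality' (lower is better)."""
    types = {ev["id"]: TYPE_NO_ASSIGNMENT for ev in events}
    bins = {}
    for ev in events:
        if ev["id"] in train_ids:
            types[ev["id"]] = TYPE_TRAIN
            continue
        types[ev["id"]] = TYPE_OUTSIDE_BINS
        # Half-open bins [edge_i, edge_i+1); an index of -1 or len(edges)-1 means outside.
        ie = bisect_right(energy_edges, ev["energy"]) - 1
        io = bisect_right(offset_edges, ev["offset"]) - 1
        if 0 <= ie < len(energy_edges) - 1 and 0 <= io < len(offset_edges) - 1:
            bins.setdefault((ie, io), []).append(ev)
    # Inside each bin, rank by quality and split into n_types equal-sized groups (1 = best).
    for members in bins.values():
        members.sort(key=lambda ev: ev["quality"])
        for rank, ev in enumerate(members):
            types[ev["id"]] = 1 + rank * n_types // len(members)
    return types


events = [
    {"id": 0, "energy": 0.5, "offset": 0.5, "quality": 0.1},
    {"id": 1, "energy": 0.5, "offset": 0.5, "quality": 0.3},
    {"id": 2, "energy": 0.5, "offset": 0.5, "quality": 0.2},
    {"id": 3, "energy": 5.0, "offset": 0.5, "quality": 0.1},  # energy outside all bins
    {"id": 4, "energy": 0.5, "offset": 0.5, "quality": 0.9},  # training event
]
types = assign_event_types(events, {4}, [0.0, 1.0], [0.0, 1.0])
print(types)  # {0: 1, 1: 3, 2: 2, 3: -2, 4: -1}
```

In this sketch every event is visited by one of the two steps, so none keeps type -3. By the definition in the comments, a -3 in the real output is a row of the complete table that neither step reached.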
I've tested export_event_types.py using the MLP_tanh model, with 25% of the diffuse gamma data (all offsets) for training, and the remaining 75% of the diffuse gamma data plus all electron and proton data for testing. The results were the following:
GAMMAS: a total of 5142207 events will be written.

| Event type | All events | Gamma-like events |
|---|---|---|
| -3 | 5 | 1 |
| -2 | 65683 | 1098 |
| -1 | 1285544 | 560349 |
| 1 | 1263660 | 815645 |
| 2 | 1263655 | 586407 |
| 3 | 1263660 | 277727 |

ELECTRONS: a total of 6061074 events will be written.

| Event type | All events | Gamma-like events |
|---|---|---|
| -3 | 6 | 0 |
| -2 | 57042 | 2202 |
| -1 | 0 | 0 |
| 1 | 1448283 | 665732 |
| 2 | 1744381 | 594410 |
| 3 | 2811362 | 708512 |

PROTONS: a total of 46555026 events will be written.

| Event type | All events | Gamma-like events |
|---|---|---|
| -3 | 2430 | 1 |
| -2 | 2571252 | 5962 |
| -1 | 0 | 0 |
| 1 | 2525498 | 113997 |
| 2 | 5115250 | 135855 |
| 3 | 36340596 | 256388 |

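A quick bookkeeping check on these numbers: if every event gets exactly one type, the per-type counts must add up to the "will be written" total. This sketch only copies the counts from the output above; the dictionary layout and `count_mismatch` are illustrative, not part of export_event_types.py. It also checks that the type -1 gammas are about 25% of the gamma total, as expected from the training split.

```python
# Per-type counts (all events) copied from the export_event_types.py output above.
COUNTS = {
    "gammas": {"total": 5142207, -3: 5, -2: 65683, -1: 1285544,
               1: 1263660, 2: 1263655, 3: 1263660},
    "electrons": {"total": 6061074, -3: 6, -2: 57042, -1: 0,
                  1: 1448283, 2: 1744381, 3: 2811362},
    "protons": {"total": 46555026, -3: 2430, -2: 2571252, -1: 0,
                1: 2525498, 2: 5115250, 3: 36340596},
}


def count_mismatch(counts):
    """Total minus the sum of per-type counts; 0 means each event was counted exactly once."""
    return counts["total"] - sum(n for key, n in counts.items() if key != "total")


for particle, counts in COUNTS.items():
    print(particle, count_mismatch(counts))

# Fraction of gammas used for training (type -1); expected to be close to 0.25.
train_fraction = COUNTS["gammas"][-1] / COUNTS["gammas"]["total"]
print(f"gamma training fraction: {train_fraction:.4f}")
```

All three mismatches are 0, so no events are lost or written twice any more; the remaining problem is which type they receive, not the indexing.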
We can see multiple effects that I cannot explain yet, mostly related to uneven statistics across the partitions (for electrons and protons, types 1, 2 and 3 are far from having similar statistics), which we will need to investigate further. For this issue, however, the main problem is that there are still events with ET -2 and -3, and their number increases a lot for protons (2571252 events of type -2 and 2430 of type -3). Type -2 is most probably related to events with an energy or offset lying outside all the possible bins, so it should be easy to fix, but I have no clue why type -3 still exists.
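Both effects can be quantified directly from the tables above. A minimal sketch: the counts are copied from the output, while `imbalance` and `type_m2_fraction` are illustrative helpers. The imbalance is the ratio between the most and least populated of types 1, 2 and 3 (1.0 means the "similar statistics" the comments ask for).

```python
# Counts (all events) copied from the export_event_types.py output above.
TOTALS = {"gammas": 5142207, "electrons": 6061074, "protons": 46555026}
TYPE_M2 = {"gammas": 65683, "electrons": 57042, "protons": 2571252}
TYPES_123 = {
    "gammas": (1263660, 1263655, 1263660),
    "electrons": (1448283, 1744381, 2811362),
    "protons": (2525498, 5115250, 36340596),
}


def imbalance(counts):
    """Ratio of the most to the least populated of types 1, 2 and 3 (1.0 = balanced)."""
    return max(counts) / min(counts)


def type_m2_fraction(particle):
    """Fraction of all written events of a particle that ended up with type -2."""
    return TYPE_M2[particle] / TOTALS[particle]


for particle in TOTALS:
    print(f"{particle}: types 1-3 imbalance {imbalance(TYPES_123[particle]):.2f}, "
          f"type -2 fraction {100 * type_m2_fraction(particle):.2f}%")
```

This gives a types 1-3 imbalance of about 1.0, 1.9 and 14.4, and roughly 1.3%, 0.9% and 5.5% of type -2 events, for gammas, electrons and protons respectively.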
Thank you @JBernete for the nice explanation.
Let me give you my point of view on the type -2 events: gammas and electrons are roughly at the 1% level, while protons are at the 5% level.
I would bet that this difference makes sense: proton showers are much more likely to be detected at very large offsets (a pion sub-shower may be produced at much larger offsets than is typical in electromagnetic showers). This means that protons are more likely to fall outside our maximum offset limit, and therefore to be assigned type -2.
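One way to test this hypothesis is to tally, for the events that end up with type -2, which limit they violate. A minimal sketch, assuming contiguous half-open [low, high) bins given as edge lists; the function names, the edge values and the example events (as `(energy, offset)` pairs) are hypothetical, not taken from export_event_types.py.

```python
from collections import Counter


def outside_reasons(energy, offset, energy_edges, offset_edges):
    """List why an event misses every [low, high) bin; an empty list means it is inside."""
    reasons = []
    if energy < energy_edges[0]:
        reasons.append("energy below range")
    elif energy >= energy_edges[-1]:
        reasons.append("energy above range")
    if offset < offset_edges[0]:
        reasons.append("offset below range")
    elif offset >= offset_edges[-1]:
        reasons.append("offset beyond max")
    return reasons


def tally_outside(events, energy_edges, offset_edges):
    """Count the reasons over the (energy, offset) pairs of the events that got type -2."""
    return Counter(
        reason
        for energy, offset in events
        for reason in outside_reasons(energy, offset, energy_edges, offset_edges)
    )


# Hypothetical edges and events, for illustration only.
events = [(0.5, 7.0), (0.5, 6.5), (500.0, 1.0), (0.5, 1.0)]
tally = tally_outside(events, [0.01, 1.0, 100.0], [0.0, 1.0, 6.0])
print(tally)  # Counter({'offset beyond max': 2, 'energy above range': 1})
```

If the proton type -2 events are dominated by "offset beyond max", that supports the large-offset explanation; type -2 events with an empty reason list would point to a different cause.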
Let's try to understand where the NaNs are coming from.