[Open] moralejo opened this issue 2 years ago
List of training grid points (cone, impact and E-ranges are for gammas - see #3 for the corresponding proton values): MC_training_grid.txt
Hi @Voutsi @moralejo, I'm not sure if this is a known issue, but I realized that the Zd=37.661 Az=270.641 node was generated as such for gamma rays, whereas for protons the azimuth is slightly different: Az=270.661 (at least according to the file names). It looks like a typo when running the jobs. The difference in azimuth is tiny, so the actual effect on the showers is negligible, but I think it can cause a very nasty feature in the RF: if you use the pointing azimuth in the RF with this node included, you will get "perfect" separation between gammas and protons with a cut at 270.65. Such a cut would terminate the tree splitting immediately, making the RF very bad around the zd/az of this node.
EDIT: I found another case: node 9.579 233.112 was simulated as theta_9.579_az_233.12 for gammas and as 233.112 for protons.
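A toy illustration of the failure mode described above: if all gamma events of a node carry az = 270.641 deg and all proton events carry az = 270.661 deg, a single CART-style split on the azimuth separates them perfectly. The sketch below is pure Python with made-up event counts, not the RF implementation used in the analysis; `gini` and `best_threshold` are hypothetical helpers that search for the best Gini cut on that one feature.

```python
# Toy data only: one pointing-azimuth value per event, taken from the node above.
# This is a pure-Python illustration, not the RF implementation used in the analysis.


def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Try all midpoints between consecutive distinct values (as CART-style
    trees do) and return (weighted Gini impurity, cut) for the best single cut."""
    pairs = sorted(zip(values, labels))
    best = (float("inf"), None)
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        cut = 0.5 * (v1 + v2)
        left = [lab for v, lab in pairs if v <= cut]
        right = [lab for v, lab in pairs if v > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[0]:
            best = (score, cut)
    return best


az = [270.641] * 100 + [270.661] * 100  # 100 "gammas", then 100 "protons"
is_proton = [0] * 100 + [1] * 100
impurity, cut = best_threshold(az, is_proton)
print(f"best cut: az <= {cut:.3f} deg, impurity after the split: {impurity:.3f}")
```

The cut lands halfway between the two values, at about 270.651 deg, and both child nodes are pure, so a tree stops splitting there without using any image parameter for these events.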
@jsitarek Nice catch! For the record, in lstmcpipe we filter pointings to make sure they exist for both gammas and protons, so I think these ones have not been included in the trainings.
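A filter of the kind mentioned above can be sketched as follows. This is not lstmcpipe's code: the directory-name pattern and the helpers `parse_node` and `common_nodes` are assumptions for illustration. Pointings are compared as parsed numbers, so a node whose gamma and proton names differ in any digit is dropped rather than silently mixed.

```python
import re

# Assumed naming pattern, taken from the node names quoted in this thread
# ("theta_<zd>_az_<az>"); the actual production layout may differ.
NODE_PATTERN = re.compile(r"theta_(?P<zd>\d+(?:\.\d+)?)_az_(?P<az>\d+(?:\.\d+)?)")


def parse_node(name):
    """Return the (zenith, azimuth) pair encoded in a node name, as floats."""
    match = NODE_PATTERN.search(name)
    if match is None:
        raise ValueError(f"not a node name: {name!r}")
    return float(match["zd"]), float(match["az"])


def common_nodes(gamma_dirs, proton_dirs):
    """Split pointings into those present (with identical values) for both
    particle types and those present for only one of them."""
    gammas = {parse_node(d) for d in gamma_dirs}
    protons = {parse_node(d) for d in proton_dirs}
    return sorted(gammas & protons), sorted(gammas ^ protons)


# The two mismatched nodes from this issue, plus one made-up matching node.
gamma_dirs = ["theta_37.661_az_270.641", "theta_9.579_az_233.12", "theta_10.000_az_180.000"]
proton_dirs = ["theta_37.661_az_270.661", "theta_9.579_az_233.112", "theta_10.000_az_180.000"]
kept, dropped = common_nodes(gamma_dirs, proton_dirs)
print("kept:", kept)
print("dropped:", dropped)
```

Note that the 233.12 vs 233.112 node is dropped as well, even though its actual pointing may be identical: a strict filter cannot tell a naming typo from a real mismatch.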
Hi, thanks for the feedback.
> EDIT: I found another case: node 9.579 233.112 was simulated as theta_9.579_az_233.12 for gammas and as 233.112 for protons.
That's simply a typo in the naming, I will rename them.
> Zd=37.661 Az=270.641
Here there is indeed a difference between protons and gammas. I will re-produce the node. Sorry about that.
Ouch! Thanks @jsitarek. Dangerous indeed, and it is in the Crab line...
@voutsi: Can you check this exhaustively for the other nodes? I recall seeing a plot of RF feature importance (from @SeiyaNozaki, I think) in which, surprisingly, az_tel was a very relevant parameter. I think it was for declination 34.76 deg, but I am not sure. Could this be another instance of the same problem?
Re-producing the events seems a bit of overkill to me: the difference is completely negligible. Rounding the azimuth to 0.1 deg (e.g. in the DL1 files) would solve the issue.
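Whether rounding removes the artificial separation depends on where the two values fall relative to the rounding boundaries. A quick check in Python, using the azimuths quoted in this thread (`separable_after_rounding` is a made-up helper name):

```python
def separable_after_rounding(gamma_az, proton_az, decimals):
    """True if the two azimuths still differ after rounding to `decimals` digits,
    i.e. a decision tree could still place a cut between them."""
    return round(gamma_az, decimals) != round(proton_az, decimals)


# Pointing azimuths (deg) of the Zd=37.661 node quoted in this thread.
GAMMA_AZ, PROTON_AZ = 270.641, 270.661

for decimals in (3, 1, 0):
    print(f"{decimals} decimals: gamma -> {round(GAMMA_AZ, decimals)}, "
          f"proton -> {round(PROTON_AZ, decimals)}, "
          f"still separable: {separable_after_rounding(GAMMA_AZ, PROTON_AZ, decimals)}")
```

With these two values, rounding to 0.1 deg gives 270.6 and 270.7, so a cut between them remains possible for this node; rounding to whole degrees maps both to 271.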
Let me propose that, for upcoming productions, we round the grid points themselves to 1 or 0 decimal digits from the start.
For the record, I cannot see any effect at all in our standard Crab sample (which reaches ~34.6 deg in zenith), so I don't see much need for the quick & dirty fix I proposed above.
@voutsi How long would it take to re-generate the protons?
Hi
> Can you check this exhaustively for the other nodes?
I checked declination band 34.76: the consistency between protons/gammas and corsika/simtel looks OK. (There was a problem in node zd=7, both azimuths, where the simtel files were saved in dir output instead of output_v1.4, but that has been corrected, and I think it shouldn't create the issue you mentioned.)
I will now check, just to be sure, whether the angles in the file names correspond to the actual ones used in the simulation. Given how the naming works, I don't expect a mismatch there.
I will go through all the other bands as well. I would be surprised to find such problems in the newer bands, since I automated much more of the procedure exactly to avoid these problems (after the first issue, caused by a typo in the same Crab node at zd = 37).
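An exhaustive check of the kind requested above could look like the following sketch. It is a hypothetical helper, not the script used for the productions: nodes are (zenith, azimuth) tuples in deg, and the 0.5 deg tolerance is an arbitrary choice. The dangerous case in this issue is a "near miss", i.e. a gamma node whose proton counterpart is close but not identical. The function only checks gamma nodes against protons; calling it with the arguments swapped covers the other direction.

```python
# Hypothetical consistency check for one declination band. Nodes are
# (zenith, azimuth) tuples in deg; the 0.5 deg tolerance is an arbitrary choice.


def angular_gap(a, b):
    """Largest absolute difference in zenith or azimuth (deg), with azimuth wrap-around."""
    dzd = abs(a[0] - b[0])
    daz = abs(a[1] - b[1]) % 360.0
    return max(dzd, min(daz, 360.0 - daz))


def check_band(gamma_nodes, proton_nodes, tolerance_deg=0.5):
    """Classify gamma nodes without an exact proton twin: a 'near miss' has a proton
    node within the tolerance (likely a typo), a 'missing' node has none."""
    exact = set(gamma_nodes) & set(proton_nodes)
    unmatched_protons = [p for p in proton_nodes if p not in exact]
    near_misses, missing = [], []
    for node in sorted(set(gamma_nodes) - exact):
        closest = min(unmatched_protons, key=lambda p: angular_gap(node, p), default=None)
        if closest is not None and angular_gap(node, closest) <= tolerance_deg:
            near_misses.append((node, closest))
        else:
            missing.append(node)
    return near_misses, missing


gamma_nodes = [(37.661, 270.641), (10.0, 180.0)]
proton_nodes = [(37.661, 270.661), (10.0, 180.0)]
near, missing = check_band(gamma_nodes, proton_nodes)
print("near misses:", near)
print("missing:", missing)
```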
> How long would it take to re-generate the protons?
I submitted the jobs this morning. Since the cluster is heavily used these days, I expect them to be ready by tomorrow (probably in the evening).
Ok, thanks, that is fast indeed - I had in mind the production times for a whole line, but this is just a node.
> Let me propose that, for upcoming productions, we round the grid points themselves to 1 or 0 decimal digits from the start.
For azimuth, 0 decimal digits are probably fine, but for zenith it would be good to keep 1 digit (because of the high zenith values).
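Following this suggestion, here is a sketch of how rounded grid points could be turned into node names. `node_name` and `rounded_grid` are hypothetical helpers, and the `theta_<zd>_az_<az>` pattern is assumed from the names quoted in this thread. Building the name once, from one function shared by the gamma and proton jobs, means a typo cannot make the two particle types diverge; the collision check guards against two distinct grid points becoming identical after rounding.

```python
# Hypothetical helper: a single function builds the node name for every particle
# type, from grid values rounded as proposed (zenith to 0.1 deg, azimuth to 1 deg).
# The "theta_<zd>_az_<az>" pattern is assumed from the names quoted in this thread.


def node_name(zd_deg, az_deg):
    return f"theta_{zd_deg:.1f}_az_{az_deg:.0f}"


def rounded_grid(points):
    """Names for a list of (zenith, azimuth) grid points; fails if rounding
    would merge two distinct points into the same node."""
    names = [node_name(zd, az) for zd, az in points]
    if len(set(names)) != len(names):
        raise ValueError("rounding merged two distinct grid points")
    return names


print(rounded_grid([(37.661, 270.641), (9.579, 233.112)]))
```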
Regardless of the grid of directions we design for the MC test set, i.e. for the calculation of IRFs, the effect of the analysis (cut efficiencies, energy and angular resolutions) has to vary smoothly across grid points for the IRF interpolation to make sense.
In the standard Random Forest approach, the simplest way of achieving this seems to be to include the telescope pointing (zenith, az) among the RF input parameters. However, training using an all-sky MC library (e.g. produced "isotropically" up to, say, a zenith of 70 degrees) seems technically challenging - we may be forced to limit the training statistics, and this might impact performance.
An alternative approach is to create separate training samples (and hence RFs) in declination bands, since the analysis of a given sky patch would just use one such set of RFs, and the smoothness along the source path could be achieved. Then, azimuth and zenith would be linked, and in principle we could just use azimuth as an input to the RFs (though perhaps, since the declinations of data and MC will not match perfectly, it is better to keep both - consider e.g. the different wobble pointings).
Below is an example grid with 30 different declinations (from -29 to +67 deg, distributed in steps of cos(zenith) starting from a minimum culmination zenith angle of 6 degrees). In each band, 20 points equally spaced in time are produced, within the limits zenith < 70 degrees and ±6 hours around culmination. The color map indicates the value of sin(delta), i.e. the projection of the magnetic field on the plane orthogonal to the observation direction.
This approach would also have the advantage that it is easy to prioritize the production for specific sources, like Crab and the Galactic Center, which would be very useful for the validation of the whole IRF interpolation procedure.
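The construction of one declination band described above can be sketched with the standard equatorial-to-horizontal conversion. This is a minimal sketch under stated assumptions, not the script used to produce the grid: a site latitude of 28.76 deg (roughly that of LST-1 on La Palma, and the value that makes dec = 22.76 deg culminate at 6 deg zenith), azimuth measured from north through east, and the 20 points equally spaced in hour angle between symmetric limits set by whichever cut (zenith < 70 deg or ±6 h) is tighter. It does not attempt to reproduce the exact values in MC_training_grid.txt, nor the cos(zenith) spacing of the declinations.

```python
import math

# Assumed site latitude (deg). With this value a source at dec = 22.76 deg
# culminates at zenith 6 deg, matching the minimum culmination zenith above.
LATITUDE_DEG = 28.76
MAX_ZENITH_DEG = 70.0
MAX_HOUR_ANGLE_DEG = 6.0 * 15.0  # +/- 6 h around culmination, 15 deg per hour


def zenith_azimuth(dec_deg, ha_deg, lat_deg=LATITUDE_DEG):
    """(zenith, azimuth) in deg for a given declination and hour angle.
    Azimuth is measured from north through east; positive hour angle = west."""
    dec, ha, lat = (math.radians(x) for x in (dec_deg, ha_deg, lat_deg))
    sin_alt = math.sin(dec) * math.sin(lat) + math.cos(dec) * math.cos(lat) * math.cos(ha)
    alt = math.asin(max(-1.0, min(1.0, sin_alt)))
    az = math.atan2(-math.cos(dec) * math.sin(ha),
                    math.sin(dec) * math.cos(lat) - math.cos(dec) * math.sin(lat) * math.cos(ha))
    return 90.0 - math.degrees(alt), math.degrees(az) % 360.0


def hour_angle_limit(dec_deg, lat_deg=LATITUDE_DEG):
    """Largest |hour angle| (deg) satisfying both zenith <= 70 deg and |HA| <= 6 h."""
    dec, lat = math.radians(dec_deg), math.radians(lat_deg)
    cos_ha = (math.cos(math.radians(MAX_ZENITH_DEG)) - math.sin(dec) * math.sin(lat)) / (
        math.cos(dec) * math.cos(lat))
    # cos_ha <= -1: the source never sinks below the zenith cut, only the 6 h limit applies.
    # cos_ha >= 1: the source never rises above the zenith cut (clamped to HA = 0).
    ha_zenith = 180.0 if cos_ha <= -1.0 else math.degrees(math.acos(min(1.0, cos_ha)))
    return min(ha_zenith, MAX_HOUR_ANGLE_DEG)


def declination_band(dec_deg, n_points=20):
    """n_points (zenith, azimuth) nodes equally spaced in time along one declination."""
    ha_max = hour_angle_limit(dec_deg)
    step = 2.0 * ha_max / (n_points - 1)
    return [zenith_azimuth(dec_deg, -ha_max + i * step) for i in range(n_points)]


if __name__ == "__main__":
    for zd, az in declination_band(22.76):
        print(f"zd = {zd:6.3f} deg   az = {az:7.3f} deg")
```

For dec = 22.76 deg the zenith cut is reached at an hour angle of roughly 79 deg (about 5.3 h), so in this sketch it is the tighter limit and the band ends at zenith 70 deg on both sides of the meridian.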
(update 20220204: reduced the number of declination lines to 10 towards south and 5 towards north)
(update 20220321: changed the plot and the list of points, the azimuths were not consistent with the already-produced MC for dec=22.76 deg)