cta-observatory / ctapipe

Low-level data processing pipeline software for CTAO or similar arrays of Imaging Atmospheric Cherenkov Telescopes
https://ctapipe.readthedocs.org
BSD 3-Clause "New" or "Revised" License
63 stars 266 forks source link

Irf maker and cut optimiser #2473

Open Tobychev opened 8 months ago

Tobychev commented 8 months ago

As we are planning to produce IRFs on events that were not used to select the GH cut, it makes sense to split the procedure generating IRFs into two tools. Because this splitting results in a large restructuring of the the previous PR on generating IRFs (#2315) that was basically working properly, I open a new PR to preserve the discussion on the old attempt and let its more or less functional code serve as easy reference.

ctao-dpps-sonarqube[bot] commented 2 months ago

Failed

Analysis Details

15 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctao-dpps-sonarqube[bot] commented 2 months ago

Failed

Analysis Details

15 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctao-dpps-sonarqube[bot] commented 2 months ago

Failed

Analysis Details

15 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctao-dpps-sonarqube[bot] commented 2 months ago

Failed

Analysis Details

15 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctao-dpps-sonarqube[bot] commented 2 months ago

Failed

Analysis Details

15 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctao-dpps-sonarqube[bot] commented 2 months ago

Failed

Analysis Details

15 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctao-dpps-sonarqube[bot] commented 2 months ago

Failed

Analysis Details

10 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctao-dpps-sonarqube[bot] commented 2 months ago

Failed

Analysis Details

10 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctao-dpps-sonarqube[bot] commented 2 months ago

Failed

Analysis Details

10 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctao-dpps-sonarqube[bot] commented 2 months ago

Failed

Analysis Details

9 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctao-dpps-sonarqube[bot] commented 2 months ago

Failed

Analysis Details

10 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctao-dpps-sonarqube[bot] commented 2 months ago

Failed

Analysis Details

10 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctao-dpps-sonarqube[bot] commented 2 months ago

Failed

Analysis Details

10 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctao-dpps-sonarqube[bot] commented 1 month ago

Failed

Analysis Details

10 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctao-dpps-sonarqube[bot] commented 1 month ago

Failed

Analysis Details

11 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctao-dpps-sonarqube[bot] commented 1 month ago

Failed

Analysis Details

11 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

kosack commented 1 month ago

I tried testing this and get the following error when running ctapipe-make-irf when using all default options (also for generating the cuts file):

ValueError: Valid range for background reco energy is 0.15 to 150., got [1.5000e-02 2.3773e-02 ... 9.4644e+01 1.5000e+02]
ValueError: Valid range for Edisp true energy is 0.15 to 150., got [1.5000e-02 1.8884e-02 ... 1.1915e+02 1.5000e+02]

From the error message, it looks like the low-energy range is off by a factor of 10, e.g. the bins go from (0.015, 150) instead of (0.15,150). Is that some unit error somewhere (GeV instead of TeV)?

maxnoe commented 1 month ago

Why don't we allow energies above 150 TeV? or is that value taken from the CORSIKA headers?

kosack commented 1 month ago

Why don't we allow energies above 150 TeV? or is that value taken from the CORSIKA headers?

It think that's just the current default (See below, I don't think they are read from anywhere). It's just that currently when using defaults, they don't seem to agree for the minimum value. I would expect the tool to work correctly with default values.

image
LukasBeiske commented 1 month ago

Why don't we allow energies above 150 TeV? or is that value taken from the CORSIKA headers?

It think that's just the current default (See below, I don't think they are read from anywhere). It's just that currently when using defaults, they don't seem to agree for the minimum value. I would expect the tool to work correctly with default values.

image

Yes, exactly. This are just the current default values. I'll take a look why it doesn't work.

LukasBeiske commented 1 month ago

I tried testing this and get the following error when running ctapipe-make-irf when using all default options (also for generating the cuts file):

ValueError: Valid range for background reco energy is 0.15 to 150., got [1.5000e-02 2.3773e-02 ... 9.4644e+01 1.5000e+02]
ValueError: Valid range for Edisp true energy is 0.15 to 150., got [1.5000e-02 1.8884e-02 ... 1.1915e+02 1.5000e+02]

From the error message, it looks like the low-energy range is off by a factor of 10, e.g. the bins go from (0.015, 150) instead of (0.15,150). Is that some unit error somewhere (GeV instead of TeV)?

I'm not able to reproduce this locally on my end. The default values for ctapipe.irf.binning.{True,Reco}EnergyBinsBase.{true,reco}_energy_min and ctapipe.irf.optimize.CutOptimizerBase.reco_energy_min also seem to be correct at 0.015 TeV, which are the only places where the default values for any energy ranges are defined.

ctao-dpps-sonarqube[bot] commented 1 month ago

Failed

Analysis Details

11 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

kosack commented 1 month ago

What I ran was:

ctapipe-optimize-event-selection  --log-level=INFO \
    --gamma-file=gamma_60deg_180deg_run10000___cta-prod5b-paranal_desert-2147m-Paranal-dark_cone10.alpha_test_applied.DL2.h5  \
    --proton-file=proton_60deg_180deg_run10000___cta-prod5b-paranal_desert-2147m-Paranal-dark.alpha_test_alpha_test_applied.DL2.h5 \
    --electron-file=electron_60deg_180deg_run1000___cta-prod5b-paranal_desert-2147m-Paranal-dark.alpha_test_applied.DL2.h5

ctapipe-make-irf  --log-level=INFO \
    --gamma-file=gamma_60deg_180deg_run10000___cta-prod5b-paranal_desert-2147m-Paranal-dark_cone10.alpha_test_applied.DL2.h5   \
    --proton-file=proton_60deg_180deg_run10000___cta-prod5b-paranal_desert-2147m-Paranal-dark.alpha_test_alpha_test_applied.DL2.h5 \
    --electron-file electron_60deg_180deg_run1000___cta-prod5b-paranal_desert-2147m-Paranal-dark.alpha_test_applied.DL2.h5 \
    --cuts Selection_Cuts.fits

The output is:

2024-06-14 17:35:29,052 ERROR [ctapipe.ctapipe-make-irf] (tool.run): Caught unexpected exception: Valid range for background reco energy is 0.15 to 150., got [1.5000e-02 2.3773e-02 ... 9.4644e+01 1.5000e+02]
Traceback (most recent call last):
  File "/Users/kkosack/Projects/CTA/Working/ctapipe/src/ctapipe/core/tool.py", line 413, in run
    self.setup()
  File "/Users/kkosack/Projects/CTA/Working/ctapipe/src/ctapipe/tools/make_irf.py", line 258, in setup
    check_e_bins(
  File "/Users/kkosack/Projects/CTA/Working/ctapipe/src/ctapipe/irf/binning.py", line 24, in check_bins_in_range
    raise ValueError(
ValueError: Valid range for background reco energy is 0.15 to 150., got [1.5000e-02 2.3773e-02 ... 9.4644e+01 1.5000e+02]
2024-06-14 17:35:29,062 INFO [ctapipe.ctapipe-make-irf] (tool.write_provenance): Output:

So somewhere it is getting 0.15 as a lower energy bin. If I add --no-do-background, I get the same error, but for the edisp instead of the background.

Edit: seems to be in the valid energy stored in the Selection_Cuts.fits:

% showtable Selection_Cuts.fits --hdu VALID_ENERGY                                         
     energy_min         energy_max
        TeV                TeV
------------------- ------------------
0.14999999999999997 150.00000000000003

% showtable Selection_Cuts.fits --hdu GH_CUTS                                              
        low                 center                high                 cut
        TeV                  TeV                  TeV
-------------------- -------------------- -------------------- -------------------
0.014999999999999996 0.019386698943458347 0.023773397886916695  0.9754451779370514
0.023773397886916695 0.030725847179780198   0.0376782964726437  0.9754451779370514
  0.0376782964726437  0.04869718602783414  0.05971607558302459  0.9754451779370514
 0.05971607558302459  0.07717983862752678  0.09464360167202897  0.9754451779370514
 0.09464360167202897  0.12232180083601446  0.14999999999999997  0.9754451779370514
 0.14999999999999997  0.19386698943458353  0.23773397886916706  0.4092728996725426
 0.23773397886916706    0.307258471797802    0.376782964726437  0.7170864798156927

I looked through the code, and so far cannot find anywhere where a factor of 10 (or wrong units) gets applied in the valid energy range... It's a bit of a mystery

462
463             result_saver = OptimizationResultStore(precuts)
464  ->         result_saver.set_result(
465                 gh_cuts=gh_cuts,
466                 valid_energy=valid_energy,
467                 valid_offset=[self.min_fov_offset, self.max_fov_offset],
468                 clf_prefix=clf_prefix,
469                 theta_cuts=theta_cuts_opt if point_like else None,
(Pdb) valid_energy
[<Quantity 0.15 TeV>, <Quantity 150. TeV>]
(Pdb) self.reco_energy_min
<Quantity 0.015 TeV>

The value 0.15 TeV seems to come from the function _get_valid_energy_range(), so it's the minimum finite value of the sensitivity, which should be larger than the reco binning range. So I guess the question is why is there a check that they are the same? I would just guess you want to check if the reco range is larger or equal to the valid range or something.

LukasBeiske commented 1 month ago

The value 0.15 TeV seems to come from the function _get_valid_energy_range(), so it's the minimum finite value of the sensitivity, which should be larger than the reco binning range. So I guess the question is why is there a check that they are the same? I would just guess you want to check if the reco range is larger or equal to the valid range or something.

Thanks for tracking this down. As far as I understood it, the optimization of the G/H cut only gives a valid result if a finite sensitivity could be computed. So for bins where this was not possible, the calculated G/H cut is not valid. And this is the main idea behind this check in ctapipe-make-irf: To see if the provided G/H (and theta) cuts are valid for the whole energy range for which the IRF should be computed.

So to avoid this problem while testing this code, it might be better to not use this optimization for maximum sensitivity by default, but instead only do percentile cuts by default (ctapipe.irf.optimize.PercentileCuts), where the valid energy range is always the one defined during configuration. Right now, you could run ctapipe-make-irf with --IrfEventSelector.optimization_algorithm="PercentileCuts" and everything should work.

And it would probably also be a good idea to give a warning if the valid energy range differs from the configured energy range after the cut optimization...

kosack commented 1 month ago

Thanks for tracking this down. As far as I understood it, the optimization of the G/H cut only gives a valid result if a finite sensitivity could be computed. So for bins where this was not possible, the calculated G/H cut is not valid. And this is the main idea behind this check in ctapipe-make-irf: To see if the provided G/H (and theta) cuts are valid for the whole energy range for which the IRF should be computed.

I think the issues is maybe that the chosen binning shouldn't matter: if the requested bin range is larger than what can be computed, the IRFs should just have NaN (until we have a better way to store invalid masks) in the bins that are outside the valid energy range. Required the user to know in advance what range will be valid is less useful, particularly if we want to automate this for a large number of analyses.

It's the same as if we have low statistics that prevent an IRF from being computed, we would want to ensure that those bins are still stored, but marked properly as unusable.

LukasBeiske commented 1 month ago

I think the issues is maybe that the chosen binning shouldn't matter: if the requested bin range is larger than what can be computed, the IRFs should just have NaN (until we have a better way to store invalid masks) in the bins that are outside the valid energy range. Required the user to know in advance what range will be valid is less useful, particularly if we want to automate this for a large number of analyses.

It's the same as if we have low statistics that prevent an IRF from being computed, we would want to ensure that those bins are still stored, but marked properly as unusable.

Good point. I'm at the CTAO Summer School at the moment, but once I'm back I'll change it accordingly.

LukasBeiske commented 1 week ago

When looking at it again, I noticed that most of these bin-in-range checks did not make sense anyway. They compared the true energy bins of the irfs with the reco energy range for which the cuts are calculated. All the the fov offset checks also did not make sense, since we do not calculate the cuts in fov bins yet and instead we compared it with the fov offset range used for the background events in the sensitivity estimation in PointSourceSensitivityOptimizer.

For the remaining checks, I switched the default to only give a warning if the requested irf binning is outside the valid range of the cuts. The reco energy bins for the background which are actually outside the range of the cuts are currently filled with 0 in the corresponding pyirf functions. I'll open a PR there to instead have them be NaN. This is also the case for empty bins in other irfs.

ctao-dpps-sonarqube[bot] commented 1 week ago

Failed

Analysis Details

10 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube

ctao-dpps-sonarqube[bot] commented 5 days ago

Failed

Analysis Details

11 Issues

Coverage and Duplications

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

View in SonarQube