Update energy_dispersion_to_migration to account for fix in energy dispersion normalization

cta-observatory / pyirf

Python IRF builder

https://pyirf.readthedocs.io/en/stable/

MIT License

15 stars 25 forks source link

Update energy_dispersion_to_migration to account for fix in energy dispersion normalization #273

Open HealthyPear opened 9 months ago

HealthyPear commented 9 months ago

This attempts to update the function that produces the energy migration matrix from the energy dispersion.

After #250 this function was not updated which resulted in the wrong matrix when compared with the one computed starting directly from the events (see #272)

I think I got a working result, but it might still be wrong - also for reasons related to my use case I didn't test this using different binning, so I am re-sampling the same thing to comparing with the original matrix (left panel)

I should also update the unit test in order to compare the counts, but since it comes from a re-sampling I am not sure how much big the relative difference should be.

HealthyPear commented 9 months ago

For reference, this is the code I am using to plot the data,

def plot_migration_matrix(true_quantity, reco_quantity, data, quantity_name=None, ax=None, logx=True, logy=True, vmin=0):

    ax = plt.gca() if ax is None else ax

    X, Y = true_quantity, reco_quantity
    if len(data.shape)==3:
        Z = np.sum(data, axis=2).T
    else:
        Z = data.T

    matrix = ax.pcolormesh(X, Y, Z, vmin=vmin)
    fig=plt.gcf()
    fig.colorbar(matrix, ax=ax, label='Normalized number of events')

    if logx:
        ax.set_xscale("log")
    if logy:
        ax.set_yscale("log")
    quantity_name = "" if quantity_name is None else quantity_name
    ax.set_xlabel(f"True {quantity_name} [ {true_quantity.unit} ]")
    ax.set_ylabel(f"Reco {quantity_name} [ {reco_quantity.unit} ]")

    return ax

calling it like this,

plot_migration_matrix(true_energy_bins, reco_energy_bins, migration_matrix, quantity_name="energy")

HealthyPear commented 9 months ago

I think the tests are failing due to #270.

Note that gammapy does not support yet astropy 6 (see https://github.com/gammapy/gammapy/issues/4972), so it might be sensible to use an upper limit here for the time being.

maxnoe commented 9 months ago

Tests are fixed in main, please rebase

HealthyPear commented 9 months ago

I noticed we don't have a function that allows to create the energy migration matrix directly from the events.

I think it would be nice to have it also as a crosscheck for the tests.

HealthyPear commented 9 months ago

I added a function to compute the migration matrix directly from the events with its unit test.

Now the final touch would be to test the two functions against each other, but I am afraid that the resampling makes this complex in terms of the tolerance required.

codecov[bot] commented 9 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (5b239f2) 95.36% compared to head (a3224e3) 95.40%.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #273 +/- ## ========================================== + Coverage 95.36% 95.40% +0.03% ========================================== Files 60 60 Lines 3109 3131 +22 ========================================== + Hits 2965 2987 +22 Misses 144 144 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

kosack commented 9 months ago

Are the differences in the top two figures understood? It seems there are some non-zero bins in the lower-right that disappear and ones that appear in the upper right side.

Also shouldn't the integral along the x-axis in this case be 1.0, not the peak? It's a probability distribution.

One way to compare in a test would be to compute the percent difference in all bins , e.g. (new-old)/old and check that no bins are more than some percentage. The overall systematics allowed on the energy scale is I think 10%, but the limit here should be quite a bit lower as it's only part of the contribution

HealthyPear commented 9 months ago

Are the differences in the top two figures understood?

Please don't worry about these figures - this is SWGO data and I still cannot make this function work with that dataset for some reason. I am reverting to compute the energy migration matrix directly from the events with just one bin in FoV offset because that shows the correct matrix for me.

Together with Max, we ran it on some dummy data with ideal distributions and these issues do not arise in that case.

edisp_to_matrix.ipynb.zip

maxnoe commented 9 months ago

Also shouldn't the integral along the x-axis in this case be 1.0, not the peak? It's a probability distribution.

For the migration matrix, the sum should be one, you don't want a PDF but a transfer matrix.