Test scripts with SED fit with gammapy-0.19

cosimoNigro commented 2 years ago

A new release of gammapy will be announced tomorrow. We should try to update the scripts with the SED fit to this latest version.

With 0.1.8.2 there is actually a conflict complicating the reproducibility: gammapy 0.18.2 works with iminuit < 2 while jetset works with higher versions. Not that we are using any fitting routine of jetset, but installing it will automatically install a newest version of iminuit. The conflict should be resolved by gammapy 0.19 that upgraded its iminuit dependency to the latest version.

davidsanchez commented 2 years ago

Comments after the call of Friday 26. As I said, gammapy FluxPoint do not accept entries with the same energy now. For this, my first try was to remove the duplicated entries with

sed_table = Table.read(sed_path)
sed_table.sort(keys="e_ref")
_,mask = np.unique(sed_table["e_ref"],return_index=True)
sed_table = sed_table[mask.tolist()]

I reran the sherpa fit for PKS 1510-089, removing the duplicated entries as mentioned above. The first plot is the result with gamampy the second with sherpa. Both are different from the paper but more important, I do not have the same results :(

More to come soon

figure_6_gammapy_fit

figure_6_sherpa_fit

cosimoNigro commented 2 years ago

Thanks a lot @davidsanchez, I wrote in the Gammapy slack to get some support on this, I CC you. As I suggested, if we cannot sort it out we can fall back to 0.18.2 though it would be better to submit the scripts with the latest version of Gammapy.

davidsanchez commented 2 years ago

@cosimoNigro . I saw your message. thanks. I reply to it. I found that using the sherpa script and only sorting the data by increasing energy gives the strange results reported above. I am looking into that

adonath commented 2 years ago

Just for there record: the new recommended way to handle multiple flux points in Gammapy is to just keep them separate with multiple FluxPointsDataset objects in the analysis. I have described the motivation for this in detail here https://github.com/gammapy/gammapy/issues/3667#issuecomment-981705775.

We found that updating to iminuit=2.8.4 In Gammapy produced equivalent results in our validation, so I would not expect any significant changes here either.

cosimoNigro commented 2 years ago

Thanks a lot for commenting here Axel, I get the advantages of the new approach. The only problem left is that for one of the sources I got a single txt file with the SED data mixed and sorted in energy, so it is impossible for me to sort them back to instrument-wise tables.

The only solution is to contact the author of the paper and hope he still has the data separated by instrument.

Otherwise the only workaround is to do a re-binning of the flux points, with the flux in each bin being the weighted average of all the points.

cosimoNigro commented 2 years ago

Alternatively we can also just release the agnpy paper with gammapy-0.18.2, we have a conda environment and a docker container.

The problem has to be fixed though in the notebooks in the docs.

cosimoNigro commented 2 years ago

Ok I wrote to David Paneque to see if he still has the Mrk421 measurements separated by instruments.

@jsitarek would you have the PKS1510-089 2015 data with the flux measurements by different instruments distinguishable?

jsitarek commented 2 years ago

Hi, in my notes I have them all put together, but I will investigate if I can recover the original ones (the problem seems to be two spectra from UVOT). Alternatively one could split them just into higher and lower values. However I found two problems in https://github.com/cosimoNigro/agnpy/blob/master/agnpy/data/mwl_seds/PKS1510-089_2015b.ecsv 1) the units of SED are written to be erg cm-2 s-1 while the file that I have with those data says TeV cm-2 s-1 2) one of the double points is missing, you have there only 1.01705 2.10364e-11 9.68761e-14 9.68761e-14 while I have two: 1.01705 2.10364e-11 9.68761e-14 9.68761e-14 1.01705 1.66024e-11 1.52914e-14 1.52914e-14

jsitarek commented 2 years ago

Hi, I was able to dig out the original script and get the data, I was wrong, it was not UVOT that was doubled, but some optical data, and they are mostly coming from the same instrument but different pointings, in particular TCS is messy, because I have 6 points in one night and 3 different wavelenghts, but they are not in two series, but kind of mixed. I marked each of them separately, but honestly one could simply divide them randomly in two groups just to bypass the gammapy issue. Here is the divided list (note the units issue mentioned above, I also included the missing point)

POINTS FOR MJD=57164-57166 E[eV], flux [TeV cm^-2 s^-1], dflux_low [TeV cm^-2 s^-1], dflux_high [TeV cm^-2 s^-1] Metsahovi: 0.000153032 6.99043e-13 3.2562e-14 3.2562e-14 NICS 1: 1.01705 2.10364e-11 9.68761e-14 9.68761e-14 0.761227 2.11454e-11 1.86966e-12 1.86966e-12 NICS 2: 1.01705 1.66024e-11 1.52914e-14 1.52914e-14 0.761227 1.88459e-11 3.47154e-14 3.47154e-14 0.566575 1.77191e-11 4.73277e-13 4.73277e-13 SMART 1: 2.87689 1.57645e-11 1.0131e-13 1.01966e-13 2.24627 1.5805e-11 1.01571e-13 1.02228e-13 1.90175 1.68016e-11 1.54038e-13 1.55464e-13 0.993543 1.76775e-11 1.62068e-13 1.63567e-13 0.566183 3.06088e-11 2.52677e-13 2.5478e-13 TCS 1: 1.01746 1.81106e-11 4.33206e-13 4.33206e-13 TCS 2: 1.01746 1.79611e-11 6.11396e-13 6.11396e-13 TCS 3: 0.761024 1.69365e-11 5.60938e-13 5.60938e-13 TCS 4: 0.761024 1.80811e-11 6.82019e-13 6.82019e-13 TCS 5: 0.566632 1.86909e-11 8.59783e-13 8.59783e-13 TCS 6: 0.566632 1.86222e-11 5.48237e-13 5.48237e-13 KVA 1: 1.93565 1.10888e-11 1.73611e-13 1.73611e-13 KVA 2: 1.93565 1.54914e-11 2.28269e-13 2.28269e-13 UVOT: 2.28719 1.65687e-11 5.25596e-13 5.82262e-13 2.85176 1.7918e-11 4.07864e-13 3.75194e-13 3.57785 1.8505e-11 4.62817e-13 4.31037e-13 4.79814 1.70329e-11 3.87717e-13 3.96748e-13 5.51957 1.67606e-11 4.19189e-13 3.90405e-13 6.02623 1.55342e-11 3.53602e-13 3.61839e-13 XRay: 435 1.76867e-12 5.6252e-13 5.6252e-13 680 2.23289e-12 3.91068e-13 3.91068e-13 885 2.10229e-12 2.68017e-13 2.68017e-13 1095 1.77311e-12 2.2218e-13 2.2218e-13 1320 2.13266e-12 2.31034e-13 2.31034e-13 1600 2.07225e-12 2.61614e-13 2.61614e-13 1995 2.80442e-12 3.70419e-13 3.70419e-13 2665 2.92998e-12 5.29552e-13 5.29552e-13 3925 3.00231e-12 6.57167e-13 6.57167e-13 Fermi: 1.36673e+08 3.38927e-10 5.28275e-11 5.28275e-11 3.04365e+08 4.30519e-10 6.1813e-11 6.1813e-11 6.77807e+08 4.28777e-10 7.49836e-11 7.49836e-11 1.50945e+09 1.54187e-10 6.24507e-11 6.24507e-11 3.36148e+09 1.90394e-10 9.50969e-11 9.50969e-11 7.48588e+09 2.13595e-10 1.50767e-10 1.50767e-10 MAGIC: 8.86497e+10 3.28934e-11 1.36366e-11 1.36366e-11 1.23079e+11 7.89316e-12 3.39887e-12 3.39887e-12 1.70854e+11 6.76786e-12 2.34705e-12 2.34705e-12

cosimoNigro commented 2 years ago

@davidsanchez if you open a draft PR with your modifications I can devote some time to help you.

davidsanchez commented 2 years ago

@cosimoNigro I can do that indeed with the code I have. I will have time to work on this tonight

adonath commented 2 years ago

I can see the change in Gammapy creates a bit of inconvenience on the I/O side and data bookkeeping here. I think a good solution would be to make use of Astropy's table grouping. So you could keep all the flux points in a single file, but add an "instrument" or "tag" column by hand. Then in the code you can use something along the lines of:

datasets = Datasets()
table = Table.read("flux-points-all.ecsv")
table = table.group_by("instrument")

for group in table.groups:
    data = FluxPoints.from_table(group, sed_type="e2dnde", format="gadf-sed")
    dataset = FluxPointsDataset(data=data, name=group["instrument"][0])
    datasets.append(dataset)

datasets.models = SkyModel()

davidsanchez commented 2 years ago

Hi Using gammapy-18.2 and sorting the flux_points wrt energy I get the same results as with gammapy-0.19. This seems to me that sorting the data change the results and this should not be the case.

adonath commented 2 years ago

@davidsanchez But removing "duplicated" entries should change the result, no? As far as I can see there are some energies with multiple measurements available. Removing those should definitely change the result, as it affects the likelihood that is computed.

davidsanchez commented 2 years ago

@adonath indeed. I the thread I mentioned that I am now sorting the data only and shifting the energy by epsilon. Sorry this was not clear enough;

sed_table = Table.read(sed_path)

mask = np.zeros_like(sed_table["e_ref"], dtype=bool)
mask[np.unique(sed_table["e_ref"], return_index=True)[1]] = True
mask = ~mask
sed_table["e_ref"][mask.tolist()] +=0.0000001 
sed_table.sort(keys="e_ref")# sorting data wrt e_ref

Also, using sherpa and only sorting the data by energy, I get a different results. This not related to gammapy then

cosimoNigro commented 2 years ago

Hey @adonath, thanks a lot for helping us. Your approach of defining a new column with instrument looks nice and the easiest solution. We have problems though when Tables with a single entry are created...

Here a minimal example, a dummy_table.ecsv

# %ECSV 0.9
# ---
# datatype:
# - {name: e_ref, unit: eV, datatype: float64}
# - {name: e2dnde, unit: erg / (cm2 s), datatype: float64}
# - {name: e2dnde_errn, unit: erg / (cm2 s), datatype: float64}
# - {name: e2dnde_errp, unit: erg / (cm2 s), datatype: float64}
# - {name: instrument, datatype: string}
e_ref e2dnde e2dnde_errn e2dnde_errp instrument
0.000153032 6.99043e-13 3.2562e-14 3.2562e-14 Metsahovi

This is just one of the examples of the single-valued Tables that will be produced when we group by instruments.

from astropy.table import Table
from gammapy.estimators import FluxPoints

table = Table.read("dummy_table.ecsv")
data = FluxPoints.from_table(table, sed_type="e2dnde", format="gadf-sed")

this returns:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-08c72ab20cf7> in <module>
----> 1 data = FluxPoints.from_table(table, sed_type="e2dnde", format="gadf-sed")

~/software/miniconda3/lib/python3.8/site-packages/gammapy/estimators/points/core.py in from_table(cls, table, sed_type, format, reference_model, gti)
    231         for name in cls.all_quantities(sed_type=sed_type):
    232             if name in table.colnames:
--> 233                 maps[name] = RegionNDMap.from_table(
    234                     table=table, colname=name, format=format
    235                 )

~/software/miniconda3/lib/python3.8/site-packages/gammapy/maps/region/ndmap.py in from_table(cls, table, format, colname)
    515                 raise ValueError("Column name required")
    516 
--> 517             axes = MapAxes.from_table(table=table, format=format)
    518 
    519             if colname == "stat_scan":

~/software/miniconda3/lib/python3.8/site-packages/gammapy/maps/axes.py in from_table(cls, table, format)
   1899             for axis_format in ["gadf-sed-norm", "gadf-sed-energy", "gadf-sed-counts"]:
   1900                 try:
-> 1901                     axis = MapAxis.from_table(table=table, format=axis_format)
   1902                 except KeyError:
   1903                     continue

~/software/miniconda3/lib/python3.8/site-packages/gammapy/maps/axes.py in from_table(cls, table, format, idx, column_prefix)
   1221             elif "e_ref" in table.colnames:
   1222                 e_ref = flat_if_equal(table["e_ref"].quantity)
-> 1223                 axis = MapAxis.from_nodes(e_ref, name="energy", interp="log")
   1224             else:
   1225                 raise ValueError(

~/software/miniconda3/lib/python3.8/site-packages/gammapy/maps/axes.py in from_nodes(cls, nodes, **kwargs)
    513             raise ValueError("Nodes array must have at least one element.")
    514 
--> 515         return cls(nodes, node_type="center", **kwargs)
    516 
    517     @classmethod

~/software/miniconda3/lib/python3.8/site-packages/gammapy/maps/axes.py in __init__(self, nodes, interp, name, node_type, unit)
     90 
     91         if len(nodes) == 1 and node_type == "center":
---> 92             raise ValueError("Single bins can only be used with node-type 'edges'")
     93 
     94         if isinstance(nodes, u.Quantity):

ValueError: Single bins can only be used with node-type 'edges'

I guess you cannot define the MapAxis with a center without edges... Is this an issue of gammapy or can we find a workaround?

adonath commented 2 years ago

Thanks for putting in the work to update and providing feedback as well @cosimoNigro!

I just checked our issue tracker and re-opened https://github.com/gammapy/gammapy/issues/1952. It was a known issue, but at that time we only implemented the exact check that you see above and did not decide to actually support node_type="center"with a single energy bin. In Gamma-ray astronomy the data is basically always undersampled in energy and one always works by integrating the models over a given energy range. For this use case it's slightly different.

This also points to a possible workaround. Gammapy gives the e_min and e_max information priority in defining the MapAxis (see https://github.com/gammapy/gammapy/blob/master/gammapy/maps/axes.py#L1216). This means you could define the e_min and e_max in addition, like:

from gammapy.estimators import FluxPoints
from astropy.table import Table
from io import BytesIO

yaml =  b"""
        # %ECSV 0.9
        # ---
        # datatype:
        # - {name: e_ref, unit: eV, datatype: float64}
        # - {name: e_min, unit: eV, datatype: float64}
        # - {name: e_max, unit: eV, datatype: float64}
        # - {name: e2dnde, unit: erg / (cm2 s), datatype: float64}
        # - {name: e2dnde_errn, unit: erg / (cm2 s), datatype: float64}
        # - {name: e2dnde_errp, unit: erg / (cm2 s), datatype: float64}
        # - {name: instrument, datatype: string}
        e_ref e_min e_max e2dnde e2dnde_errn e2dnde_errp instrument
        0.000153032 0.00015150168 0.00015457777777777776 6.99043e-13 3.2562e-14 3.2562e-14 Metsahovi
        """

table = Table.read(BytesIO(yaml), format="ascii.ecsv")
fp = FluxPoints.from_table(table, format="gadf-sed", sed_type="e2dnde")

print(fp.energy_ref)

The e_ref is interpolated internally typically like sqrt(e_min * e_max) (log-log interpolation...), so you could choose any arbitrary value that conserves e_ref, however ideally you can set e_min and e_max to the bandwidth of the instrument, such that it has a real physical meaning. If you provide a reference_model to .from_table() as well, then you can convert between the different SED types directly by accessing flux_points.dnde, flux_points.e2dnde, flux_points.flux etc.

I think we can resolve https://github.com/gammapy/gammapy/issues/1952 by adding the support for node_type="center"with a single energy bin.

cosimoNigro / agnpy_paper

Test scripts with SED fit with gammapy-0.19 #18