This should be quite useful @cosimoNigro. One general comment here... Currently, gammapy's FluxPointEstimator needs either an amplitude or a norm parameter to be defined on the model (I imagine that it would be interesting to fit reduced datasets directly with physical models and extract flux points). This is not the case here. It is not a blocker, as the user can still multiply SpectralModelSSC by a normed model for the specific case of extracting flux points, but it is probably important to convey this correctly.
Sorry, my previous comment was not totally correct. As per the latest developments, users need to specify is_norm for scalable parameters (see the discussion in https://github.com/gammapy/gammapy/issues/3690#issuecomment-1116218619). So one option to simplify things for the users here may be to add a norm parameter which must be kept frozen, but can be used internally during flux point estimation...
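A minimal sketch of what this could look like, assuming Gammapy's Parameter constructor accepts the is_norm flag discussed in the linked issue:

```python
from gammapy.modeling import Parameter

# hypothetical frozen scaling parameter: not meant to be fitted,
# only used internally during flux point estimation
norm = Parameter("norm", value=1.0, is_norm=True, frozen=True)
```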
Merging #119 (d9e3a79) into master (86a28d4) will increase coverage by 0.43%. The diff coverage is 98.70%.
@@            Coverage Diff             @@
##           master     #119      +/-   ##
==========================================
+ Coverage   96.99%   97.43%   +0.43%
==========================================
  Files          30       38       +8
  Lines        2233     2997     +764
==========================================
+ Hits         2166     2920     +754
- Misses         67       77      +10
Flag | Coverage Δ
---|---
unittests | 97.43% <98.70%> (+0.43% :arrow_up:)

Flags with carried forward coverage won't be shown.
Impacted Files | Coverage Δ
---|---
setup.py | 0.00% <ø> (ø)
agnpy/fit/models.py | 87.50% <87.50%> (ø)
agnpy/fit/gammapy_wrapper.py | 95.39% <95.39%> (ø)
agnpy/tests/test_wrappers.py | 99.20% <99.20%> (ø)
agnpy/__init__.py | 100.00% <100.00%> (ø)
agnpy/emission_regions/blob.py | 94.28% <100.00%> (+0.05% :arrow_up:)
agnpy/fit/__init__.py | 100.00% <100.00%> (ø)
agnpy/fit/core.py | 100.00% <100.00%> (ø)
agnpy/fit/data.py | 100.00% <100.00%> (ø)
agnpy/fit/sherpa_wrapper.py | 100.00% <100.00%> (ø)
... and 4 more
Thanks for your comments @AtreyeeS. Now k_e, the normalisation of the electron distribution, has is_norm=True when converted to a gammapy.modeling.Parameter. The flux or SED should scale linearly with the normalisation of the radiating electrons, as it is just a multiplicative factor in all the SED calculations.
Will add some tests later.
> The flux or SED should scale linearly with the normalisation of the radiating electrons, as it is just a multiplicative factor in all the SED calculations.

I think that while the synchrotron and EC fluxes go as k_e, the SSC goes as k_e^2 (because it is the same population of electrons, so one factor of k_e comes from the electrons and one from the synchrotron photons). You can see the same in the figure below:
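Written out as a scaling relation (a sketch: the SSC target photon field is itself synchrotron radiation produced by the same electrons):

```latex
F_{\rm syn} \propto k_e, \qquad
u_{\rm syn} \propto k_e
\;\Rightarrow\;
F_{\rm SSC} \propto k_e \, u_{\rm syn} \propto k_e^2
```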
Yes, you are totally right! Thank you!
I will then introduce a general amplitude multiplying the SED, and that would be the norm of the model. Would that be ok?
@cosimoNigro: In gammapy, we tend to use amplitude as a dimensional parameter and norm as a scaling, so maybe norm is better? And document it well that the norm must be frozen during the fit. I don't know if it is possible to hide it from the users in any way, but if you are separating parameters as blob_parameters and jet_parameters, as @QRemy suggested, then at least the norm should not show up in either of those.
Ok, so I added the norm and put it at the bottom of the parameter list. It does appear if you call all the parameters together (I don't know how to hide it).
from agnpy.emission_regions import Blob
from agnpy.wrappers import SynchrotronSelfComptonSpectralModel

# wrap a default blob in the spectral model and list all its parameters
blob = Blob()
ssc_model = SynchrotronSelfComptonSpectralModel(blob)
print(ssc_model.parameters.to_table())
type name value unit error min max frozen is_norm link
-------- --------- ---------- ---- --------- --------- --------- ------ ------- ----
spectral k_e 9.2748e+06 cm-3 0.000e+00 1.000e+00 1.000e+09 False False
spectral p 2.8000e+00 0.000e+00 1.000e+00 5.000e+00 False False
spectral gamma_min 1.0000e+02 0.000e+00 1.000e+00 1.000e+03 True False
spectral gamma_max 1.0000e+07 0.000e+00 1.000e+04 1.000e+08 True False
spectral z 6.9000e-02 0.000e+00 1.000e-03 1.000e+01 True False
spectral d_L 9.9203e+26 cm 0.000e+00 1.368e+25 3.271e+29 True False
spectral delta_D 1.0000e+01 0.000e+00 1.000e+00 1.000e+02 False False
spectral B 1.0000e+00 G 0.000e+00 1.000e-04 1.000e+03 False False
spectral R_b 1.0000e+16 cm 0.000e+00 1.000e+12 1.000e+18 False False
spectral norm 1.0000e+00 0.000e+00 1.000e-01 1.000e+01 True True
As @AtreyeeS said, it does not appear in the "grouped" parameters:
print(ssc_model.spectral_parameters.to_table())
type name value unit error min max frozen is_norm link
-------- --------- ---------- ---- --------- --------- --------- ------ ------- ----
spectral k_e 9.2748e+06 cm-3 0.000e+00 1.000e+00 1.000e+09 False False
spectral p 2.8000e+00 0.000e+00 1.000e+00 5.000e+00 False False
spectral gamma_min 1.0000e+02 0.000e+00 1.000e+00 1.000e+03 True False
spectral gamma_max 1.0000e+07 0.000e+00 1.000e+04 1.000e+08 True False
print(ssc_model.emission_region_parameters.to_table())
type name value unit error min max frozen is_norm link
-------- ------- ---------- ---- --------- --------- --------- ------ ------- ----
spectral z 6.9000e-02 0.000e+00 1.000e-03 1.000e+01 True False
spectral d_L 9.9203e+26 cm 0.000e+00 1.368e+25 3.271e+29 True False
spectral delta_D 1.0000e+01 0.000e+00 1.000e+00 1.000e+02 False False
spectral B 1.0000e+00 G 0.000e+00 1.000e-04 1.000e+03 False False
spectral R_b 1.0000e+16 cm 0.000e+00 1.000e+12 1.000e+18 False False
I integrated the computation of the SED with the Gammapy wrappers in the tests, and I find quite some discrepancy at the highest energies. I expected exactly identical results, as I am using the same functions. Will try to dig deeper later.
Tests fail because the dev version of Gammapy is needed. Will fix it.
Ok, now it's complaining about the astropy version... :face_exhaling:
Ok, very well: the tests were failing because Gammapy, which I am now installing, uses astropy 5.0, which has dropped support for Python 3.7. Just removed Python 3.7 from the tests.
It will take me some time to make a final couple of improvements in the EC wrapper, and then I will merge. In particular, the emission of the thermal components (disk and DT) is missing; I would also like to improve how the target for scattering is defined.
I completed the integration of the wrapper, but when I try to run the fit on the Mrk421 SED it takes 20-30 minutes now :scream:, against the 2-3 minutes it was taking before (Gammapy 0.18.2). Any idea why it is taking so long now compared to the notebooks with Gammapy 0.18.2? Is it because I am now using several FluxPointsDatasets?
Here is a script to run the fit; it is also available in experiments/check_gammapy_fit.py:
# import astropy and matplotlib for basic functionalities
import time
import pkg_resources
import astropy.units as u
from astropy.table import Table
import matplotlib.pyplot as plt
# import agnpy classes
from agnpy.emission_regions import Blob
from agnpy.wrappers import SynchrotronSelfComptonSpectralModel
# import gammapy classes
from gammapy.modeling.models import (SPECTRAL_MODEL_REGISTRY, SkyModel)
from gammapy.estimators import FluxPoints
from gammapy.datasets import Datasets, FluxPointsDataset
from gammapy.modeling import Fit
# IMPORTANT: add the new custom model to the registry of spectral models recognised by gammapy
SPECTRAL_MODEL_REGISTRY.append(SynchrotronSelfComptonSpectralModel)
# total energy content of the electron distribution
spectrum_norm = 5e46 * u.Unit("erg")
# dictionary describing the electron distribution
spectrum_dict = {
"type": "BrokenPowerLaw",
"parameters": {
"p1": 2.02,
"p2": 3.43,
"gamma_b": 9e4,
"gamma_min": 500,
"gamma_max": 1e6
}
}
R_b = 1.5e16 * u.cm
B = 0.1 * u.G
z = 0.0308
delta_D = 20
Gamma = 20
# emission region
blob = Blob(R_b, z, delta_D, Gamma, B, spectrum_norm, spectrum_dict)
# model
ssc_model = SynchrotronSelfComptonSpectralModel(blob)
ssc_model.parameters["gamma_b"].min = 1e3
ssc_model.parameters["gamma_b"].max = 5e5
print("loading flux points...")
# load the MWL flux points
datasets = Datasets()
flux_points = {}
sed_path = pkg_resources.resource_filename("agnpy", "data/mwl_seds/Mrk421_2011.ecsv")
table = Table.read(sed_path)
table = table.group_by("instrument")
# do not use frequency point below 1e11 Hz, affected by non-blazar emission
E_min_fit = (1e11 * u.Hz).to("eV", equivalencies=u.spectral())
for group in table.groups:
name = group["instrument"][0]
data = FluxPoints.from_table(group, sed_type="e2dnde", format="gadf-sed")
dataset = FluxPointsDataset(data=data, name=name)
flux_points.update({name: data})
dataset.mask_fit = dataset.data.energy_ref > E_min_fit
datasets.append(dataset)
# load the SSC model in the datasets
model = SkyModel(spectral_model=ssc_model, name="Mrk421")
datasets.models = [model]
# plot the starting model and the flux points
fig, ax = plt.subplots(figsize=(10, 6))
for key in flux_points.keys():
flux_points[key].plot(ax=ax, label=key)
model.spectral_model.plot(energy_bounds=[1e-6, 1e14] * u.eV, energy_power=2, label="model")
plt.legend(ncol=4)
plt.xlim([1e-6, 1e14])
plt.show()
fig.savefig("initial_model.png")
# define the fitter and run it
fitter = Fit()
start = time.time()
print("fitting...")
results = fitter.run(datasets)
end = time.time()
delta_t = end - start
print(f"execution time {delta_t:.2f} s")
print(results)
print(model.spectral_model.parameters.to_table())
# plot the final model and the flux points
fig, ax = plt.subplots(figsize=(10, 6))
for key in flux_points.keys():
flux_points[key].plot(ax=ax, label=key)
model.spectral_model.plot(energy_bounds=[1e-6, 1e14] * u.eV, energy_power=2, label="model")
plt.legend(ncol=4)
plt.xlim([1e-6, 1e14])
plt.show()
fig.savefig("final_model.png")
The other difference with the tutorial notebooks with Gammapy 0.18.2 is that there I was increasing the errors with the systematics, but I don't think that is the reason it is taking so long now; I remember having reasonable times also without adding systematics. Additionally, I am using a starting model very close to the SED points.
Any ideas @QRemy, @AtreyeeS, @adonath?
> Is it because I am now using several FluxPointsDatasets?

Theoretically, yes. At some point the Python loop in Datasets.stat_sum() could become the bottleneck, but I doubt this is the case here. The integrations performed within the loop should still be the dominant part of the computation.
Can you compare the number of fit iterations for both cases? Is it the same? My strong guess here is no. It seems the convergence changed here...
The total fit on my M1 machine takes 12 min with 2354 iterations. The number of iterations seems rather high to me...
The 1-d joint fit of FluxPointsDatasets is not covered by gammapy-benchmark, so I don't have a history of evaluation times to compare. Could you try to run a profiler to see what takes most of the time? For example:
pip install pyinstrument
pyinstrument -r html -o check_gammapy_fit.html check_gammapy_fit.py
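If only the minimisation step is of interest, a hedged alternative is to profile just the fit call from within the script, using pyinstrument's Python API:

```python
from pyinstrument import Profiler

# profile only the call to the minimiser, not the dataset loading
profiler = Profiler()
profiler.start()
results = fitter.run(datasets)
profiler.stop()
print(profiler.output_text(unicode=True, color=True))
```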
> Could you try to run a profiler to see what takes most of the time? For example

Done, the most time-consuming tasks seem to be those related to optimisation and flux points.
The full .html is here.
> Is it because I am now using several FluxPointsDatasets?

Maybe it contributes a part of the reason. I tried a very simple test: fit the HAWC Crab flux points supplied within gammapy-data, once as a single FluxPointsDataset and once split into several FluxPointsDataset objects. The fit results do not change, but the time for fitting increased by a factor of 7; you can see more here: https://github.com/AtreyeeS/test_notebooks/blob/master/test_joint_fit.ipynb
import numpy as np
from agnpy.utils.math import log

%timeit log(np.ones(10))
%timeit [log(np.ones(1)) for k in range(10)]
Arrays are just more efficient... Here, the increase in time is proportional to the number of calls to the log function, so indeed more datasets with fewer flux points per dataset can explain the difference. The FluxPointsDatasets should probably concatenate the data internally to run as fast as a single FluxPointsDataset.
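A self-contained version of the comparison above, using numpy's log directly (a sketch; agnpy's log is assumed to behave like a thin wrapper around it):

```python
import timeit
import numpy as np

n = 1000
# one vectorised call on a length-n array
t_vector = timeit.timeit(lambda: np.log(np.ones(n)), number=100)
# n calls on length-1 arrays: per-call overhead dominates
t_loop = timeit.timeit(lambda: [np.log(np.ones(1)) for _ in range(n)], number=100)
print(f"vectorised: {t_vector:.4f} s, looped: {t_loop:.4f} s")
```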
@cosimoNigro, @QRemy and @AtreyeeS, I agree that the increase in computation time here seems to come from splitting the datasets. However, all things considered, I think it's still the right way to go: once you use a MapDataset for Fermi-LAT or an IACT, it will dominate the computation time anyway...

> The FluxPointsDatasets should probably concatenate the data internally to run as fast as a single FluxPointsDataset.

I'm not sure it is worth it. From the Gammapy side we could rather work on other improvements, but those are bigger development projects...
Regarding developments on the gammapy side, I tend to agree with @adonath. The best support to provide would be support for non-contiguous energy axes, but that might take some time and needs separate discussions.
But if something can be done within agnpy to speed up the fitting, it would be good. I don't think it's very practical for users to have to spend 30 minutes (or even more) to perform a fit. In such a scenario, the utility is severely hampered, and I guess it would be much better to use sherpa or other backends in such cases. Or, if it is possible, to use a smaller number of datasets somehow. At this point, the splitting by instrument provides a clean methodology, but is maybe not very interesting for modelling purposes.
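For reference, a hedged sketch of the "fewer datasets" option: skip the per-instrument split in the script above and load all the points as one FluxPointsDataset, at the price of losing the per-instrument bookkeeping (table and E_min_fit are the full, ungrouped table and the energy threshold from the script above):

```python
from gammapy.datasets import Datasets, FluxPointsDataset
from gammapy.estimators import FluxPoints

# build a single dataset from the full table instead of one per instrument
data = FluxPoints.from_table(table, sed_type="e2dnde", format="gadf-sed")
dataset = FluxPointsDataset(data=data, name="all-instruments")
dataset.mask_fit = dataset.data.energy_ref > E_min_fit
datasets = Datasets([dataset])
```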
@cosimoNigro Were you able to resolve the convergence issues by changing to the log scaling again?
For Gammapy I opened a feature request as a reminder: https://github.com/gammapy/gammapy/issues/4027
Yes @adonath, thanks for asking! I will send the Gammapy devs the link to the docs with the updated notebooks once I finish everything. The fit now takes 3 minutes, compared to the 2 minutes I had with Gammapy 0.18.2, so everything looks good.
Ok, I am done with this.
I have introduced wrappers both for Gammapy and sherpa models. I have also added some helper functions to load the SED data in a sherpa.data.Data1D object or in a list of gammapy.datasets.FluxPointsDataset.
Proper tests have been added for the new classes and functions. I can now obtain the same results from the two wrappers for the two sources considered, Mrk421 and PKS1510-089.
I will make a new release after I merge.
I think that some people are using our tutorial notebook for fitting with the Gammapy wrapper: https://agnpy.readthedocs.io/en/latest/tutorials/ssc_gammapy_fit.html. I don't think it is particularly pretty, especially because of the large cell of code defining the wrapper. There was some discussion in one of the Gammapy developer calls on whether to include the wrapper in agnpy or in Gammapy, as currently done for the NaimaSpectralModel.

I think it is ok to have the wrapper within agnpy; we will be introducing Gammapy as an additional dependency, but I think that is also fine. Now we will have a centralised wrapper, and hopefully, by adding a line of code, we can solve the issue encountered by @guillaumegrolleron and @jepcor97 when fitting and simulating with the latest Gammapy versions (see the comments in the wrappers).

I thought about having a single wrapper per radiative process, but this implies creating several models and then a compound Gammapy.SpectralModel summing them. So I implemented a wrapper per scenario / source type: I have written wrappers for the SSC and EC scenarios. Parameters are read from an instance of the emission region and target.

Many things are missing; still, I wanted to open the review to start discussing the design and the organisation of the wrapper. @jsitarek, what do you think? Let me know what you think about the strategy, the naming scheme, and the position of the new submodule. I will also be happy to have the opinions of @QRemy and @AtreyeeS, from the Gammapy side, if they have time.

Follows a script to cross-check the normal computation against the new wrappers implemented (we can incorporate this in the tests).