Astropy Integration: Model fitting

Jammy2211 / PyAutoGalaxy

PyAutoGalaxy: Open-Source Multiwavelength Galaxy Structure & Morphology

https://pyautogalaxy.readthedocs.io/

MIT License

26 stars 13 forks source link

Astropy Integration: Model fitting #70

Open pllim opened 1 year ago

pllim commented 1 year ago

This packages uses https://github.com/rhayes777/PyAutoFit that adds a new modeling ecosystem that is completely independent of the Astropy modeling library. Was there something preventing you from using astropy.modeling? Do you have plans to refactor the code to use it in the future?

Perhaps keep your machinery but allow your API to recognize Astropy models in addition to what you already support?

At the very least, there is a need to provide a translation layer between PyAutoFit profiles and Astropy models.

(An independent reviewer and myself share the same sentiment. This is probably the biggest issue in terms of Astropy Affiliated package acceptance in the "Integration with Astropy ecosystem" category.)

https://github.com/astropy/astropy.github.com/pull/491#issuecomment-1215349065

Jammy2211 commented 1 year ago

Going to start responding to the review in full, as I think its the most pressing issue to agree on.

We looked at the astropy modeling tools when we started development, but decided we needed to develop our own framework (PyAutoFit is a parent project of PyAutoGalaxy and developed by the same people).

Here are some of the features we required which (to my knowledge) the astropy modeling system does not support:

1) PyAutoFit outputs results to hard-disk in an ordered directory structure, and includes on-the-fly results and visualization, which is also important for being able to resume an analysis if you stop it:

The file model.results is shown, which contains parameter estimates, and you can also see there are many .png's where the visualization of the fit has been output during the analysis.

2) All PyAutoFit results can be written to an SQLite3 database, and then queried and loaded via a Jupyter notebook:

https://pyautofit.readthedocs.io/en/latest/features/database.html

This is the only way to properly handle the analysis of 100000+ galaxies.

3) PyAutoFit has an API dedicated to fitting multiple datasets simultaneously, which is vital for fitting multiwavelength datasets (https://pyautofit.readthedocs.io/en/latest/overview/multi_datasets.html)

Perhaps keep your machinery but allow your API to recognize Astropy models in addition to what you already support?

At the very least, there is a need to provide a translation layer between PyAutoFit profiles and Astropy models.

In terms of model-fitting I am not keen on supporting the Astropy modeling and fitting, as it would be extra code to maintain which and splits the API in two.

At the very least, there is a need to provide a translation layer between PyAutoFit profiles and Astropy models.

I am now wondering if I misinterpreted your suggestion... did you simply mean to use the astropy Sersic2d / Gaussian2D models and anything we have in common there? That I would be happy to do (we did actually have it set up this way initially, but I refactored it for some reason).

pllim commented 1 year ago

Thanks! I just want to let you know that I have receive your reply and it is on my todo list to respond. I thank you for your patience!

pllim commented 1 year ago

I am now wondering if I misinterpreted your suggestion

Not sure but I definitely do not mean throw all your stuff away and only use astropy. The meaning was more like integrate with astropy where possible. Therefore, for example, you can keep your way of doing things, but maybe have a way to provide astropy mode objects when requested.

we did actually have it set up this way initially, but I refactored it for some reason

What was the reason? I am a little curious...

p.s. There are some know performance issues with astropy.modeling, so maybe I can ping @williamjamieson or @perrygreenfield here in case they have comments on how astropy models can fit (pun?) into your analysis pipeline.

Jammy2211 commented 1 year ago

Not sure but I definitely do not mean throw all your stuff away and only use astropy. The meaning was more like integrate with astropy where possible. Therefore, for example, you can keep your way of doing things, but maybe have a way to provide astropy mode objects when requested.

I have read through astropy modeling in more detail. In terms of models (https://docs.astropy.org/en/stable/modeling/models.html) and fitting (https://docs.astropy.org/en/stable/modeling/fitting.html), I don't see a straight forward way to integrate this into PyAutoGalaxy (Although see below for the models themselves).

I could write an example script showing how one could fit a model using astropy.modeling, but it would effectively strip one of the functionality described above, and would ultimately only make sense for the user to use astropy.modeling module by itself.

It may make sense for PyAutoGalaxy to only use light profile objects from astropy.modeling, and for us to add all models we have which the package doesn't (e.g. cored Sersic) to the package and inherit from them. There are a few nuances with this but provided it make sense upon detailed inspection, I would be happy to work towards that end goal.

What was the reason? I am a little curious...

I think we were profiling different ways to evaluate light profiles (e.g. pure numpy, numba, astropy) and ended up choosing pure numpy for the sake of the code being more explicit and readable (e.g. a user doesn't then need to go through the astropy docs to understand how the Sersic function is implemented).

p.s. There are some know performance issues with astropy.modeling, so maybe I can ping @williamjamieson or @perrygreenfield here in case they have comments on how astropy models can fit (pun?) into your analysis pipeline.

I would be happy for any suggestions on where others see room for integration, but as I said above I don't see an obvious pathway!

perrygreenfield commented 1 year ago

I have read through this and I can't say one way or another whether astropy modeling should or shouldn't be used. To summarize the above objections:

1) results are saved to a directory structure with visualization.

I don't see why this excludes use of astropy models. One can do fits and then save the results in any way one wants and one can also visualize the results and save those too. As far as serialization goes, we have means of serializing models to ASDF files, that can be loaded back into Python and evaluated or modified later. Perhaps there is some special API requirement that excludes astropy modeling, but the provided detail doesn't explain that.

2) results can be written to an SQLite database

Again, I don't see why that can't be done with astropy models. It is unclear how these models are being saved (pickles? specific serialization code?).

3) multiple datasets can be fit simultaneously (by which I presume it means some fitting parameters are common to all the datasets).

Some of the astropy fitters also support this functionality. But even if they didn't, the framework allows adding custom fitters that do whatever one wants.

Finally, even if there are good reasons not to use astropy modeling, it shouldn't be difficult to provide conversion tools to convert to and from astropy models (albeit, it may need implementation of custom models in astropy, but if python code exists for such models, it is usually pretty easy to wrap those as instances of astropy models.

Jammy2211 commented 1 year ago

The original question was whether something was preventing us using astropy.modeling, and I explained functionality missing from the package that we needed for PyAutoGalaxy.

The suggestion now is that we could have used astropy.modeling, had we manually developed a lot of additional functionality around it, when another package already provided that functionality. We used PyAutoFit because it already existed and met all of our requirements.

I feel uncomfortable with the idea of offering support for two model-fitting packages, when there is nearly identical functionality in one package compared to the other. I have tried to find the commonalities between the two projects which I think could help them both (e.g. sharing profile classes like Sersic2D).

Here are some more examples of PyAutoFit functionality required for PyAutoGalaxy:

Support for MCMC fitting, nested sampling, both with parallelization over multiple cores: https://pyautofit.readthedocs.io/en/latest/overview/non_linear_search.html
Chaining model-fits together: https://pyautofit.readthedocs.io/en/latest/features/search_chaining.html and https://github.com/Jammy2211/autogalaxy_workspace/blob/release/notebooks/imaging/advanced/chaining/multiple_galaxies.ipynb
Graphical models for hierarchical Bayesian inference: https://pyautofit.readthedocs.io/en/latest/features/graphical.html and https://github.com/Jammy2211/autogalaxy_workspace/blob/release/notebooks/imaging/advanced/graphical/tutorial_4_hierarchical_models.ipynb

If it is necessary for astropy affiliation, I can add an example script to the [autogalaxy_workspace](https://github.com/Jammy2211/autogalaxy_workspace) showing how to perform a fit using astropy.modeling. However, I worry this would come across somewhat redundant and risks being see as limited-functionality compared to what other scripts in the workspace do.

perrygreenfield commented 1 year ago

OK, I think we understand. The essence of the point isn't that it can't be done in astropy modeling, but rather that you already had the additional functionality built on a different modeling package and can't afford to replicate that on top of astropy modeling.

Jammy2211 commented 1 year ago

Yeah, pretty much, we are only two developers and it took me over a month to respond to this issue! 🤣

pllim commented 1 year ago

example script

I think that would be a good start to push this forward.

For example, once I get a fit from your package, and I have my own model defined in astropy model, how do I use your fit in my astropy model? And vice versa? Basically, some primer on how what you have can talk to astropy.

Thanks!