astrostat / pylira

A Python package for Bayesian low-counts image reconstruction and analysis
9 stars 7 forks source link

Simplify installation / build process #20

Open adonath opened 2 years ago

adonath commented 2 years ago

Installing pylira currently requires to install the R dependent to handle the random number generation in the C code. The R package can be installed via apt-get rather easily, however this option is not available on MacOS, where one needs to rely on brew or similar. This makes the setup of the CI for MacOs rather complex. However locally it still works e.g. by adjusting the paths defined in setup.py. But this is not a good nor convenient solution. What are the options?

adonath commented 2 years ago

Maybe @DougBurke you have any recommendation here?

adonath commented 2 years ago

After private communication with @anetasie, @DougBurke and @infinitron: the plan is to try to get rid of the R dependency at all, because it is only used for random number generation. This might solve https://github.com/astrostat/pylira/pull/26#issuecomment-971786033 as well. So I will change to testing with high statistics now and try to reproduce results without the R dependency. Either by using functionality from the standard library or Numpy's random C API https://numpy.org/doc/stable/reference/random/c-api.html.

adonath commented 2 years ago

I looked a bit further into this and getting rid of the R dependency is not as simple. Beside the random number sampling it is used for special methods such as digamma and trigamma so the functionality cannot be replaced by functions from the standard library. I think a very good compromise would be to just rely on a standalone version of the Rmath library, e.g. https://github.com/statslabs/rmath or even a Python package like https://github.com/ntessore/Rmath-python. I'll investigate a bit more and comment again.

DougBurke commented 2 years ago

I think one issue is that the C++ code is written to do it all, with no opportunity for swapping out the chain handling for a different method (say). We currently provide access to only image_analysis_R frmo Python, but I could imagine providing access to lower-level routines and then writing the equivalent of image_analysis_R in python. So the first round would be to make initialize_control, allocate_memory, set_psf_from_R, ... bayes_image_analysis callable from Python. Then once this is stable you try to break apart one of these routines.

The trick then becomes when do you have to stay in C++ for speed reasons, and whether you can break it apart enough to be able to replace the R code by scipy/other packages?

It also depends on how easy these routines are to wrap with pybind11, and how awkward the python interface is to keep up with the underlying memory handling (e.g. when do things get cleaned up and are there any subtle interactions in C++ scope that could cause problems here).

adonath commented 2 years ago

Thanks @DougBurke! I fully agree giving access to the lower level routines from Python would be very valuable for both testing as well as later modifications and extension of the algorithm. My guess is that >75% of the code could be replaced by Numpy equivalent expressions without sacrificing too much of the performance.

Using pybind11 one can basically "outsource" the memory handling to Numpy. So once the data structures (such as psfType and cntType ) are wrapped by Python once could probably get rid of the memory management completely. I remember I looked into this shortly and one non-trivial thing I found was the handling of pointer to pointer data structures which are present in e.g. the psfType and cntType structs. The standard interface in pybind11 relies on the buffer protocol which only has support for flattened arrays.

adonath commented 2 years ago

Maybe @infinitron has some insight here as well...

hamogu commented 2 years ago

digamma is available both in scipy.special and in mpmath. I bet the scipy version is implemented in C, but in general I've also made very good experiences with mpmath. It's a much lighter dependency than scipy and works at arbitrary precision if needed; in many cases it's not even slower than the compiled scipy versions. For example, in astropy.stats.poisson_conf_interval( ... method='Kraft-Burrows-Nousek') I've implemented the same algorithm to work with either scipy or mpmath (whatever packsge is available). So maybe at least those routines don't have to be wrapped at all and can just be called from a packages that's already part of the ecosystem and (at least in the case of mpmath) installs easily with pip anywhere or (scipy) has conda packages for most major systems (now also M1 native on conda-forge).