Open adonath opened 2 years ago
Maybe @DougBurke you have any recommendation here?
After private communication with @anetasie, @DougBurke and @infinitron: the plan is to try to get rid of the R dependency at all, because it is only used for random number generation. This might solve https://github.com/astrostat/pylira/pull/26#issuecomment-971786033 as well. So I will change to testing with high statistics now and try to reproduce results without the R dependency. Either by using functionality from the standard library or Numpy's random C API https://numpy.org/doc/stable/reference/random/c-api.html.
I looked a bit further into this and getting rid of the R dependency is not as simple. Beside the random number sampling it is used for special methods such as digamma
and trigamma
so the functionality cannot be replaced by functions from the standard library. I think a very good compromise would be to just rely on a standalone version of the Rmath
library, e.g. https://github.com/statslabs/rmath or even a Python package like https://github.com/ntessore/Rmath-python. I'll investigate a bit more and comment again.
I think one issue is that the C++ code is written to do it all, with no opportunity for swapping out the chain handling for a different method (say). We currently provide access to only image_analysis_R
frmo Python, but I could imagine providing access to lower-level routines and then writing the equivalent of image_analysis_R
in python. So the first round would be to make initialize_control
, allocate_memory
, set_psf_from_R
, ... bayes_image_analysis
callable from Python. Then once this is stable you try to break apart one of these routines.
The trick then becomes when do you have to stay in C++ for speed reasons, and whether you can break it apart enough to be able to replace the R code by scipy/other packages?
It also depends on how easy these routines are to wrap with pybind11, and how awkward the python interface is to keep up with the underlying memory handling (e.g. when do things get cleaned up and are there any subtle interactions in C++ scope that could cause problems here).
Thanks @DougBurke! I fully agree giving access to the lower level routines from Python would be very valuable for both testing as well as later modifications and extension of the algorithm. My guess is that >75% of the code could be replaced by Numpy equivalent expressions without sacrificing too much of the performance.
Using pybind11 one can basically "outsource" the memory handling to Numpy. So once the data structures (such as psfType
and cntType
) are wrapped by Python once could probably get rid of the memory management completely. I remember I looked into this shortly and one non-trivial thing I found was the handling of pointer to pointer data structures which are present in e.g. the psfType
and cntType
structs. The standard interface in pybind11
relies on the buffer protocol which only has support for flattened arrays.
Maybe @infinitron has some insight here as well...
digamma
is available both in scipy.special
and in mpmath
. I bet the scipy version is implemented in C, but in general I've also made very good experiences with mpmath. It's a much lighter dependency than scipy and works at arbitrary precision if needed; in many cases it's not even slower than the compiled scipy versions. For example, in astropy.stats.poisson_conf_interval( ... method='Kraft-Burrows-Nousek')
I've implemented the same algorithm to work with either scipy or mpmath (whatever packsge is available). So maybe at least those routines don't have to be wrapped at all and can just be called from a packages that's already part of the ecosystem and (at least in the case of mpmath) installs easily with pip anywhere or (scipy) has conda packages for most major systems (now also M1 native on conda-forge).
Installing
pylira
currently requires to install theR
dependent to handle the random number generation in the C code. The R package can be installed viaapt-get
rather easily, however this option is not available on MacOS, where one needs to rely onbrew
or similar. This makes the setup of the CI for MacOs rather complex. However locally it still works e.g. by adjusting the paths defined insetup.py
. But this is not a good nor convenient solution. What are the options?pylira
dependent on it? There are already solutions such as https://github.com/ntessore/Rmath-python