PaulHancock / Aegean

The Aegean source finding program and associated tools
http://aegeantools.rtfd.io/

retire dependency on `pprocess` #170

Closed PaulHancock closed 2 years ago

PaulHancock commented 2 years ago

The pprocess module was a convenient way to implement multiprocessing for aegean, but it is no longer being developed. The current AegeanTools has internalised a version of pprocess that was updated to work with Python 3; however, this module cannot be supported long-term.

pprocess should be replaced by appropriate use of the multiprocessing module.
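
As a rough illustration of what such a replacement could look like, here is a minimal sketch built on `multiprocessing.Pool`. It is not the Aegean code: `fit_island` and `fit_islands` are hypothetical names, and the per-island "fit" is a placeholder that just sums pixel values.

```python
import multiprocessing


def fit_island(island):
    """Hypothetical stand-in for the per-island fitting step."""
    # Placeholder work: sum the pixel values of the island.
    return sum(island)


def fit_islands(islands, cores=1):
    """Process every island, optionally spreading the work over `cores` processes."""
    if cores <= 1:
        # Serial path: no worker processes and no pickling overhead.
        return [fit_island(island) for island in islands]
    # Pool.map preserves input order, so results line up with `islands`.
    with multiprocessing.Pool(processes=cores) as pool:
        return pool.map(fit_island, islands)
```

Calling `fit_islands(islands, cores=4)` would farm the islands out to four worker processes. On platforms that start workers with `spawn`, that call should sit under an `if __name__ == "__main__":` guard, and the worker function has to be defined at module level so it can be pickled.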

PaulHancock commented 2 years ago

Testing on my local machine reveals that using --cores=1 is the fastest way to find sources in an image. My resource monitor shows that I have 1 instance of aegean running and that it is using 100% of all 8 cores. On a test image of 3k x 3k pixels, with 1297 islands, and 1376 components:

Using scalene with --cores=1, I can see that 90% of the execution time is 'native', with the other 10% being 'python'. According to the documentation, this means that 90% of the time is spent in C/C++ libraries, which I assume are numpy/scipy.

It seems that numpy/scipy have some parallelism built in. In fact, upon further reading, it may not be numpy/scipy explicitly doing this: some BLAS/LAPACK and MKL functions can natively use multiple cores. Since much of the work of aegean is fitting via lmfit, which in turn uses scipy.optimize.minimize, which in turn uses system libraries, my system is able to do the multiprocessing 'for free'.
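
One way to check whether the linear algebra backend is really responsible is to cap its thread count and re-run the same test. The sketch below assumes the backend honours the usual environment variables (`OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS`, `MKL_NUM_THREADS`); which of these applies depends on how numpy/scipy were built. The helper name `limited_thread_env` is hypothetical.

```python
import os

# Environment variables commonly read by OpenMP, OpenBLAS and MKL.
# Which ones matter depends on the BLAS/LAPACK build behind numpy/scipy.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limited_thread_env(n_threads, base_env=None):
    """Return a copy of the environment with library threading capped."""
    env = dict(os.environ if base_env is None else base_env)
    for name in THREAD_ENV_VARS:
        env[name] = str(n_threads)
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching aegean with --cores=1. The variables are read when the libraries are loaded, so they need to be set before numpy is imported, which is why this is done for a child process rather than inside an already running interpreter. If the run with a cap of 1 thread drops to a single busy core and slows down, the 'free' parallelism is coming from these libraries.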

So, long story short, removing pprocess may be as simple as removing all the aegean multiprocessing and using the single-core version. The BANE multiprocessing still needs to be managed by me, since the linear algebra libraries don't do much of the work there.