Performance: Combining flats is slow

pllim commented 6 years ago

This is taken from Astropy project performance survey. Is this is not the correct repo for this issue, please advise.

I have not tried this in recent astropy versions, but the early code to combine flats was FAR slower than IRAF imcombine. I had done some reading up to discover that IRAF was very proud about how it did memory management to do this efficiently. Sorry cannot be more specific or useful than that.

MSeifert04 commented 6 years ago

Thanks for bringing this to our attention @pllim.

There are several places that are quite slow and memory inefficient in combine and Combiner. We are currently working on a few of these issues. But it's unlikely that we'll be faster (or as fast) as imcombine. 😅

jpmorgen commented 3 years ago

It occurs to me that considerable speed might be gained by using multiprocessing: each tile could be subdivided into sub-tiles to be farmed out to their individual combiners so that those of us who have multi-core machines (pretty much everyone these days) can enjoy a considerable boost in performance. I have started to design a system that I think would work and preserve the original memory handling subdivisions by making use of queues to provide instructions to the subprocesses and mmaped files (the astropy.fits default) to avoid opening the same file multiple times while creating the sub-tiles. Opening each file multiple times to create the chunk will still, of course, be necessary.

Is there some larger design decision that discourages the use of multiprocessing at this level? Yes, it means that upstream multiprocessed callers need to have their daemon flag set to False and divide the maximum memory and number of processes passed to combiner by the number of parent processes, but with those considerations, I think it should work OK.

If there is interest in the multiprocess route, let me know.

Thanks!

jpm

astropy / ccdproc

Performance: Combining flats is slow #645