Thanks for the report @welterde. We do want to reduce memory usage, and I don't think there is any objection in principle to using cython, though I don't think that alone would fix the problem. I know @crawfordsm has some code that does something like what you have implemented, breaking images up into chunks and combining one chunk at a time.
Could you please send some more details about what you are doing? These would be helpful:
Thanks!
Having cython operate on mmap'ed data should reduce memory consumption to one full image + some smallish buffers it operates on, shouldn't it?
That's indeed something I missed.. so I guess I should convert it to int32 beforehand so it stays memory-mapped?
int32 or float32; anything without scaling should improve memory performance since it can be memory mapped. Right now your unsigned int images are being read into memory after being scaled, and converted to float32 in the process (unless you pass the right arguments to CCDData.read (any arguments are passed through to astropy.io.fits)).
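For illustration, roughly the kind of call being described here, as a minimal sketch ('image.fits' and the unit are placeholders; check the keyword arguments against your astropy/ccdproc version):

```python
# A sketch, not the exact call from this thread: keep the raw integer data
# memory-mapped by asking astropy.io.fits not to apply BSCALE/BZERO scaling.
from astropy.io import fits
from ccdproc import CCDData

# Plain astropy.io.fits: the array stays backed by the memory map.
hdulist = fits.open('image.fits', memmap=True, do_not_scale_image_data=True)
raw = hdulist[0].data

# CCDData.read passes extra keyword arguments through to astropy.io.fits,
# so the same flag should work here too.
ccd = CCDData.read('image.fits', unit='adu', do_not_scale_image_data=True)
```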
The upshot is the total memory usage might be as high as 840MB: 4k * 4k * (4 + 1) * 10, where the (4 + 1) is for the data (4 bytes) and the mask (1 byte).
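As a quick sanity check of that figure (assuming 4096 x 4096 pixel images):

```python
# 10 images x (4-byte float32 data + 1-byte mask) per pixel
print(4096 * 4096 * (4 + 1) * 10 / 1e6)  # ~839 MB
```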
I'll try to take a look at this later this week; we might very well be able to reduce the usage by being more careful to create references to the images rather than copies.
@welterde -- I took a closer look today and there are some easy changes that can be made to reduce memory usage:

1. The Combiner class creates a numpy masked array with data type float64, so in your case the memory usage for 10 images is 4k * 4k * 10 * (4 + 8 + 1) ~ 2GB. There wasn't a deliberate choice to use float64 here; it just seems to be what numpy defaults to (at least on my mac).
2. Combiner makes a copy of the data rather than using references to it. For the mask that probably makes sense, since clipping can change the mask and it isn't clear whether the user would want the mask on the original data changed. Not sure how to fix this, but I don't think it would require a huge change in the code.

Do you have any interest in putting together a pull request to do the first change (and maybe the second)? I'm happy to walk you through both the change that would need to be made and the pull request process.
If not, I should be able to do a fix this weekend.
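And a quick check of the ~2GB figure from the first point above (again assuming 4096 x 4096 pixel images):

```python
# 10 images x (4-byte float32 original + 8-byte float64 copy + 1-byte mask)
print(4096 * 4096 * 10 * (4 + 8 + 1) / 1e9)  # ~2.2 GB
```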
Thanks for the feedback. There are definitely a few things that can be done to improve the memory usage of the combiner. Very early on, I was hoping to write a better implementation of it, although memory usage runs into a lot of issues depending on the OS and system being used. However, any contributions here would be welcome.
That said, I've posted a gist here to show how it can be done for an arbitrarily sized image -- obviously speed isn't the concern, but this should work for any set of data: https://gist.github.com/crawfordsm/bbe63df8f3c6491d4d6c#file-combiner_large
If a convenience function is written for the Combiner class ( #65 ), this is something that I would plan to include.
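For context, here is a rough sketch of the tile-by-tile idea described above (this is not the gist itself; the function name, tile size, and use of plain numpy with astropy.io.fits are illustrative only):

```python
import numpy as np
from astropy.io import fits

def median_combine_in_tiles(filenames, tile=512):
    """Median-combine equally sized FITS images, one tile at a time."""
    shape = fits.getdata(filenames[0], memmap=True).shape
    result = np.empty(shape, dtype=np.float32)
    for y0 in range(0, shape[0], tile):
        for x0 in range(0, shape[1], tile):
            y1 = min(y0 + tile, shape[0])
            x1 = min(x0 + tile, shape[1])
            # Only this tile of each image is held in memory at any one time.
            stack = np.stack([fits.getdata(f, memmap=True)[y0:y1, x0:x1]
                              for f in filenames])
            result[y0:y1, x0:x1] = np.median(stack, axis=0)
    return result
```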
Found a volunteer to work on the easy fix here (avoid float64 unless the underlying data is float64). @heidtna -- the line that needs to be modified is https://github.com/astropy/ccdproc/blob/master/ccdproc/combiner.py#L78 . I think the change needed is to add the argument dtype=, setting dtype to the dtype of one of the images.
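To make that concrete, the change might look roughly like this (a sketch only, not the actual patch to combiner.py; the input list here is made up):

```python
import numpy as np
from ccdproc import CCDData

# Made-up inputs; in Combiner these are the images supplied by the user.
ccd_list = [CCDData(np.zeros((100, 100), dtype=np.float32), unit='adu')
            for _ in range(3)]

# Build the stacked masked array with the dtype of the first image instead of
# letting numpy default to float64.
image_dtype = ccd_list[0].data.dtype
data_arr = np.ma.masked_array(
    np.array([ccd.data for ccd in ccd_list], dtype=image_dtype),
    mask=np.zeros((len(ccd_list),) + ccd_list[0].data.shape, dtype=bool))
```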
The major hurdle I faced while trying to migrate from IRAF-based data reduction pipelines to ccdproc was this memory issue. In infrared observations, the number of files per object is typically huge, and it is impossible to load all the images into memory before combining them (for example, I had to take the median of ~200 frames). I don't know how IRAF does it. I guess the only way we can implement it in ccdproc is to slice the image into smaller tiles and combine one tile at a time.
IRAF basically does it by dividing up every frame into smaller regions, which is what the gist above handles. There has been an outstanding task to include a combine task, which could build on the gist, or alternatively to write a completely different version in cython that would handle it pixel by pixel. It isn't very high on the priority list right now, but any help with it would also be appreciated.
I shall try to write a function decorator which will do the procedure from the gist, which we can then use inside the code for the average, median (or any future) combine methods of combiner.py.
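A rough sketch of what such a decorator could look like (hypothetical; not the implementation being proposed here, and the tile size is arbitrary):

```python
import functools
import numpy as np

def combine_in_tiles(tile=512):
    """Apply a stack-reducing function tile by tile instead of on full frames."""
    def decorator(combine_func):
        @functools.wraps(combine_func)
        def wrapper(arrays):
            ny, nx = arrays[0].shape
            out = np.empty((ny, nx), dtype=np.float32)
            for y0 in range(0, ny, tile):
                for x0 in range(0, nx, tile):
                    y1, x1 = min(y0 + tile, ny), min(x0 + tile, nx)
                    # combine_func only ever sees one small tile per image.
                    out[y0:y1, x0:x1] = combine_func(
                        [a[y0:y1, x0:x1] for a in arrays])
            return out
        return wrapper
    return decorator

@combine_in_tiles(tile=512)
def median_combine(arrays):
    return np.median(np.stack(arrays), axis=0)
```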
Thanks, @indiajoe, that would be awesome!
Got some time today to write the decorator (I have kept it here temporarily: https://gist.github.com/indiajoe/24d916c3450f448f6b0e ). But I think I understood the problem wrongly: at which stage are we trying to reduce memory usage? If I understand correctly, when we create a Combiner object it creates an internal 3D masked array cube. While taking the median or average, does it really create another copy of that array? If so, using the decorator around those functions will help. But now I have a feeling the aim was to prevent the Combiner from creating a full 3D cube of the data in the first place when it initialises, not at the mean or median calculation step. Am I right? If so, a function decorator might not be the solution; we will need a convenience function to create multiple Combiner objects, like the script in the original gist does.
@indiajoe -- you are correct that combiner creates an internal copy as a masked array, so the goal would be to break the array into chunks. Also, if you could take a quick glance at http://astropy.readthedocs.org/en/stable/development/codeguide.html#coding-style-conventions that would be great -- main thing I noticed in the gist is that you are using camel case (which we've avoided).
Thanks again!
@mwcraig, @crawfordsm -- I have written another convenience class which will create a new Combiner instance for each sub-tile, set the properties, etc.: https://github.com/indiajoe/ccdproc/commit/060f046518b35965649883857f78c4d8fe8c2aff
The code above is just to show the structure and outline of the planned implementation. I haven't tested it yet, and I shall also correct it to follow the astropy coding style guidelines. Before that, I thought it might be a good idea to ask for feedback and suggestions on this approach.
Was this closed by #218?
I think it is solved. #218 wasn't a cython implementation as suggested earlier in this issue, but it solves the problem.
@crawfordsm -- can you please close this if #218 fixed it? Thanks!
Stacking several 4k*4k images results in the process running out of memory quite quickly. For now I have replaced some parts locally that do the same thing line by line instead of the whole thing at once, which already reduces the memory requirements significantly, but it's still not ideal.
Are there any plans to replace the current workings of the combiner with some cython code?