Thanks for the report @welterde. We do want to reduce memory usage, and I don't think there is any objection in principle to using cython, though I don't think that alone would fix the problem. I know @crawfordsm has some code that does something like what you have implemented, breaking images up into chunks and combining one chunk at a time.
Could you please send some more details about what you are doing? These would be helpful:
Thanks!
Having cython operate on mmap'ed data should reduce memory consumption to one full image + some smallish buffers it operates on, shouldn't it?
That's indeed something I missed.. so I guess I should convert it to int32 beforehand so it stays memory-mapped?
int32 or float32; anything without scaling should improve memory performance since it can be memory mapped. Right now your unsigned int images are being read into memory after being scaled, and converted to float32 in the process (unless you pass the right arguments to CCDData.read (any arguments are passed through to astropy.io.fits)).
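For illustration, roughly the kind of call being described here, as a minimal sketch ('image.fits' and the unit are placeholders; check the keyword arguments against your astropy/ccdproc version):

```python
# A sketch, not the exact call from this thread: keep the raw integer data
# memory-mapped by asking astropy.io.fits not to apply BSCALE/BZERO scaling.
from astropy.io import fits
from ccdproc import CCDData

# Plain astropy.io.fits: the array stays backed by the memory map.
hdulist = fits.open('image.fits', memmap=True, do_not_scale_image_data=True)
raw = hdulist[0].data

# CCDData.read passes extra keyword arguments through to astropy.io.fits,
# so the same flag should work here too.
ccd = CCDData.read('image.fits', unit='adu', do_not_scale_image_data=True)
```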
The upshot is the total memory usage might be as high as 840MB: 4k * 4k * (4 + 1) * 10, where the (4 + 1) is for the data (4 bytes) and the mask (1 byte).
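As a quick sanity check of that figure (assuming 4096 x 4096 pixel images):

```python
# 10 images x (4-byte float32 data + 1-byte mask) per pixel
print(4096 * 4096 * (4 + 1) * 10 / 1e6)  # ~839 MB
```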
I'll try to take a look at this later this week; we might very well be able to reduce the usage by being more careful to create references to the images rather than copies.
@welterde -- I took a closer look today and there are some easy changes that can be made to reduce memory usage:

1. The Combiner class creates a numpy masked array with data type float64, so in your case the memory usage for 10 images is 4k * 4k * 10 * (4 + 8 + 1) ~ 2GB. There wasn't a deliberate choice to use float64 here; it just seems to be what numpy defaults to (at least on my mac).
2. Combiner makes a copy of the data rather than using references to it. For the mask that probably makes sense, since clipping can change the mask and it isn't clear whether the user would want the mask on the original data changed. Not sure how to fix this, but I don't think it would require a huge change in the code.

Do you have any interest in putting together a pull request to do the first change (and maybe the second)? I'm happy to walk you through both the change that would need to be made and the pull request process.
If not, I should be able to do a fix this weekend.
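And a quick check of the ~2GB figure from the first point above (again assuming 4096 x 4096 pixel images):

```python
# 10 images x (4-byte float32 original + 8-byte float64 copy + 1-byte mask)
print(4096 * 4096 * 10 * (4 + 8 + 1) / 1e9)  # ~2.2 GB
```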
Thanks for the feedback. There are definitely a few things that can be done to improve the memory usage of the combiner. Very early on, I was hoping to write a better implementation of it, although memory usage runs into a lot of issues depending on the OS and system being used. However, any contributions here would be welcome.
That said, I've posted a gist here to show how it can be done for an arbitrarily sized image -- obviously speed isn't the concern, but this should work for any set of data: https://gist.github.com/crawfordsm/bbe63df8f3c6491d4d6c#file-combiner_large
If a convenience function is written for the Combiner class ( #65 ), this is something that I would plan to include.
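For context, here is a rough sketch of the tile-by-tile idea described above (this is not the gist itself; the function name, tile size, and use of plain numpy with astropy.io.fits are illustrative only):

```python
import numpy as np
from astropy.io import fits

def median_combine_in_tiles(filenames, tile=512):
    """Median-combine equally sized FITS images, one tile at a time."""
    shape = fits.getdata(filenames[0], memmap=True).shape
    result = np.empty(shape, dtype=np.float32)
    for y0 in range(0, shape[0], tile):
        for x0 in range(0, shape[1], tile):
            y1 = min(y0 + tile, shape[0])
            x1 = min(x0 + tile, shape[1])
            # Only this tile of each image is held in memory at any one time.
            stack = np.stack([fits.getdata(f, memmap=True)[y0:y1, x0:x1]
                              for f in filenames])
            result[y0:y1, x0:x1] = np.median(stack, axis=0)
    return result
```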
Found a volunteer to work on the easy fix here (avoid float64 unless the underlying data is float64). @heidtna -- the line that needs to be modified is https://github.com/astropy/ccdproc/blob/master/ccdproc/combiner.py#L78 . I think the change needed is to add the argument dtype=, setting dtype to the dtype of one of the images.
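To make that concrete, the change might look roughly like this (a sketch only, not the actual patch to combiner.py; the input list here is made up):

```python
import numpy as np
from ccdproc import CCDData

# Made-up inputs; in Combiner these are the images supplied by the user.
ccd_list = [CCDData(np.zeros((100, 100), dtype=np.float32), unit='adu')
            for _ in range(3)]

# Build the stacked masked array with the dtype of the first image instead of
# letting numpy default to float64.
image_dtype = ccd_list[0].data.dtype
data_arr = np.ma.masked_array(
    np.array([ccd.data for ccd in ccd_list], dtype=image_dtype),
    mask=np.zeros((len(ccd_list),) + ccd_list[0].data.shape, dtype=bool))
```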
The major hurdle I faced while trying to migrate from IRAF-based data reduction pipelines to ccdproc was this memory issue. In infrared observations, the number of files per object is typically huge, and it is impossible to load all the images into memory before combining them (for example, I had to take the median of ~200 frames). I don't know how IRAF does it. I guess the only way we can implement it in ccdproc is to slice the image into smaller tiles and combine one tile at a time.
IRAF basically does it by dividing up every frame into smaller regions, which is what the gist above handles. There has been an outstanding task to include a combine task, which could build on the gist, or alternatively to write a completely different version in cython that would handle it pixel by pixel. It isn't very high on the priority list right now, but any help with it would also be appreciated.
I shall try to write a function decorator which will do the procedure from the gist, which we can then use inside the code for the average, median (or any future) combine methods of combiner.py.
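A rough sketch of what such a decorator could look like (hypothetical; not the implementation being proposed here, and the tile size is arbitrary):

```python
import functools
import numpy as np

def combine_in_tiles(tile=512):
    """Apply a stack-reducing function tile by tile instead of on full frames."""
    def decorator(combine_func):
        @functools.wraps(combine_func)
        def wrapper(arrays):
            ny, nx = arrays[0].shape
            out = np.empty((ny, nx), dtype=np.float32)
            for y0 in range(0, ny, tile):
                for x0 in range(0, nx, tile):
                    y1, x1 = min(y0 + tile, ny), min(x0 + tile, nx)
                    # combine_func only ever sees one small tile per image.
                    out[y0:y1, x0:x1] = combine_func(
                        [a[y0:y1, x0:x1] for a in arrays])
            return out
        return wrapper
    return decorator

@combine_in_tiles(tile=512)
def median_combine(arrays):
    return np.median(np.stack(arrays), axis=0)
```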
Thanks, @indiajoe, that would be awesome!
Got some time today to write the decorator (I have kept it here temporarily: https://gist.github.com/indiajoe/24d916c3450f448f6b0e ). But I think I understood the problem wrongly: at which stage are we trying to reduce memory usage? If I understand correctly, when we create a Combiner object it creates an internal 3D masked array cube. While taking the median or average, does it really create another copy of that array? If so, using the decorator around those functions will help. But now I have a feeling the aim was to prevent the Combiner from creating a full 3D cube of the data in the first place when it initialises, not at the mean or median calculation step. Am I right? If so, a function decorator might not be the solution; we will need a convenience function to create multiple Combiner objects, like the script in the original gist does.
@indiajoe -- you are correct that combiner creates an internal copy as a masked array, so the goal would be to break the array into chunks. Also, if you could take a quick glance at http://astropy.readthedocs.org/en/stable/development/codeguide.html#coding-style-conventions that would be great -- main thing I noticed in the gist is that you are using camel case (which we've avoided).
Thanks again!
@mwcraig, @crawfordsm -- I have written another convenience class which will create a new Combiner instance for each sub-tile, set the properties, etc.: https://github.com/indiajoe/ccdproc/commit/060f046518b35965649883857f78c4d8fe8c2aff
The code above is just to show the structure and outline of the planned implementation. I haven't tested it yet, and I shall also correct it to follow the astropy coding style guidelines. Before that, I thought it might be a good idea to ask for feedback and suggestions on this approach.
Was this closed by #218?
I think it is solved. #218 wasn't a cython implementation as suggested earlier in this issue, but it solves the problem.
@crawfordsm -- can you please close this if #218 fixed it? Thanks!
Stacking several 4k*4k images results in the process running out of memory quite quickly. For now I have replaced some parts locally that do the same thing line by line instead of the whole thing at once, which already reduces the memory requirements significantly, but it's still not ideal.
Are there any plans to replace the current workings of the combiner with some cython code?