Thanks a lot for this great package. I noticed that most of the memory footprint comes from the data slot, which is a base R array.
Every image operation performed directly on an Image or AnnotatedImage object creates a full copy of the data, doubling memory usage. For example, given an Image object `img`, an operation such as `img * 50` doubles memory usage (even without assigning the result to anything!).
Interestingly, the `display` function increases memory usage by even more than that: calling `display(img)` typically gives me a memory bump of around 200%.
I assume this behavior is caused by the use of a base R array; it may already have been touched on in Issue #40.
Could the current base R array be replaced with an on-disk array library?
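To illustrate the copy behavior in plain base R (a sketch, not EBImage itself; `x` is a hypothetical stand-in for an Image's pixel data): arithmetic on an array never modifies it in place but returns a new array of the same size, so `x * 50` needs a second full-size allocation. One plausible reason memory stays high even without assignment is that R stores the result of every top-level expression in `.Last.value` until the next one is evaluated.

```r
# Hypothetical stand-in for the pixel data of an Image object (8 MB of doubles).
x <- array(runif(1024 * 1024), dim = c(1024, 1024))

# Arithmetic allocates a brand-new array; x itself is left untouched.
y <- x * 50

# Both objects occupy the same amount of memory, so total usage has doubled.
print(object.size(x), units = "MB")
print(object.size(y), units = "MB")
```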
For example, HDF5Array seems like a reasonable replacement. From a bit of research, it seems that 1) it is well supported by the Bioconductor core team, 2) it allows easy conversion of a base R array to a temporary on-disk HDF5 array (as simple as `hdf5array <- HDF5Array(base.R.array)`), and 3) it splits the array into chunks for fast access and only loads the relevant chunks into memory when needed.
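A minimal sketch of what this could look like, outside of EBImage and assuming the Bioconductor packages HDF5Array and DelayedArray are installed. It uses `writeHDF5Array()`, which writes an in-memory array to a temporary HDF5 file when no `filepath` is given and returns an HDF5Array backed by that file; `pixels` and the chunk size are hypothetical choices for illustration. Arithmetic on the result is delayed rather than materialized, and only the block that is explicitly realized is read into memory.

```r
# Requires the Bioconductor package HDF5Array (which loads DelayedArray).
library(HDF5Array)

# Hypothetical stand-in for the pixel data of a 512 x 512 RGB Image.
pixels <- array(runif(512 * 512 * 3), dim = c(512, 512, 3))

# Write to a temporary HDF5 file, stored in 128 x 128 x 1 chunks.
h <- writeHDF5Array(pixels, chunkdim = c(128, 128, 1))

# This operation is delayed: no full-size copy of the data is created here.
scaled <- h * 50

# Realize only a small corner of the first channel into memory.
corner <- as.array(scaled[1:10, 1:10, 1])
```

Whether EBImage's C-level routines could work on such delayed, block-wise data without realizing it in full is a separate question; this only shows the storage and access side.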