Describe the bug
When an image viewer is open, creating an Incompatible Subset (that is, a subset defined over attributes not present in the reference_data of the image viewer) will create an extra array that is the full size of reference_data.
To Reproduce
From the glue terminal:
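# `dc` (the DataCollection) and `session` are provided by the glue terminal.
import numpy as np
from glue.core import Data
from glue.viewers.image.qt.data_viewer import ImageViewer
# A 10000 x 10000 integer image (about 800 MB at 8 bytes per element).
im = Data(label='data1', x=np.arange(100_000_000).reshape((10000, 10000)))
# A small catalog that shares no attributes with the image.
catalog = Data(label='catalog', c=[1, 3, 2], d=[4, 3, 3])
dc.append(im)
dc.append(catalog)
# Open an image viewer whose reference_data is the large image.
viewer = session.application.new_data_viewer(ImageViewer)
viewer.add_data(im)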
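glue should be using about 1 GB of memory at this point. Creating a subset that can be shown does not appreciably increase the memory usage of glue. However, creating a subset that cannot be shown causes memory usage to roughly double, to about 2 GB. Each new Incompatible Subset adds further to the memory used.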
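As a rough sanity check on these numbers, here is the back-of-the-envelope arithmetic. It assumes that np.arange produces 8-byte integers (the default on most 64-bit platforms) and that the NaN array is float64; neither dtype is stated explicitly in this report.

```python
import numpy as np

shape = (10000, 10000)
n_elements = shape[0] * shape[1]

# Assumption: the image uses 8-byte integers (np.arange default on most 64-bit platforms).
image_bytes = n_elements * np.dtype(np.int64).itemsize

# A NaN-filled array has to be floating point; float64 is also 8 bytes per element.
nan_bytes = n_elements * np.dtype(np.float64).itemsize

print(image_bytes / 1e9, nan_bytes / 1e9)  # 0.8 0.8
```

A fully materialized NaN array therefore costs about as much as the image itself, which would be consistent with the observed jump from roughly 1 GB to roughly 2 GB.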
Expected behavior
Incompatible subsets should not take up more memory than normal subsets.
Additional context
ImageSubsetArray is called when a new subset tries to show itself on an Image Viewer. The __call__ method creates a broadcasted array of np.nan that is the full size of the Image Viewer reference_data. This array is then used in the make_image function of mpl_scatter_density.base_image_artist line 190 (self.set_data(array)), which causes the broadcasted array to materialize fully in memory.
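To illustrate the mechanism outside of glue, here is a minimal NumPy-only sketch. It uses a smaller shape than the 10000 x 10000 image so that it is cheap to run, and it stands in np.array(...) for whatever copy happens downstream; it is not the actual glue or mpl_scatter_density code.

```python
import numpy as np

# Smaller than the image in this report, to keep the demo cheap.
shape = (2000, 2000)

# A broadcasted view: a single NaN presented as a full-size array.
# Every stride is zero, so no per-element storage is allocated.
lazy = np.broadcast_to(np.nan, shape)
print(lazy.strides)

# Copying the view allocates a real buffer of shape[0] * shape[1] * 8 bytes.
materialized = np.array(lazy)
print(materialized.strides, materialized.nbytes)
```

The broadcasted view is essentially free until something copies it, at which point the full-size buffer is allocated.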
I think the problem would essentially be solved by returning just the portion of the large nan array defined by bounds (if we don't trigger the IncompatibleAttribute we get a mask with the shape of bounds), but I confess I don't really understand why we're making a potentially giant nan array here in the first place, so I wanted to open a discussion before trying to fix this.
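A minimal sketch of that idea, for discussion only. The helper name nan_for_bounds is hypothetical, and the layout of bounds is an assumption on my part: I am treating each entry as either a scalar (a fixed slice) or a (min, max, n_steps) tuple.

```python
import numpy as np

def nan_for_bounds(bounds):
    """Hypothetical helper: a lazy NaN array shaped like the requested bounds."""
    # Assumption: tuple entries are (min, max, n_steps); scalar entries pick a
    # single slice along that axis and contribute no output dimension.
    shape = tuple(b[2] for b in bounds if isinstance(b, tuple))
    return np.broadcast_to(np.nan, shape)

# Example: a 512 x 256 screen-resolution request over the 10000 x 10000 image.
bounds = [(0, 9999, 512), (0, 9999, 256)]
placeholder = nan_for_bounds(bounds)
print(placeholder.shape, np.array(placeholder).nbytes)
```

Even if this placeholder is later materialized, it costs 512 * 256 * 8 bytes (1 MiB) rather than the roughly 800 MB of a full-size array.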