MouseLand / cellpose

a generalist algorithm for cellular segmentation with human-in-the-loop capabilities
https://www.cellpose.org/
BSD 3-Clause "New" or "Revised" License

[FEATURE] Idea for the reduction of mask sizes #999

Closed XylotrupesGideon closed 2 weeks ago

XylotrupesGideon commented 2 months ago

Use a Napari-labels-like data format to reduce the size of Cellpose mask output before stitching

I was working with a large (14 GB) image stack and segmented it in Cellpose. I was able to segment the individual z-planes; however, when running the 3D stitch I ran out of memory very quickly.

I then realized that by loading the planes into Napari as label layers and then immediately saving them, I could reduce the file size by 56x(!) (Cellpose output 18026 KB per z-plane vs 46 KB per z-plane when saved as a Napari label layer). This enabled me to effortlessly run the 3D stitch on my computer.

I am not sure in what way the output of Napari differs from that of Cellpose; however, this difference in file size was an absolute life saver for me, and I guess it would be very helpful for anyone else who does not have access to a lot of RAM.
I would suggest taking a look at the Napari implementation to see whether it can be integrated into Cellpose.

The workflow I used was the following (not the most efficient route, I guess, but it works for me):

import os

import numpy as np
import napari
import tifffile
from PIL import Image
from cellpose import models, utils

# cp_pretrained_model, planes_output_folder and num_z_planes must be defined beforehand
model = models.CellposeModel(pretrained_model=cp_pretrained_model)  # I used a custom model
masks_output_folder = "your_output_folder"

# segment the individual z-planes (each z-plane is in a separate .tif file)
for i in range(num_z_planes):
    image = tifffile.imread(os.path.join(planes_output_folder, f"z_plane_{i}.tif"))
    masks, flows, _ = model.eval(image)
    tifffile.imwrite(os.path.join(masks_output_folder, f"mask_{i}.tif"), masks)
# -> results in the large cellpose output files

# convert the output into napari label layer format = 56x file size reduction
viewer = napari.Viewer()  # opens a napari viewer
for i in range(num_z_planes):
    # load the cellpose output mask file as a label layer (viewer.open returns a list of layers)
    mask_path = os.path.join(masks_output_folder, f"mask_{i}.tif")
    labels_layer = viewer.open(mask_path, layer_type="labels")[0]
    labels_layer.name = f"mask_{i}_label"  # rename the layer
    labels_layer.save(os.path.join(masks_output_folder, f"mask_{i}_label.tif"))  # save the layer

# create a stack of the unstitched mask z-planes (I guess this could be skipped and the planes stitched directly)
layer_data = [layer.data for layer in viewer.layers]
z_stack = np.stack(layer_data, axis=0)
viewer.add_image(z_stack, name="Z-stack")
viewer.layers["Z-stack"].save(os.path.join(masks_output_folder, "z-stack.tif"))
viewer.close()

# Open the z-stack .tif file frame by frame (for some reason the whole stack would not load directly,
# probably something with my machine; this can be skipped, but I will leave it here anyway)
with Image.open(os.path.join(masks_output_folder, "z-stack.tif")) as img:
    frames = []  # list to store the image frames
    for frame in range(img.n_frames):  # iterate over all frames in the .tif file
        img.seek(frame)  # move to the specific frame
        frames.append(np.array(img))  # convert the frame to a NumPy array

# Convert the list of frames into a 3D NumPy array
stack = np.stack(frames, axis=0)

# Now, pass the 3D stack to the stitch3D function
stitched_stack = utils.stitch3D(stack, stitch_threshold=0.6)

# Convert the stitched stack to float32 if it's not already
stitched_stack = stitched_stack.astype(np.float32)

# Define the path for the output file
output_path = os.path.join(masks_output_folder, "stitched_60%_stack_.tif")

# Convert each plane of the stitched stack to a PIL Image (float32 arrays map to mode "F")
images = [Image.fromarray(plane) for plane in stitched_stack]

# Save the multi-page TIFF
if images:
    images[0].save(output_path, save_all=True, append_images=images[1:], compression="tiff_deflate")
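
For context on the stitch_threshold argument used above: stitching links masks in adjacent planes when their overlap, measured as intersection over union (IoU), reaches the threshold. Below is a minimal sketch of that idea on a tiny synthetic stack. It is a simplified illustration and not Cellpose's stitch3D implementation; the function name stitch_planes_sketch is hypothetical.

```python
import numpy as np

def stitch_planes_sketch(masks, stitch_threshold=0.25):
    """Link 2D label planes along z by intersection over union (simplified)."""
    stitched = masks.copy()
    next_label = int(stitched[0].max()) + 1
    for z in range(1, len(stitched)):
        prev, curr = stitched[z - 1], masks[z]
        out = np.zeros_like(curr)
        for lab in np.unique(curr):
            if lab == 0:
                continue  # skip background
            region = curr == lab
            # Labels in the previous plane under this mask, with overlap sizes.
            cands, overlaps = np.unique(prev[region], return_counts=True)
            best_label, best_iou = 0, 0.0
            for cand, n_inter in zip(cands, overlaps):
                if cand == 0:
                    continue
                union = region.sum() + (prev == cand).sum() - n_inter
                iou = n_inter / union
                if iou > best_iou:
                    best_label, best_iou = cand, iou
            if best_label != 0 and best_iou >= stitch_threshold:
                out[region] = best_label  # same object continues from the plane below
            else:
                out[region] = next_label  # no good match: a new object starts here
                next_label += 1
        stitched[z] = out
    return stitched

# Tiny synthetic stack: one object present in both planes, one only in plane 1.
stack = np.zeros((2, 6, 6), dtype=np.uint16)
stack[0, 0:3, 0:3] = 1
stack[1, 0:3, 0:3] = 7  # same position, different per-plane label
stack[1, 4:6, 4:6] = 3
result = stitch_planes_sketch(stack, stitch_threshold=0.6)
print(result[1])
```

In the result, the square that overlaps plane 0 inherits label 1, and the unmatched square receives a fresh label.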

carsen-stringer commented 2 months ago

thanks for reporting this, what is the dtype for the frames saved with napari?

I think our tiffs are large because we weren't using compression. I've accepted a pull request that adds this, but that shouldn't change the underlying dtype.
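
To illustrate the compression point, here is a minimal sketch using a synthetic label plane (not real Cellpose output) and zlib as a stand-in for the deflate compression available in TIFF writers. It suggests why a label image can shrink a lot on disk while the loaded array keeps the same dtype and the same size in memory.

```python
import zlib
import numpy as np

# Synthetic 2048x2048 label plane: background 0 plus a grid of square "cells".
# This stands in for a Cellpose mask; the sizes are illustrative only.
plane = np.zeros((2048, 2048), dtype=np.uint16)
label = 1
for y in range(0, 2048, 64):
    for x in range(0, 2048, 64):
        plane[y + 8:y + 56, x + 8:x + 56] = label
        label += 1

raw_bytes = plane.tobytes()
compressed = zlib.compress(raw_bytes, 6)  # deflate, the scheme used by compressed TIFFs

# Round-trip back to an array: identical contents, dtype and in-memory size.
restored = np.frombuffer(zlib.decompress(compressed), dtype=plane.dtype)
restored = restored.reshape(plane.shape)

print(f"uncompressed on disk: {len(raw_bytes) / 1024:.0f} KB")
print(f"compressed on disk:   {len(compressed) / 1024:.0f} KB")
print(f"in memory after load: {restored.nbytes / 1024:.0f} KB, dtype={restored.dtype}")
```

Under these assumptions, a smaller file alone would not lower the RAM needed for stitching; only a smaller dtype or a smaller array does that.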

XylotrupesGideon commented 2 months ago

The dtype is np.ndarray after conversion.

I am still trying to figure out what exactly napari is doing to compress the labels.
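
On the dtype question above: np.ndarray is the Python type of the array object, while the dtype describes how each pixel is stored and is read from the .dtype attribute. A minimal sketch, using a synthetic array in place of a loaded mask file:

```python
import numpy as np

# Stand-in for a mask plane loaded from disk (synthetic, for illustration).
mask = np.zeros((4, 4), dtype=np.uint16)

container_type = type(mask).__name__   # 'ndarray': what kind of object it is
element_dtype = mask.dtype             # uint16: how each pixel is stored
bytes_per_pixel = mask.dtype.itemsize  # 2 bytes for uint16

print(container_type, element_dtype, bytes_per_pixel)
```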

XylotrupesGideon commented 2 months ago

Okay, it seems that this is the function it uses. It appears to just create an integer ndarray and then populate it successively with the labels. It does go through some other functions, but as far as I can see these should not influence the output of a Cellpose segmentation.

napari/napari/layers/shapes/_shape_list.py

def to_labels(self, labels_shape=None, zoom_factor=1, offset=(0, 0)):
        """Returns a integer labels image, where each shape is embedded in an
        array of shape labels_shape with the value of the index + 1
        corresponding to it, and 0 for background. For overlapping shapes
        z-ordering will be respected.

        Parameters
        ----------
        labels_shape : np.ndarray | tuple | None
            2-tuple defining shape of labels image to be generated. If non
            specified, takes the max of all the vertices
        zoom_factor : float
            Premultiplier applied to coordinates before generating mask. Used
            for generating as downsampled mask.
        offset : 2-tuple
            Offset subtracted from coordinates before multiplying by the
            zoom_factor. Used for putting negative coordinates into the mask.

        Returns
        -------
        labels : np.ndarray
            MxP integer array where each value is either 0 for background or an
            integer up to N for points inside the corresponding shape.
        """
        if labels_shape is None:
            labels_shape = self.displayed_vertices.max(axis=0).astype(int)

        labels = np.zeros(labels_shape, dtype=int)

        for ind in self._z_order[::-1]:
            mask = self.shapes[ind].to_mask(
                labels_shape, zoom_factor=zoom_factor, offset=offset
            )
            labels[mask] = ind + 1

        return labels
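
The painting loop in the function above can be sketched in isolation. This is a simplified stand-in, not napari's code: plain boolean arrays replace the shape objects, and paint_labels is a hypothetical name. It shows that where masks overlap the last one painted wins, and that dtype=int gives NumPy's platform default integer (commonly 8 bytes per pixel on 64-bit systems).

```python
import numpy as np

def paint_labels(masks, z_order, labels_shape):
    """Paint boolean masks into one integer label image (simplified stand-in)."""
    labels = np.zeros(labels_shape, dtype=int)  # platform default integer
    for ind in z_order[::-1]:
        labels[masks[ind]] = ind + 1  # later writes overwrite earlier ones
    return labels

# Two overlapping 3x3 squares on a 5x5 canvas (hypothetical shapes).
a = np.zeros((5, 5), dtype=bool)
a[0:3, 0:3] = True
b = np.zeros((5, 5), dtype=bool)
b[2:5, 2:5] = True

labels = paint_labels([a, b], z_order=[0, 1], labels_shape=(5, 5))
print(labels)
print(labels.dtype, labels.dtype.itemsize)
```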

carsen-stringer commented 2 months ago

thanks okay I'll see if I can replicate the increased memory usage and reduce it, we are using uint16 or uint32. which OS are you on and what were the dimensions of the stack (size in x,y,z)?

carsen-stringer commented 1 month ago

okay, I am not using a big enough stack to replicate large differences in RAM, but I found where it could be slowed down: a type cast. inside stitch3D we're using int, not the dtype of the masks from cellpose (which are usually uint16). I've updated the code to use masks.dtype, but I don't think it should make much of a difference.
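
To put a rough number on the cast, here is a minimal sketch comparing the memory footprint of a label stack held as uint16, uint32 and NumPy's default int. The stack shape is hypothetical (the dimensions of the reported stack are not given in this thread), and the sizes are computed from shape and itemsize so that nothing large is allocated.

```python
import math
import numpy as np

# Hypothetical stack dimensions (z, y, x); not taken from the report above.
shape = (200, 2048, 2048)
n_pixels = math.prod(shape)

def stack_gib(dtype):
    """Memory needed for one label stack of `shape` with the given dtype, in GiB."""
    return n_pixels * np.dtype(dtype).itemsize / 2**30

for dtype in (np.uint16, np.uint32, int):
    print(f"{np.dtype(dtype).name:>7}: {stack_gib(dtype):.2f} GiB")

# astype to a different dtype allocates a new array, so during a cast both
# the original and the converted stack are alive at the same time.
small = np.ones((4, 4), dtype=np.uint16)
cast = small.astype(int)
print(cast.dtype, np.shares_memory(small, cast))
```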

another alternative is that you have a bunch of small masks (<15 pixels) that are thrown out when not stitching but remain when stitching, and that's what is slowing things down. you can test this by turning off min_size when running plane-by-plane (model.eval(..., min_size=-1)) and seeing if you find a lot of small masks
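
One way to run that check, as a minimal sketch: count how many masks in a plane fall below 15 pixels. The plane here is synthetic; in practice masks would be the output of model.eval(..., min_size=-1) as described above. The helper name count_small_masks is hypothetical.

```python
import numpy as np

def count_small_masks(masks, min_size=15):
    """Return (number of masks below min_size pixels, total number of masks)."""
    labels, sizes = np.unique(masks, return_counts=True)
    sizes = sizes[labels != 0]  # drop the background label 0
    return int((sizes < min_size).sum()), int(sizes.size)

# Synthetic plane: one 10x10 mask (100 px) and one 2x3 mask (6 px).
plane = np.zeros((32, 32), dtype=np.uint16)
plane[2:12, 2:12] = 1
plane[20:22, 20:23] = 2

n_small, n_total = count_small_masks(plane)
print(f"{n_small} of {n_total} masks are smaller than 15 pixels")
```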

carsen-stringer commented 2 weeks ago

going to close this for now due to inactivity, please upgrade to the latest cellpose for these features: pip install git+https://github.com/mouseland/cellpose.git