To provide a simple first implementation of how to do this (more complicated, custom versions will follow later from @gusqgm):
Let's first implement nuclear segmentation using the cellpose library. It can be installed with `pip install cellpose`; this installs some dependencies like PyTorch, but should just work via pip.
It works well to segment nuclei in the Pelkmans lab test sets I provided. Here is example code on how to use it:
import h5py

from cellpose import utils
from cellpose import models


def segment_nuclei(fn, input_channel, cellprob_th=0, d=40, pyr_level=1, anisotropy=3):
    use_GPU = models.use_gpu()
    # Get the relevant channel (make this something the user can select).
    # In our case, it's the DAPI channel (e.g. C01 in the original data).
    # Example code loads it from a hdf5 file saved at fn, reading the channel
    # input_channel at a given pyramid level
    with h5py.File(fn, 'a') as f:
        dapi_dset = f[input_channel][str(pyr_level)]
        model = models.Cellpose(gpu=use_GPU, model_type='nuclei')
        mask, flows, styles, diams = model.eval(dapi_dset, channels=[0, 0],
                                                diameter=d, anisotropy=anisotropy, do_3D=True,
                                                net_avg=False, augment=False,
                                                cellprob_threshold=cellprob_th)
        # TODO: Save mask to wherever the output should go
Probably best to load e.g. pyramid level 1 (let's make this an option, but start with 1 for the moment). Processing full-res images through cellpose places quite a high memory demand on the GPU, which is why we typically used downsampled images here.
The label mask saved under `mask` above should then be saved to the OME-Zarr labels folder, as described in the spec here: https://ngff.openmicroscopy.org/latest/
Regarding saving images: I've tested this and it works well to save images site by site. If I have a label image, I can save it to an existing OME-Zarr file by doing this:
import zarr
from ome_zarr.io import parse_url
from ome_zarr.writer import write_labels

# Create label image somehow
label_image = mask  # e.g. the mask from above

# Path to the field of view we want to save to
file_path = f'/PATH/TO/PLATE/20200812-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03/{fov}'
store = parse_url(file_path, mode="w").store
root = zarr.group(store=store)

label_name = "label_image"
label_axes = ["z", "y", "x"]  # could change if e.g. a 2D image was processed => can we infer this?
write_labels(label_image, group=root, name=label_name, axes=label_axes)
This creates the necessary folder in the zarr file and writes the label image to disk, with pyramids already included. We may want to pass some additional parameters about the pyramids (e.g. the coarsening factors), but the ome-zarr writer function for labels seems like a good start overall.
We can view label images in the napari viewer using the napari-ome-zarr plugin:
Unfortunately, the visualization of labels does not seem to be working for HCS plates at the moment (see ongoing discussion here: https://github.com/ome/ome-zarr-py/issues/65#issuecomment-1010074621). But we can probably just proceed as planned for the moment and just check the label images by looking at single fovs.
Also, all of this will be quite a bit simpler once we use multi-FOVs (each site saved to an individual FOV), see https://github.com/fractal-analytics-platform/mwe_fractal/issues/36, and https://github.com/fractal-analytics-platform/mwe_fractal/issues/66 for the visualization issues. => Let's make sure we push this through first, to avoid building an overly complex labeling task that needs to infer sites and save labels in chunks or such.
Quick question: which versions of cellpose (1 or 2) and python are you using?
Let's go for version 2 now. We mostly used version 1 before, but version 2 has great improvements and we're certainly more interested in this going forward :)
Ok, thanks.
I guess the bit of code in this issue was working with v1, right? Because for instance `models.use_GPU()` doesn't seem to exist (probably replaced by `core.use_gpu()`; I don't know if it's related to v1/v2 or to other updates on their side).
@jluethi, is this code something that you can quickly run, to make sure it has no errors?
@tcompa Ah, yes, it could be that there were some changes in that setting. If you have code that runs, I can test it over lunch quickly or later tonight. Otherwise, I can look into providing an updated example that I verified separately by tomorrow. Verifying that it runs on a GPU is trickier, as I can't do that with my local GPU; the same goes for checking whether the "runs on GPU" option works.
At the moment I'm only dealing with installing cellpose correctly, and making sure it runs on the GPU partition and sees the GPU. Anything related to code actually doing something is for later.
> It works well to segment nuclei in the Pelkmans lab test sets I provided. Here is example code on how to use it: [...]
>
> Probably best to load e.g. pyramid level 1 (let's make this an option, but start with 1 for the moment). [...]
We'd need some more details:

1. What is the expected shape of `dapi_dset`? 2D or 3D? Single image or any size (e.g. merged FOVs)?
2. Could you please provide some meaningful values for the arguments of `model.eval` (i.e. `d`, `anisotropy`, `cellprob_th`)?

(no worries, …)
> We'd need some more details:
> 1. What is the expected shape of `dapi_dset`? 2D or 3D? Single image or any size (e.g. merged FOVs)?
> 2. Could you please provide some meaningful values for the arguments of `model.eval` (i.e. `d`, `anisotropy`, `cellprob_th`)?
I am just sticking with the defaults for the moment.
Next question is where to store label metadata in the zarr file. If we segment level 1, for instance, we should associate some coordinateTransformation (notably `scale`) to the output mask. I need to look into the specs to see how to do it.
Just reporting my observations: `write_labels` actually creates a pyramid (for the labels) by itself, and the relevant `scale` transformations are defined in `plate.zarr/B/03/0/labels/label_image/.zattrs`. The calculation of these shapes is in https://github.com/ome/ome-zarr-py/blob/313e12690fd95d4f199ef3aba9d9412730012c53/ome_zarr/format.py#L256-L266.
We should understand how to pass the right scale information from `write_labels` down to `generate_coordinate_transformations`, so that the final scales match the ones of the image at the corresponding level. This is needed if we segment a pyramid level which is not the highest-resolution one.

For now I modified the `.zattrs` file by hand, and obtained this
I'm adding a prototype task `image_labeling`, with 5d45e9ae90d6e5b4e67b75ffbbfa68778083366a. To run it (outside Fractal) I use this script:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --partition=gpu
#SBATCH --time=10:00:00
date
echo
poetry run python image_labeling.py -z /data/active/fractal/tests/Temporary_data_UZH_1_well_2x2_sites_singlefov/20200812-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03/0/
echo
date
The segmentation of the highest-resolution level (of a single 2x2 well, single channel --> shape `(10, 4320, 5120)`) took around 10 minutes on the GPU.
We should now get started with the discussion of what should be wrapped around this skeleton code. What inputs from the user? What possible outputs? What other behaviors should be supported?
> We'd need some more details:
> - What is the expected shape of `dapi_dset`? 2D or 3D? Single image or any size (e.g. merged FOVs)?
> - Could you please provide some meaningful values for the arguments of `model.eval` (i.e. `d`, `anisotropy`, `cellprob_th`)?
The question of at what resolution level the network should be run is a very good one! I actually ran the 3D model per site at pyramid level 1, because level 0 was too large for the GPU I had back then. This is something we may want to do for GPU memory or performance reasons at different instances, because we often don't need too fine a segmentation anyway and pyramid level 1 or even 2 may be detailed enough. => interesting that you ran at level 1, can we expose this as an option?
Also, I think we may just save the labels at the lowest pyramid level that we have (e.g. at level 1 if we don't have labels for level 0). For visualization, that probably works with the scale parameters set correctly. Not sure though if this will be a headache during analysis or if that is easily solved, so we may get back to this later.
> `write_labels` actually creates a pyramid
Let's see if that is good enough for what we need or whether we'll need our own implementation. If we need to do it separately, let's be aware that creating a pyramid is different/trickier for label images than for intensity images: the downscaling function for a label image consisting of distinct integer labels shouldn't be the same as for an intensity image. Thus, if we can rely on the built-in functionality, that may be useful. We would need to pass our pyramid parameters to it though (i.e. downscaling factor, number of pyramid levels); I'm not fully sure whether they will be directly taken as an input.
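For reference, a minimal sketch of what label-safe downscaling could look like (nearest-neighbor interpolation, preserving the integer dtype); the function name and the fixed XY factor are just illustrative, not an existing Fractal or ome-zarr helper:

```python
import numpy as np
from skimage.transform import resize


def downscale_labels(labels: np.ndarray, factor: int = 2) -> np.ndarray:
    """Downscale a ZYX (or YX) label image in the YX plane, preserving integer labels."""
    new_shape = labels.shape[:-2] + (
        labels.shape[-2] // factor,
        labels.shape[-1] // factor,
    )
    return resize(
        labels,
        output_shape=new_shape,
        order=0,               # nearest neighbor: no interpolation between label values
        preserve_range=True,
        anti_aliasing=False,
    ).astype(labels.dtype)     # cast back, so e.g. uint16 labels stay uint16
```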
Your output already looks very promising! Did you run this on the MIP or on a single slice?
> Your output already looks very promising! Did you run this on the MIP or on a single slice?
It ran on the 3D image, with shape `(10, 4320, 5120)`, via:
import dask.array as da
from cellpose import core, models

# Select ZYX data for level 0 and channel 0 (zarrurl is defined elsewhere in the task)
dapi_dset = da.from_zarr(zarrurl + "/0")[0]

use_gpu = core.use_gpu()
model = models.Cellpose(gpu=use_gpu, model_type="nuclei")
mask, flows, styles, diams = model.eval(
    dapi_dset, channels=[0, 0], do_3D=True, net_avg=False, augment=False
)
Neat! Impressive that the current cluster GPUs can handle this :) Generally, e.g. for the 9x8 cases, we'll need to run per fov for 3D models.
I have a meeting now, but will check on the details in the afternoon, so you should have more on this by tomorrow.
Ok for the rest.
Briefly:

We should study the choice between custom or `ome_zarr` writing a bit further. If it weren't for the trickier coarsening of label images, I'd be in favor of writing the pyramid ourselves (which gives us full control over the `scale` transformations). In that case, there is no problem in doing labeling on (say) level 2, and then propagating it up and down the pyramid with the correct scales.

In the `ome_zarr` version I don't see any control we could have, so we may have to first write and then modify the `scale` parameters by hand. I think they don't even let us specify a coarsening factor, meaning that as soon as it is different from 2 their writer could be wrong.

More later.
Great summary @tcompa !
My first view of the `ome_zarr` pyramidal code doesn't actually show any special treatment for label image downsampling, so maybe they don't handle it well either. I'll have a look at this later.
> My first view of the `ome_zarr` pyramidal code doesn't actually show any special treatment for label image downsampling, so maybe they don't handle it well either. I'll have a look at this later.
`write_labels(labels, .., scaler, .., coordinatetransformations, ..)` calls:

1. `_create_mip(labels, fmt, scaler, axes)`;
2. `write_multiscale_labels(mip, .., coordinatetransformations, ..)`.

The first call, `_create_mip`, is unclear to me at the moment. I think that it creates a pyramid, rather than a MIP. Probably it's just a naming issue, and this is what I would rather call `create_pyramid`. Let's see how it looks:
The `scale.nearest` call within `_create_mip` (which creates a pyramid using `skimage.transform.resize` and `cv2.INTER_NEAREST`) uses the `__nearest` and `_by_plane` methods of the `Scaler` class. The `Scaler` class also has the attributes:

- `self.downscale` (coarsening factor in the XY plane);
- `self.max_layer` (number of pyramid levels).

Thus a possible way to proceed is to define an `ome_zarr` scaler with the correct attributes:

our_scaler = ome_zarr.scaler.Scaler()
our_scaler.downscale = 3  # for instance
our_scaler.max_layer = 4  # for instance
Once the parameters for pyramid creation are set correctly, we still have to check `coordinatetransformations`.
`write_multiscale_labels` calls:

1. `write_multiscale(pyramid, .., coordinate_transformations=coordinate_transformations, ..)`;
2. `write_label_metadata(.., **({} if label_metadata is None else label_metadata))`.

`write_multiscale` has an argument:
:type coordinate_transformations: 2Dlist of dict, optional
:param coordinate_transformations:
List of transformations for each path.
Each list of dicts are added to each datasets in order and must include a
'scale' transform.
which we could use if our segmentation was not performed at the highest-resolution level. The question is: how would it mix with the default `scale` transformations? Would it just override them?
Answer:
if coordinate_transformations is None:
    shapes = [data.shape for data in pyramid]
    coordinate_transformations = fmt.generate_coordinate_transformations(shapes)
Thus it seems that in our `image_labeling` task we can construct the correct set of `scale` transformations (taking into account the level on which we did segmentation), and let them propagate through the `ome_zarr` functions. This is to be checked, of course.

Links:
https://github.com/ome/ome-zarr-py/blob/master/ome_zarr/writer.py
https://github.com/ome/ome-zarr-py/blob/master/ome_zarr/scale.py
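A rough sketch of how that could look, assuming the segmentation was run at pyramid level 2 of an image with XY coarsening factor 2, and that `write_labels` forwards a `coordinate_transformations` argument as in the signatures quoted above (to be verified against the installed ome-zarr-py version; the path and level numbers are placeholders):

```python
import zarr
from ome_zarr.io import parse_url
from ome_zarr.writer import write_labels

segmentation_level = 2   # level of the image pyramid that was segmented (assumption)
coarsening_xy = 2        # XY coarsening factor of the image pyramid (assumption)
num_label_levels = 4     # number of pyramid levels written for the labels (assumption)

# One list of transformations per label pyramid level, expressed with respect to the
# full-resolution image: label level 0 already starts at coarsening_xy**segmentation_level.
coordinate_transformations = [
    [
        {
            "type": "scale",
            "scale": [
                1.0,
                float(coarsening_xy ** (segmentation_level + level)),
                float(coarsening_xy ** (segmentation_level + level)),
            ],
        }
    ]
    for level in range(num_label_levels)
]

store = parse_url("/PATH/TO/PLATE.zarr/B/03/0", mode="w").store  # hypothetical path
root = zarr.group(store=store)
write_labels(
    mask,  # the label array produced by cellpose (defined elsewhere)
    group=root,
    name="label_image",
    axes=["z", "y", "x"],
    coordinate_transformations=coordinate_transformations,
)
```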
It seems that we could achieve everything while sticking with `ome_zarr`, but it's not clear whether they have a special pyramid-creation procedure for labels. I think the last line of this method
def __nearest(self, plane: np.ndarray, sizeY: int, sizeX: int) -> np.ndarray:
    """Apply the 2-dimensional transformation."""
    return resize(
        plane,
        output_shape=(sizeY // self.downscale, sizeX // self.downscale),
        order=0,
        preserve_range=True,
        anti_aliasing=False,
    ).astype(plane.dtype)
is the only one that matters: if labels are encoded in a very small int, the resulting downsampled levels will be cast back to the same type. This we could obviously also enforce in our own pyramids.

If this is correct, then let's pick the option that is simpler or that gives us more control over what's happening.
(previous comment went through too early, now updated)
Thanks for the detailed analysis @tcompa
First thoughts on this:
In the end, the ome-zarr implementation runs `cv2.INTER_NEAREST` with `anti_aliasing=False` using the scikit-image resizing. If we can easily do this or something equivalent, great :) Let's get to a first implementation we can test; we will notice very fast if labels are mis-assigned between pyramid levels!
If you think we can quickly have our own version, I have nothing against going that route. It would make it easier to scale eventually, if we want to write very large label images.

If our custom version with a similar setup is hard to implement, the way you describe passing the parameters to ome-zarr-py also sounds like a valid way to go, so we can test the label image display :)
> Let's get to a first implementation we can test, we will notice very fast if labels are mis-assigned between pyramid levels!
I already checked, and they are indeed misaligned (when you do segmentation on level>0 and pass the resulting mask to `write_labels` without changing anything). We need to override their `coordinatetransformations` -- which is in principle doable, and I'll try ASAP.
As of b63c183560be76e699d9833fbe8a508c2afe618c, we have a preliminary working version that sets the correct scales and writes through `ome_zarr.writer`.

Segmentation is performed on a mid-res level (level 2), and the scales are set as they should be in `plate.zarr/B/03/0/labels/label_image/.zattrs`:
{
  "image-label": {
    "version": "0.4"
  },
  "multiscales": [
    {
      "axes": [
        {"name": "z", "type": "space"},
        {"name": "y", "type": "space"},
        {"name": "x", "type": "space"}
      ],
      "datasets": [
        {
          "coordinateTransformations": [
            {"scale": [1.0, 4.0, 4.0], "type": "scale"}
          ],
          "path": "0"
        },
        {
          "coordinateTransformations": [
            {"scale": [1.0, 8.0, 8.0], "type": "scale"}
          ],
          "path": "1"
        },
        {
          "coordinateTransformations": [
            {"scale": [1.0, 16.0, 16.0], "type": "scale"}
          ],
          "path": "2"
        },
        {
          "coordinateTransformations": [
            {"scale": [1.0, 32.0, 32.0], "type": "scale"}
          ],
          "path": "3"
        }
      ],
      "name": "label_image",
      "version": "0.4"
    }
  ]
}
Napari visualization confirms that the labels correspond to the image, also upon zooming in/out.
The current choice (see code in https://github.com/fractal-analytics-platform/fractal/commit/b63c183560be76e699d9833fbe8a508c2afe618c) should be a bit more general than just applying the `1/coarsening^level` approach, for cases where we built a pyramid with (e.g.) coarsening factor 3 and had to trim some parts of the array because of non-commensurability.

However, at the moment we are defining the scales for the labels by looking at the shapes of the images. Depending on how `ome_zarr` constructs the pyramids (e.g. with downsampling factor 3, often leading to non-commensurability), we may have to fix the current approach.
This could be a bit annoying because the pyramids are built within `write_labels`, but we can find several ways to fix it.
Rapid GPU/Cellpose benchmark (still with a standalone python script):

- An array with shape `(19, 2430, 2560)` was segmented in ~3 minutes.
- An array with shape `(19, 4860, 5120)` was too large:
Traceback (most recent call last):
File "image_labeling.py", line 261, in <module>
File "image_labeling.py", line 225, in image_labeling
File "/data/homes/fractal/.cache/pypoetry/virtualenvs/fractal-vy7-1Rxp-py3.8/lib/python3.8/site-packages/cellpose/models.py", line 227, in eval
masks, flows, styles = self.cp.eval(x,
File "/data/homes/fractal/.cache/pypoetry/virtualenvs/fractal-vy7-1Rxp-py3.8/lib/python3.8/site-packages/cellpose/models.py", line 536, in eval
masks, styles, dP, cellprob, p = self._run_cp(x,
File "/data/homes/fractal/.cache/pypoetry/virtualenvs/fractal-vy7-1Rxp-py3.8/lib/python3.8/site-packages/cellpose/models.py", line 625, in _run_cp
masks, p = dynamics.compute_masks(dP, cellprob, niter=niter,
File "/data/homes/fractal/.cache/pypoetry/virtualenvs/fractal-vy7-1Rxp-py3.8/lib/python3.8/site-packages/cellpose/dynamics.py", line 718, in compute_masks
mask = get_masks(p, iscell=cp_mask)
File "/data/homes/fractal/.cache/pypoetry/virtualenvs/fractal-vy7-1Rxp-py3.8/lib/python3.8/site-packages/cellpose/dynamics.py", line 636, in get_masks
h,_ = np.histogramdd(tuple(pflows), bins=edges)
File "<__array_function__ internals>", line 180, in histogramdd
File "/data/homes/fractal/.cache/pypoetry/virtualenvs/fractal-vy7-1Rxp-py3.8/lib/python3.8/site-packages/numpy/lib/histograms.py", line 1074, in histogramdd
Ncount = tuple(
File "/data/homes/fractal/.cache/pypoetry/virtualenvs/fractal-vy7-1Rxp-py3.8/lib/python3.8/site-packages/numpy/lib/histograms.py", line 1076, in <genexpr>
np.searchsorted(edges[i], sample[:, i], side='right')
File "<__array_function__ internals>", line 180, in searchsorted
File "/data/homes/fractal/.cache/pypoetry/virtualenvs/fractal-vy7-1Rxp-py3.8/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 1387, in searchsorted
return _wrapfunc(a, 'searchsorted', v, side=side, sorter=sorter)
File "/data/homes/fractal/.cache/pypoetry/virtualenvs/fractal-vy7-1Rxp-py3.8/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
return bound(*args, **kwds)
numpy.core._exceptions.MemoryError: Unable to allocate 3.52 GiB for an array with shape (472780800,) and data type int64
If the `(19, 4860, 5120)` case can be processed on our current cluster GPUs, then I'd say we're good to go forward to implement segmentation per original FOV (chunk at level 0 of the pyramid) at full resolution, and we can worry about implementations at higher pyramid levels a bit later.
Priority order would then be (as just discussed in the call):
Let's make sure the first priority works and get this into Fractal. Then we can expand to priorities 2 & 3.
To enable case 3, we need to be able to get original FOVs in the well back. That would have been easy in a multi-FOV Zarr file. But I think we'll find good ways to do this in the single-FOV Zarr file as well. My current favorite path towards that: Save ROIs (region of interest, i.e. corner coordinates + sizes) in an OME-NGFF table. Then loop over all ROIs, extract the information of the relevant regions from the pyramidal chunks based on their position in a well and their scale parameter.
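A rough sketch of what that ROI-based extraction could look like; the ROI field names, the pixel-size convention, and `roi_table` are placeholder assumptions, not an existing table format:

```python
import dask.array as da


def extract_roi(well_zyx: da.Array, roi: dict, pixel_size_yx: float,
                level: int = 0, coarsening_xy: int = 2) -> da.Array:
    """Cut one ROI (given in physical units) out of a pyramidal well image (ZYX dask array)."""
    # Physical size of one pixel at this pyramid level (placeholder convention)
    scale = pixel_size_yx * coarsening_xy ** level
    y0 = int(roi["y_micrometer"] / scale)
    x0 = int(roi["x_micrometer"] / scale)
    ny = int(roi["len_y_micrometer"] / scale)
    nx = int(roi["len_x_micrometer"] / scale)
    return well_zyx[:, y0:y0 + ny, x0:x0 + nx]


# Usage idea: loop over all ROIs of the well and run cellpose per extracted region
# for roi in roi_table:
#     region = extract_roi(level0_dask_array, roi, pixel_size_yx=0.1625).compute()
```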
> If the `(19, 4860, 5120)` case can be processed on our current cluster GPUs, then I'd say we're good to go forward to implement segmentation per original FOV (chunk at level 0 of the pyramid) at full resolution and we can worry about implementations at higher pyramid levels a bit later.

Apparently it is already too large (see edited comment above).
We've also started discussing the relabeling issue: When we run segmentation per chunk, each chunk has labels going from 1 to n. If the chunks are then combined again into a large image, the labels within the "well-image" are non-unique. If we want to map measurements to labels, we typically require unique labels per image though.
One solution to this is to "relabel" the different chunks to ensure label uniqueness. A description of the problem for dask and a relabeling strategy (though not for integer labels I think) can be found here: https://github.com/dask/dask/issues/2768
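For illustration, one simple relabeling strategy is to offset each chunk's labels by the cumulative label count of the previous chunks; a minimal sketch (names are hypothetical, not the eventual Fractal implementation):

```python
import numpy as np


def relabel_sequentially(masks_per_site):
    """Make labels unique across sites by offsetting each site's labels.

    masks_per_site: list of integer label arrays, each labeled 1..n_i (0 = background).
    Returns the relabeled masks and the total number of labels in the well.
    """
    relabeled = []
    total = 0
    for mask in masks_per_site:
        shifted = mask.astype(np.uint32)
        shifted[mask > 0] += total     # keep background at 0
        relabeled.append(shifted)
        total += int(mask.max())       # number of labels in this site
    return relabeled, total
```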
> Apparently it is already too large (see edited comment above).

Hmm, so we are close to the memory limit. But if `(19, 2430, 2560)` (not just `(10, 2430, 2560)`) still passes, we should be good to go for a first implementation :)
> But if `(19, 2430, 2560)` (not just `(10, 2430, 2560)`) still passes

It does pass.
Possible issue: it looks like `ome_zarr`'s `writer` only deals with numpy arrays, and not dask arrays. I think this means we won't be able to store labels for a single (say) 9x8 well with at least 10 Z planes.

Possible ideas:

- The `writer` actually could be updated to support dask arrays. Warning: this wouldn't be a five-minute fix, because `writer` is also connected to their other modules (`scale`, for instance). Plus we would first need a green light from their devs, since it is quite an important change (essentially one should make all of `ome_zarr` dask-compatible, I guess).

I'll try 2 first, to see whether I can make it work quickly.
Test code related to the previous comment:
import numpy as np
import shutil
import dask.array as da
import zarr
from ome_zarr.io import parse_url
from ome_zarr.writer import write_labels
store = parse_url("plate.zarr", mode="w").store
root = zarr.group(store=store)
label_name = "label_image"
label_axes = ["z", "y", "x"]
mask = np.ones((10, 200, 200), dtype=np.uint16)
shutil.rmtree("plate.zarr")
write_labels(mask, group=root, name=label_name, axes=label_axes)
print("Successfully written the np array")
mask = da.ones((10, 200, 200), dtype=np.uint16).compute()
shutil.rmtree("plate.zarr")
write_labels(mask, group=root, name=label_name, axes=label_axes)
print("Successfully written the computed dask array")
mask = da.ones((10, 200, 200), dtype=np.uint16)
shutil.rmtree("plate.zarr")
write_labels(mask, group=root, name=label_name, axes=label_axes)
print("Successfully written the dask array")
with output/error:
Successfully written the np array
Successfully written the computed dask array
Traceback (most recent call last):
File "test_ome_zarr_writer.py", line 29, in <module>
write_labels(mask, group=root, name=label_name, axes=label_axes)
File "/home/tommaso/Fractal/ome-zarr-py/ome_zarr/writer.py", line 638, in write_labels
write_multiscale_labels(
File "/home/tommaso/Fractal/ome-zarr-py/ome_zarr/writer.py", line 556, in write_multiscale_labels
write_multiscale(
File "/home/tommaso/Fractal/ome-zarr-py/ome_zarr/writer.py", line 243, in write_multiscale
group.create_dataset(str(path), data=data, chunks=chunks_opt, **options)
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/hierarchy.py", line 821, in create_dataset
return self._write_op(self._create_dataset_nosync, name, **kwargs)
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/hierarchy.py", line 673, in _write_op
return f(*args, **kwargs)
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/hierarchy.py", line 838, in _create_dataset_nosync
a = array(data, store=self._store, path=path, chunk_store=self._chunk_store,
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/creation.py", line 368, in array
z[...] = data
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/core.py", line 1285, in __setitem__
self.set_basic_selection(pure_selection, value, fields=fields)
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/core.py", line 1380, in set_basic_selection
return self._set_basic_selection_nd(selection, value, fields=fields)
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/core.py", line 1678, in _set_basic_selection_nd
indexer = BasicIndexer(selection, self)
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/indexing.py", line 342, in __init__
dim_indexer = SliceDimIndexer(dim_sel, dim_len, dim_chunk_len)
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/indexing.py", line 176, in __init__
self.nchunks = ceildiv(self.dim_len, self.dim_chunk_len)
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/indexing.py", line 160, in ceildiv
return math.ceil(a / b)
TypeError: unsupported operand type(s) for /: 'int' and 'list'
@tcompa Ah, makes sense! The whole OME-Zarr library isn't really optimized for out-of-core writing, which is why we have a lot of our own parsing to OME-Zarrs for the images as well. When we were still thinking about saving per original FOV, using the OME-Zarr library would have made sense.
But in the current framework of single-FOV wells, I think it makes more sense to create our own writer for labels now as well. Getting the OME-Zarr library to be out-of-core writing compatible is interesting, but a larger undertaking.
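A minimal sketch of what such a custom, dask-friendly label writer could look like (the function name, the level-by-level `to_zarr` calls, and the subsampling-based coarsening are illustrative assumptions, not a final implementation; the multiscales metadata in `.zattrs` would still have to be written separately, as discussed above):

```python
import dask.array as da


def write_label_pyramid(labels: da.Array, zarr_url: str,
                        num_levels: int = 5, coarsening_xy: int = 2) -> None:
    """Write a ZYX dask label array as a pyramid of zarr arrays at <zarr_url>/0, /1, ..."""
    level_data = labels
    for level in range(num_levels):
        level_data.to_zarr(zarr_url, component=str(level), overwrite=True)
        # Plain subsampling in YX behaves like nearest-neighbor downscaling and never
        # mixes or invents label values; rechunk to keep chunks regular for to_zarr
        level_data = level_data[:, ::coarsening_xy, ::coarsening_xy].rechunk()
```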
A question about re-labeling: did I understand correctly that we could expect up to ~1000 labels in a single 3D column (say `10x2160x2560`)?

In that case, the total number of labels in a large well (say 72 sites or more) after re-labeling will be larger than what is allowed by `uint16`, and we'll need to switch to `uint32`. Is that OK, or should we discuss it further?
For our current test cases, most wells contain ~50k cells, which would often still be below the limit. But there may be wells in the 23-well test case that actually go above this, and other use cases where people have more than 65535 objects in a well.

If we can test `uint32` label images and they work with the current pipeline, that should avoid the issue for any reasonable use cases we run with label images in a well. I haven't worked with `uint32` label images in napari or ome-zarr before though, so I don't yet know whether they would cause any issues.
Size-wise, label images are typically very small, i.e. a few kbs for a chunk that is a few MBs, so even doubling bit depth shouldn't be an issue there. If you have a quick test to see that uint32 label images are still opened in napari, that would be the only check I think will be necessary. Let me know if you have a fast way to test this, otherwise I can test this on some example data this afternoon.
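A quick way to test this could be something like the following (assuming a local napari installation; purely synthetic data):

```python
import numpy as np
import napari

# Synthetic uint32 label image with values well above the uint16 limit of 65535
labels = np.random.randint(0, 200_000, size=(10, 512, 512), dtype=np.uint32)

viewer = napari.Viewer()
viewer.add_labels(labels, name="uint32_test")
napari.run()
```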
Questions keep coming: we are not yet parsing microscope metadata, so for the moment I will set the `anisotropy` default to the one that matters for the current test datasets. From https://user-images.githubusercontent.com/18033446/174331876-c80c780f-95da-4c38-af6e-affa9e8d9c69.png and

> anisotropy: float (optional, default None)
> for 3D segmentation, optional rescaling factor (e.g. set to 2.0 if Z is sampled half as dense as X or Y)

I would say that `anisotropy = 5/0.325 = 15.38`. Does this number sound reasonable?
> If you have a quick test to see that uint32 label images are still opened in napari, that would be the only check I think will be necessary.

I opened some uint32 labels in napari (this is for the MIP of a 9x8 well, with ~40k labels in total) and everything seems to work.
> I would say that `anisotropy = 5/0.325 = 15.38`.

How do we get to the 5 there? For this experiment, Z steps are 1um and the xy dimension at full resolution is 0.1625um; at level 1 (with 2x coarsening), it's 0.325um. So if we work at level 1, 1/0.325 ~= 3 is a reasonable choice (which is why I also had 3 as my original default value). Would be good if that's something the user can overwrite if needed (sometimes models work better by not actually using the physical anisotropy exactly).
> I opened some uint32 labels in napari

This looks awesome! `uint32` label images sound good to me in that case! :)
> How do we get to the 5 there?

I got it from the `pixel_size_z` column in the linked table, but then it was probably something else.
> For this experiment, Z steps are 1um and the xy dimension at full resolution is 0.1625um; at level 1 (with 2x coarsening), it's 0.325um. So if we work at level 1, 1/0.325 ~= 3 is a reasonable choice (which is why I also had 3 as my original default value).

Since I'm working at level 0, I should set `anisotropy~6`, right?
> Would be good if that's something the user can overwrite if needed (sometimes models work better by not actually using the physical anisotropy exactly)

Sure, that's the plan. More generally, we aim at having very few hardcoded parameters (apart from very basic things like the image size).
> I got it from the `pixel_size_z` column in the linked table, but then it was probably something else.

Ah, that was from the FMI dataset, not the UZH Cardiomyocyte dataset we're currently training on.

Yes, `anisotropy~6` seems reasonable for this case.
> Sure, that's the plan. More generally, we aim at having very few hardcoded parameters (apart from very basic things like the image size).

Sounds great. Also, the image size will not always be exactly those dimensions once we start adding more microscopes. Not that we need to work on that just yet, just so you're aware :)
As of https://github.com/fractal-analytics-platform/fractal/commit/8e2998ad616941614962440002210617b39028da, there is a (very preliminary) version of Fractal integration. In principle it works (see `examples/example_uzh_1_well_2x2_sites.sh`), but there are ongoing discussions concerning scalability (both for memory and GPU resources).

I would not test this on the 23-well dataset yet, but probably one well with 9x8 sites should work. In principle the task works both for 3D and 2D (MIP) data, but at the moment Fractal always runs it on the 3D images.

More updates later.
Since memory usage for the labeling task is going to be an important issue, let's keep some reference info here. A line which is quite memory-expensive in cellpose is https://github.com/MouseLand/cellpose/blob/main/cellpose/core.py#L599:
yf = np.zeros((3, self.nclasses, imgs.shape[0], imgs.shape[1], imgs.shape[2]), np.float32)
As a reference, this could look like this:
$ python
Python 3.9.12 (main, Apr 5 2022, 06:56:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> x = np.random.randint(0, 100000, size=(3, 3, 10, 2160, 2560)).astype(np.float32)
>>> print(f"dype={x.dtype}, size={x.size}, size*itemsize={x.size*x.itemsize/1000000} GB")
dype=float32, size=497664000, size*itemsize=1990.656 GB
>>> x = np.random.randint(0, 100000, size=(3, 3, 15, 2160, 2560)).astype(np.float32)
>>> print(f"dype={x.dtype}, size={x.size}, size*itemsize={x.size*x.itemsize/1000000} GB")
dype=float32, size=746496000, size*itemsize=2985.984 GB
---- EDIT: more info
While running a single-well 9x8 case, I'm observing a moderate use of GPU memory and an intensive use of CPU memory. This is a bit confusing, and requires more thought.
CPU memory:
GPU memory:
@tcompa Are you processing a single original FOV (i.e. a single 2160x2560 chunk) and getting those results? That is indeed interesting. It's not too worrying if cellpose also needs CPU memory though. I think it does quite a bit of computation to process different network outputs like flow maps & probability maps into label images. If that requires memory on the CPU side, so be it :)
Plus, the memory usage may be quite cellpose specific. So as long as it runs, that should do the trick for the moment I'd think.
If we heavily rely on it afterwards in production, we can think about optimizing it further.
Quick update on labeling/parallelization/memory (probably best to discuss this tomorrow)
We can probably live with point 1, but point 2 doesn't look good. More work is needed.
We are testing a single well with 9x8 sites in a single-FOV scheme. [Note: the single-FOV approach is the only one that we are currently supporting, as of recent decisions, and things won't change unless we explicitly change our mind (that is, we are not running anything multi-FOV).]
For labeling, at the moment we go through the 72 sites sequentially, i.e. there are never two cellpose calculations running at the same time.
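Schematically, the sequential scheme looks like the sketch below (hypothetical variable names and hard-coded site size; the whole-well in-memory `mask` array is exactly the memory concern discussed just below):

```python
import numpy as np
import dask.array as da
from cellpose import core, models

data_zyx = da.from_zarr(zarrurl + "/0")[0]        # ZYX data for channel 0 (zarrurl defined elsewhere)
model = models.Cellpose(gpu=core.use_gpu(), model_type="nuclei")

mask = np.zeros(data_zyx.shape, dtype=np.uint32)  # whole-well label array, held in memory
num_labels = 0
site_ny, site_nx = 2160, 2560                     # size of one original FOV

for iy in range(0, data_zyx.shape[1], site_ny):
    for ix in range(0, data_zyx.shape[2], site_nx):
        img = data_zyx[:, iy:iy + site_ny, ix:ix + site_nx].compute()
        site_mask, flows, styles, diams = model.eval(
            img, channels=[0, 0], do_3D=True, anisotropy=6.0,
            net_avg=False, augment=False,
        )
        site_mask = site_mask.astype(np.uint32)
        site_mask[site_mask > 0] += num_labels    # relabeling, as in the sketch further above
        if site_mask.max() > 0:
            num_labels = int(site_mask.max())
        mask[:, iy:iy + site_ny, ix:ix + site_nx] = site_mask
```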
Early experiments with more flexible approaches (letting dask organize the 72 parallel executions, through `map_blocks` or similar, including https://github.com/dask/dask/issues/2768) led to memory errors within cellpose (for instance in the line mentioned above), and our obvious guess is that this was due to simultaneous execution of more than one cellpose calculation.
At the moment, we are not able to generate the array of labels of a whole well in a lazy way, and that's the reason for the large CPU-memory usage. For a well of 72 sites with 19 Z planes, this array takes around 30 GB (for uint32 labels). Which is still doable on our current RAM (64 G), but not really scalable. A quick mitigation is to switch back to uint16 labels, and only use uint32 when the number of labels becomes larger than 65k. But we would be just pushing the problem a bit further, not solving it.
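For reference, the ~30 GB figure is just the size of a `uint32` array covering the whole well:

```python
>>> nz, ny, nx = 19, 9 * 2160, 8 * 2560   # 72 sites (9x8), 19 Z planes
>>> nz * ny * nx * 4 / 1e9                # uint32 = 4 bytes per voxel
30.2579712
```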
If we dropped relabeling (which is not an option, in our understanding), we could try to construct this array lazily, by just:
This would still have two problems (1. concurrent execution of cellpose calculations can lead to memory errors, 2. lack of relabeling), but at least we would not be building the whole `9x8x19x2160x2560` array.
Great summary @tcompa, let's discuss in detail tomorrow. Some short notes:
> If we dropped relabeling (which is not an option, in our understanding)
Dropping relabeling would be a pity; the relabeling makes this quite useful. But maybe it's still worth it to save non-relabeled label values first per chunk, potentially running on multiple nodes, and then have a collection task that relabels the label images afterwards? That would be IO overhead, but typically label images are fairly small on disk.
In that case, we can then either run this job on CPU nodes with high memory (much cheaper than blocking GPU nodes and we would only have high memory demand for a short time) or eventually figure out ways to do lazy relabeling, e.g. based on the logging information about counts of objects per fov.
Let's discuss tomorrow whether that's worth the complexity.
Some thoughts as of yesterday's Fractal meeting:
- Missing: we still need to limit the number of simultaneous Cellpose calls on the same GPU, but how to do it from dask is currently unclear.
Probably useful: https://docs.dask.org/en/stable/scheduler-overview.html#configuring-the-schedulers
Here's a minimal working example on a CPU:
import time

import numpy as np
import dask
import dask.array as da
from concurrent.futures import ThreadPoolExecutor

n_threads = 4


def heavy_cellpose_function(block, block_info=None):
    print(f"Now running heavy cellpose function for chunk {block_info[None]['chunk-location']}")
    time.sleep(10)
    return np.ones_like(block, dtype=np.uint16) * np.random.randint(10)


x = da.zeros((8, 12), chunks=(4, 3), dtype=np.uint16)

# Extract total number of chunks
n_chunks = (x.shape[0] // x.chunksize[0]) * (x.shape[1] // x.chunksize[1])
print(f"Number of chunks: {n_chunks}")
print(f"Number of threads: {n_threads}")
print()

t0 = time.perf_counter()
with dask.config.set(pool=ThreadPoolExecutor(n_threads)):
    y = x.map_blocks(heavy_cellpose_function, chunks=(4, 3), meta=np.array((), dtype=np.uint16))
    y.to_zarr("data.zarr")
t1 = time.perf_counter()

print()
print(f"Elapsed time: {t1 - t0}")
print()
with output:
Number of chunks: 8
Number of threads: 4
Now running heavy cellpose function for chunk (0, 0)
Now running heavy cellpose function for chunk (0, 1)
Now running heavy cellpose function for chunk (0, 2)
Now running heavy cellpose function for chunk (0, 3)
Now running heavy cellpose function for chunk (1, 0)
Now running heavy cellpose function for chunk (1, 1)
Now running heavy cellpose function for chunk (1, 2)
Now running heavy cellpose function for chunk (1, 3)
Elapsed time: 20.044721081001626
Now we should check whether it also works on the GPU.
Still on "running cellpose for several sites at the same time on the GPU":
We consider sites of shape (10,2160,2560)
, and we limit the number of concurrent cellpose executions as in https://github.com/fractal-analytics-platform/fractal/issues/64#issuecomment-1177229164.
We run the test on a 2x2 well, but we only count the sites which are actually run in parallel (i.e. if we have 4=3+1 sites and run them in batches of 3, we only look at the timing for the first three and we ignore the last one). This is the only relevant measure when scaling to large wells.
| batch_size | time_per_site | time_per_site / batch_size | Speed-up factor |
| --- | --- | --- | --- |
| 1 | ~4 min | ~4 min | 1 |
| 2 | ~5 min | ~2.5 min | ~1.6 |
| 3 | ~6 min | ~2 min | ~2 |
| 4 | MemoryError | - | - |
This suggests that:
Hi @gusqgm and @jluethi, let's discuss here the details of the image labeling task.