To provide a simple first implementation of how to do this (more complicated, custom versions will follow later from @gusqgm):
Let's first implement nuclear segmentation using the cellpose library. It can be installed with `pip install cellpose`; this installs some dependencies like PyTorch, but should just work via pip.
It works well to segment nuclei in the Pelkmans lab test sets I provided. Here is example code on how to use it:
import h5py

from cellpose import utils
from cellpose import models


def segment_nuclei(fn, input_channel, cellprob_th=0, d=40, pyr_level=1, anisotropy=3):
    use_GPU = models.use_gpu()
    # Get the relevant channel (make this something the user can select).
    # In our case, it's the DAPI channel (e.g. C01 in the original data).
    # Example code loads it from a hdf5 file saved at fn, reading the channel
    # input_channel at a given pyramid level
    with h5py.File(fn, 'a') as f:
        dapi_dset = f[input_channel][str(pyr_level)]
        model = models.Cellpose(gpu=use_GPU, model_type='nuclei')
        mask, flows, styles, diams = model.eval(dapi_dset, channels=[0, 0],
                                                diameter=d, anisotropy=anisotropy, do_3D=True,
                                                net_avg=False, augment=False,
                                                cellprob_threshold=cellprob_th)
        # TODO: Save mask to wherever the output should go
Probably best to load e.g. pyramid level 1 (let's make this an option, but start with 1 for the moment). Processing full-res images through cellpose places quite a high memory demand on the GPU, which is why we typically used downsampled images here.
The label mask saved under `mask` above should then be saved to the OME-Zarr labels folder, as described in the spec here: https://ngff.openmicroscopy.org/latest/
Regarding saving images: I've tested this and it works well to save images site by site. If I have a label image, I can save it to an existing OME-Zarr file by doing this:
import zarr
from ome_zarr.io import parse_url
from ome_zarr.writer import write_labels

# Create label image somehow
label_image = mask  # e.g. the mask from above

# Path to the field of view we want to save to
file_path = f'/PATH/TO/PLATE/20200812-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03/{fov}'
store = parse_url(file_path, mode="w").store
root = zarr.group(store=store)

label_name = "label_image"
label_axes = ["z", "y", "x"]  # could change if e.g. a 2D image was processed => can we infer this?
write_labels(label_image, group=root, name=label_name, axes=label_axes)
This creates the necessary folder in the zarr file and writes the label image to disk, with pyramids already included. We may want to pass some additional parameters about the pyramids (e.g. the coarsening factors), but the ome-zarr writer function for labels seems like a good start overall.
We can view label images in the napari viewer using the napari-ome-zarr plugin:
Unfortunately, the visualization of labels does not seem to be working for HCS plates at the moment (see ongoing discussion here: https://github.com/ome/ome-zarr-py/issues/65#issuecomment-1010074621). But we can probably just proceed as planned for the moment and just check the label images by looking at single fovs.
Also, all of this will be quite a bit simpler once we use multi-FOVs (each site saved to an individual FOV), see https://github.com/fractal-analytics-platform/mwe_fractal/issues/36, and https://github.com/fractal-analytics-platform/mwe_fractal/issues/66 for the visualization issues. => Let's make sure we push this through first, to avoid building an overly complex labeling task that needs to infer sites and save labels in chunks or such.
Quick question: which versions of cellpose (1 or 2) and python are you using?
Let's go for version 2 now. We mostly used version 1 before, but version 2 has great improvements and we're certainly more interested in this going forward :)
Ok, thanks.
I guess the bit of code in this issue was working with v1, right? Because for instance `models.use_GPU()` doesn't seem to exist (probably replaced by `core.use_gpu()`; I don't know if it's related to v1/v2 or to other updates on their side).
@jluethi, is this code something that you can quickly run, to make sure it has no errors?
@tcompa Ah, yes, it could be that there were some changes in that setting. If you have code that runs, I can test it over lunch quickly or later tonight. Otherwise, I can look into providing an updated example that I verified separately by tomorrow. Verifying that it runs on a GPU is trickier, as I can't do that with my local GPU; the same goes for checking whether the "runs on GPU" option works.
At the moment I'm only dealing with installing cellpose correctly, and making sure it runs on the GPU partition and sees the GPU. Anything related to code actually doing something is for later.
> It works well to segment nuclei in the Pelkmans lab test sets I provided. Here is example code on how to use it: [...]
>
> Probably best to load e.g. pyramid level 1 (let's make this an option, but start with 1 for the moment). [...]
We'd need some more details:

1. What is the expected shape of `dapi_dset`? 2D or 3D? Single image or any size (e.g. merged FOVs)?
2. Could you please provide some meaningful values for the arguments of `model.eval` (i.e. `d`, `anisotropy`, `cellprob_th`)?

(no worries, …)
> We'd need some more details:
> 1. What is the expected shape of `dapi_dset`? 2D or 3D? Single image or any size (e.g. merged FOVs)?
> 2. Could you please provide some meaningful values for the arguments of `model.eval` (i.e. `d`, `anisotropy`, `cellprob_th`)?
I am just sticking with the defaults for the moment.
Next question is where to store label metadata in the zarr file. If we segment level 1, for instance, we should associate some coordinateTransformation (notably `scale`) to the output mask. I need to look into the specs to see how to do it.
Just reporting my observations: `write_labels` actually creates a pyramid (for the labels) by itself, and the relevant `scale` transformations are defined in `plate.zarr/B/03/0/labels/label_image/.zattrs`. The calculation of these shapes is in https://github.com/ome/ome-zarr-py/blob/313e12690fd95d4f199ef3aba9d9412730012c53/ome_zarr/format.py#L256-L266.
We should understand how to pass the right scale information from `write_labels` down to `generate_coordinate_transformations`, so that the final scales match the ones of the image at the corresponding level. This is needed if we segment a pyramid level which is not the highest-resolution one.

For now I modified the `.zattrs` file by hand, and obtained this
I'm adding a prototype task `image_labeling`, with 5d45e9ae90d6e5b4e67b75ffbbfa68778083366a. To run it (outside Fractal) I use this script:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --partition=gpu
#SBATCH --time=10:00:00
date
echo
poetry run python image_labeling.py -z /data/active/fractal/tests/Temporary_data_UZH_1_well_2x2_sites_singlefov/20200812-CardiomyocyteDifferentiation14-Cycle1.zarr/B/03/0/
echo
date
The segmentation of the highest-resolution level (of a single 2x2 well, single channel --> shape `(10, 4320, 5120)`) took around 10 minutes on the GPU.
We should now get started with the discussion of what should be wrapped around this skeleton code. What inputs from the user? What possible outputs? What other behaviors should be supported?
> We'd need some more details:
> - What is the expected shape of `dapi_dset`? 2D or 3D? Single image or any size (e.g. merged FOVs)?
> - Could you please provide some meaningful values for the arguments of `model.eval` (i.e. `d`, `anisotropy`, `cellprob_th`)?
The question of at what resolution level the network should be run is a very good one! I actually ran the 3D model per site at pyramid level 1, because level 0 was too large for the GPU I had back then. This is something we may want to do for GPU memory or performance reasons at different instances, because we often don't need too fine a segmentation anyway and pyramid level 1 or even 2 may be detailed enough. => interesting that you ran at level 1, can we expose this as an option?
Also, I think we may just save the labels at the lowest pyramid level that we have (e.g. at level 1 if we don't have labels for level 0). For visualization, that probably works with the scale parameters set correctly. Not sure though if this will be a headache during analysis or if that is easily solved, so we may get back to this later.
> `write_labels` actually creates a pyramid
Let's see if that is good enough for what we need or whether we'll need our own implementation. If we need to do it separately, let's be aware that creating a pyramid is different/trickier for label images than for intensity images: the downscaling function for a label image consisting of distinct integer labels shouldn't be the same as for an intensity image. Thus, if we can rely on the built-in functionality, that may be useful. We would need to pass our pyramid parameters to it though (i.e. downscaling factor, number of pyramid levels); I'm not fully sure whether they will be directly taken as an input.
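For reference, a minimal sketch of what label-safe downscaling could look like (nearest-neighbor interpolation, preserving the integer dtype); the function name and the fixed XY factor are just illustrative, not an existing Fractal or ome-zarr helper:

```python
import numpy as np
from skimage.transform import resize


def downscale_labels(labels: np.ndarray, factor: int = 2) -> np.ndarray:
    """Downscale a ZYX (or YX) label image in the YX plane, preserving integer labels."""
    new_shape = labels.shape[:-2] + (
        labels.shape[-2] // factor,
        labels.shape[-1] // factor,
    )
    return resize(
        labels,
        output_shape=new_shape,
        order=0,               # nearest neighbor: no interpolation between label values
        preserve_range=True,
        anti_aliasing=False,
    ).astype(labels.dtype)     # cast back, so e.g. uint16 labels stay uint16
```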
Your output already looks very promising! Did you run this on the MIP or on a single slice?
> Your output already looks very promising! Did you run this on the MIP or on a single slice?
It ran on the 3D image, with shape `(10, 4320, 5120)`, via:
import dask.array as da
from cellpose import core, models

# Select ZYX data for level 0 and channel 0 (zarrurl is defined elsewhere in the task)
dapi_dset = da.from_zarr(zarrurl + "/0")[0]

use_gpu = core.use_gpu()
model = models.Cellpose(gpu=use_gpu, model_type="nuclei")
mask, flows, styles, diams = model.eval(
    dapi_dset, channels=[0, 0], do_3D=True, net_avg=False, augment=False
)
Neat! Impressive that the current cluster GPUs can handle this :) Generally, e.g. for the 9x8 cases, we'll need to run per fov for 3D models.
I have a meeting now, but will check on the details in the afternoon, so you should have more on this by tomorrow.
Ok for the rest.
Briefly:

We should study the choice between custom or `ome_zarr` writing a bit further. If it weren't for the trickier coarsening of label images, I'd be in favor of writing the pyramid ourselves (which gives us full control over the `scale` transformations). In that case, there is no problem in doing labeling on (say) level 2, and then propagating it up and down the pyramid with the correct scales.

In the `ome_zarr` version I don't see any control we could have, so we may have to first write and then modify the `scale` parameters by hand. I think they don't even let us specify a coarsening factor, meaning that as soon as it is different from 2 their writer could be wrong.

More later.
Great summary @tcompa !
My first view of the `ome_zarr` pyramidal code doesn't actually show any special treatment for label image downsampling, so maybe they don't handle it well either. I'll have a look at this later.
> My first view of the `ome_zarr` pyramidal code doesn't actually show any special treatment for label image downsampling, so maybe they don't handle it well either. I'll have a look at this later.
`write_labels(labels, .., scaler, .., coordinatetransformations, ..)` calls:

1. `_create_mip(labels, fmt, scaler, axes)`;
2. `write_multiscale_labels(mip, .., coordinatetransformations, ..)`.

The first call, `_create_mip`, is unclear to me at the moment. I think that it creates a pyramid, rather than a MIP. Probably it's just a naming issue, and this is what I would rather call `create_pyramid`. Let's see how it looks:
The `scale.nearest` call within `_create_mip` (which creates a pyramid using `skimage.transform.resize` and `cv2.INTER_NEAREST`) uses the `__nearest` and `_by_plane` methods of the `Scaler` class. The `Scaler` class also has the attributes:

- `self.downscale` (coarsening factor in the XY plane);
- `self.max_layer` (number of pyramid levels).

Thus a possible way to proceed is to define an `ome_zarr` scaler with the correct attributes:

our_scaler = ome_zarr.scaler.Scaler()
our_scaler.downscale = 3  # for instance
our_scaler.max_layer = 4  # for instance
Once the parameters for pyramid creation are set correctly, we still have to check `coordinatetransformations`.
`write_multiscale_labels` calls:

1. `write_multiscale(pyramid, .., coordinate_transformations=coordinate_transformations, ..)`;
2. `write_label_metadata(.., **({} if label_metadata is None else label_metadata))`.

`write_multiscale` has an argument:
:type coordinate_transformations: 2Dlist of dict, optional
:param coordinate_transformations:
List of transformations for each path.
Each list of dicts are added to each datasets in order and must include a
'scale' transform.
which we could use if our segmentation was not performed at the highest-resolution level. The question is: how would it mix with the default `scale` transformations? Would it just override them?
Answer:
if coordinate_transformations is None:
    shapes = [data.shape for data in pyramid]
    coordinate_transformations = fmt.generate_coordinate_transformations(shapes)
Thus it seems that in our `image_labeling` task we can construct the correct set of `scale` transformations (taking into account the level on which we did segmentation), and let them propagate through the `ome_zarr` functions. This is to be checked, of course.

Links:
https://github.com/ome/ome-zarr-py/blob/master/ome_zarr/writer.py
https://github.com/ome/ome-zarr-py/blob/master/ome_zarr/scale.py
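A rough sketch of how that could look, assuming the segmentation was run at pyramid level 2 of an image with XY coarsening factor 2, and that `write_labels` forwards a `coordinate_transformations` argument as in the signatures quoted above (to be verified against the installed ome-zarr-py version; the path and level numbers are placeholders):

```python
import zarr
from ome_zarr.io import parse_url
from ome_zarr.writer import write_labels

segmentation_level = 2   # level of the image pyramid that was segmented (assumption)
coarsening_xy = 2        # XY coarsening factor of the image pyramid (assumption)
num_label_levels = 4     # number of pyramid levels written for the labels (assumption)

# One list of transformations per label pyramid level, expressed with respect to the
# full-resolution image: label level 0 already starts at coarsening_xy**segmentation_level.
coordinate_transformations = [
    [
        {
            "type": "scale",
            "scale": [
                1.0,
                float(coarsening_xy ** (segmentation_level + level)),
                float(coarsening_xy ** (segmentation_level + level)),
            ],
        }
    ]
    for level in range(num_label_levels)
]

store = parse_url("/PATH/TO/PLATE.zarr/B/03/0", mode="w").store  # hypothetical path
root = zarr.group(store=store)
write_labels(
    mask,  # the label array produced by cellpose (defined elsewhere)
    group=root,
    name="label_image",
    axes=["z", "y", "x"],
    coordinate_transformations=coordinate_transformations,
)
```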
It seems that we could achieve everything while sticking with `ome_zarr`, but it's not clear whether they have a special pyramid-creation procedure for labels. I think the last line of this method
def __nearest(self, plane: np.ndarray, sizeY: int, sizeX: int) -> np.ndarray:
    """Apply the 2-dimensional transformation."""
    return resize(
        plane,
        output_shape=(sizeY // self.downscale, sizeX // self.downscale),
        order=0,
        preserve_range=True,
        anti_aliasing=False,
    ).astype(plane.dtype)
is the only one that matters: if labels are encoded in a very small int, the resulting downsampled levels will be cast back to the same type. This we could obviously also enforce in our own pyramids.

If this is correct, then let's pick the option that is simpler or that gives us more control over what's happening.
(previous comment went through too early, now updated)
Thanks for the detailed analysis @tcompa
First thoughts on this:
In the end, the ome-zarr implementation runs `cv2.INTER_NEAREST` with `anti_aliasing=False` using the scikit-image resizing. If we can easily do this or something equivalent, great :) Let's get to a first implementation we can test; we will notice very fast if labels are mis-assigned between pyramid levels!
If you think we can quickly have our own version, I have nothing against going that route. It would make it easier to scale eventually, if we want to write very large label images.

If our custom version with a similar setup is hard to implement, the way you describe passing the parameters to ome-zarr-py also sounds like a valid way to go, so we can test the label image display :)
> Let's get to a first implementation we can test, we will notice very fast if labels are mis-assigned between pyramid levels!
I already checked, and they are indeed misaligned (when you do segmentation on level>0 and pass the resulting mask to `write_labels` without changing anything). We need to override their `coordinatetransformations` -- which is in principle doable, and I'll try ASAP.
As of b63c183560be76e699d9833fbe8a508c2afe618c, we have a preliminary working version that sets the correct scales and writes through `ome_zarr.writer`.

Segmentation is performed on a mid-res level (level 2), and the scales are set as they should be in `plate.zarr/B/03/0/labels/label_image/.zattrs`:
{
  "image-label": {
    "version": "0.4"
  },
  "multiscales": [
    {
      "axes": [
        {"name": "z", "type": "space"},
        {"name": "y", "type": "space"},
        {"name": "x", "type": "space"}
      ],
      "datasets": [
        {
          "coordinateTransformations": [
            {"scale": [1.0, 4.0, 4.0], "type": "scale"}
          ],
          "path": "0"
        },
        {
          "coordinateTransformations": [
            {"scale": [1.0, 8.0, 8.0], "type": "scale"}
          ],
          "path": "1"
        },
        {
          "coordinateTransformations": [
            {"scale": [1.0, 16.0, 16.0], "type": "scale"}
          ],
          "path": "2"
        },
        {
          "coordinateTransformations": [
            {"scale": [1.0, 32.0, 32.0], "type": "scale"}
          ],
          "path": "3"
        }
      ],
      "name": "label_image",
      "version": "0.4"
    }
  ]
}
Napari visualization confirms that the labels correspond to the image, also upon zooming in/out.
The current choice (see code in https://github.com/fractal-analytics-platform/fractal/commit/b63c183560be76e699d9833fbe8a508c2afe618c) should be a bit more general than just applying the `1/coarsening^level` approach, for cases where we built a pyramid with (e.g.) coarsening factor 3 and had to trim some parts of the array because of non-commensurability.

However, at the moment we are defining the scales for the labels by looking at the shapes of the images. Depending on how `ome_zarr` constructs the pyramids (e.g. with downsampling factor 3, often leading to non-commensurability), we may have to fix the current approach.
This could be a bit annoying because the pyramids are built within `write_labels`, but we can find several ways to fix it.
Rapid GPU/Cellpose benchmark (still with a standalone python script):

- An array with shape `(19, 2430, 2560)` was segmented in ~3 minutes.
- An array with shape `(19, 4860, 5120)` was too large:
Traceback (most recent call last):
File "image_labeling.py", line 261, in <module>
File "image_labeling.py", line 225, in image_labeling
File "/data/homes/fractal/.cache/pypoetry/virtualenvs/fractal-vy7-1Rxp-py3.8/lib/python3.8/site-packages/cellpose/models.py", line 227, in eval
masks, flows, styles = self.cp.eval(x,
File "/data/homes/fractal/.cache/pypoetry/virtualenvs/fractal-vy7-1Rxp-py3.8/lib/python3.8/site-packages/cellpose/models.py", line 536, in eval
masks, styles, dP, cellprob, p = self._run_cp(x,
File "/data/homes/fractal/.cache/pypoetry/virtualenvs/fractal-vy7-1Rxp-py3.8/lib/python3.8/site-packages/cellpose/models.py", line 625, in _run_cp
masks, p = dynamics.compute_masks(dP, cellprob, niter=niter,
File "/data/homes/fractal/.cache/pypoetry/virtualenvs/fractal-vy7-1Rxp-py3.8/lib/python3.8/site-packages/cellpose/dynamics.py", line 718, in compute_masks
mask = get_masks(p, iscell=cp_mask)
File "/data/homes/fractal/.cache/pypoetry/virtualenvs/fractal-vy7-1Rxp-py3.8/lib/python3.8/site-packages/cellpose/dynamics.py", line 636, in get_masks
h,_ = np.histogramdd(tuple(pflows), bins=edges)
File "<__array_function__ internals>", line 180, in histogramdd
File "/data/homes/fractal/.cache/pypoetry/virtualenvs/fractal-vy7-1Rxp-py3.8/lib/python3.8/site-packages/numpy/lib/histograms.py", line 1074, in histogramdd
Ncount = tuple(
File "/data/homes/fractal/.cache/pypoetry/virtualenvs/fractal-vy7-1Rxp-py3.8/lib/python3.8/site-packages/numpy/lib/histograms.py", line 1076, in <genexpr>
np.searchsorted(edges[i], sample[:, i], side='right')
File "<__array_function__ internals>", line 180, in searchsorted
File "/data/homes/fractal/.cache/pypoetry/virtualenvs/fractal-vy7-1Rxp-py3.8/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 1387, in searchsorted
return _wrapfunc(a, 'searchsorted', v, side=side, sorter=sorter)
File "/data/homes/fractal/.cache/pypoetry/virtualenvs/fractal-vy7-1Rxp-py3.8/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
return bound(*args, **kwds)
numpy.core._exceptions.MemoryError: Unable to allocate 3.52 GiB for an array with shape (472780800,) and data type int64
If the `(19, 4860, 5120)` case can be processed on our current cluster GPUs, then I'd say we're good to go forward to implement segmentation per original FOV (chunk at level 0 of the pyramid) at full resolution, and we can worry about implementations at higher pyramid levels a bit later.
Priority order would then be (as just discussed in the call):
Let's make sure the first priority works and get this into Fractal. Then we can expand to priorities 2 & 3.
To enable case 3, we need to be able to get original FOVs in the well back. That would have been easy in a multi-FOV Zarr file. But I think we'll find good ways to do this in the single-FOV Zarr file as well. My current favorite path towards that: Save ROIs (region of interest, i.e. corner coordinates + sizes) in an OME-NGFF table. Then loop over all ROIs, extract the information of the relevant regions from the pyramidal chunks based on their position in a well and their scale parameter.
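A rough sketch of what that ROI-based extraction could look like; the ROI field names, the pixel-size convention, and `roi_table` are placeholder assumptions, not an existing table format:

```python
import dask.array as da


def extract_roi(well_zyx: da.Array, roi: dict, pixel_size_yx: float,
                level: int = 0, coarsening_xy: int = 2) -> da.Array:
    """Cut one ROI (given in physical units) out of a pyramidal well image (ZYX dask array)."""
    # Physical size of one pixel at this pyramid level (placeholder convention)
    scale = pixel_size_yx * coarsening_xy ** level
    y0 = int(roi["y_micrometer"] / scale)
    x0 = int(roi["x_micrometer"] / scale)
    ny = int(roi["len_y_micrometer"] / scale)
    nx = int(roi["len_x_micrometer"] / scale)
    return well_zyx[:, y0:y0 + ny, x0:x0 + nx]


# Usage idea: loop over all ROIs of the well and run cellpose per extracted region
# for roi in roi_table:
#     region = extract_roi(level0_dask_array, roi, pixel_size_yx=0.1625).compute()
```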
> If the `(19, 4860, 5120)` case can be processed on our current cluster GPUs, then I'd say we're good to go forward to implement segmentation per original FOV (chunk at level 0 of the pyramid) at full resolution and we can worry about implementations at higher pyramid levels a bit later.

Apparently it is already too large (see edited comment above).
We've also started discussing the relabeling issue: When we run segmentation per chunk, each chunk has labels going from 1 to n. If the chunks are then combined again into a large image, the labels within the "well-image" are non-unique. If we want to map measurements to labels, we typically require unique labels per image though.
One solution to this is to "relabel" the different chunks to ensure label uniqueness. A description of the problem for dask and a relabeling strategy (though not for integer labels I think) can be found here: https://github.com/dask/dask/issues/2768
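For illustration, one simple relabeling strategy is to offset each chunk's labels by the cumulative label count of the previous chunks; a minimal sketch (names are hypothetical, not the eventual Fractal implementation):

```python
import numpy as np


def relabel_sequentially(masks_per_site):
    """Make labels unique across sites by offsetting each site's labels.

    masks_per_site: list of integer label arrays, each labeled 1..n_i (0 = background).
    Returns the relabeled masks and the total number of labels in the well.
    """
    relabeled = []
    total = 0
    for mask in masks_per_site:
        shifted = mask.astype(np.uint32)
        shifted[mask > 0] += total     # keep background at 0
        relabeled.append(shifted)
        total += int(mask.max())       # number of labels in this site
    return relabeled, total
```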
> Apparently it is already too large (see edited comment above).

Hmm, so we are close to the memory limit. But if `(19, 2430, 2560)` (not just `(10, 2430, 2560)`) still passes, we should be good to go for a first implementation :)
> But if `(19, 2430, 2560)` (not just `(10, 2430, 2560)`) still passes

It does pass.
Possible issue: it looks like `ome_zarr`'s `writer` only deals with numpy arrays, and not dask arrays. I think this means we won't be able to store labels for a single (say) 9x8 well with at least 10 Z planes.

Possible ideas:

- The `writer` actually could be updated to support dask arrays. Warning: this wouldn't be a five-minute fix, because `writer` is also connected to their other modules (`scale`, for instance). Plus we would first need a green light from their devs, since it is quite an important change (essentially one should make all of `ome_zarr` dask-compatible, I guess).

I'll try 2 first, to see whether I can make it work quickly.
Test code related to the previous comment:
import numpy as np
import shutil
import dask.array as da
import zarr
from ome_zarr.io import parse_url
from ome_zarr.writer import write_labels
store = parse_url("plate.zarr", mode="w").store
root = zarr.group(store=store)
label_name = "label_image"
label_axes = ["z", "y", "x"]
mask = np.ones((10, 200, 200), dtype=np.uint16)
shutil.rmtree("plate.zarr")
write_labels(mask, group=root, name=label_name, axes=label_axes)
print("Successfully written the np array")
mask = da.ones((10, 200, 200), dtype=np.uint16).compute()
shutil.rmtree("plate.zarr")
write_labels(mask, group=root, name=label_name, axes=label_axes)
print("Successfully written the computed dask array")
mask = da.ones((10, 200, 200), dtype=np.uint16)
shutil.rmtree("plate.zarr")
write_labels(mask, group=root, name=label_name, axes=label_axes)
print("Successfully written the dask array")
with output/error:
Successfully written the np array
Successfully written the computed dask array
Traceback (most recent call last):
File "test_ome_zarr_writer.py", line 29, in <module>
write_labels(mask, group=root, name=label_name, axes=label_axes)
File "/home/tommaso/Fractal/ome-zarr-py/ome_zarr/writer.py", line 638, in write_labels
write_multiscale_labels(
File "/home/tommaso/Fractal/ome-zarr-py/ome_zarr/writer.py", line 556, in write_multiscale_labels
write_multiscale(
File "/home/tommaso/Fractal/ome-zarr-py/ome_zarr/writer.py", line 243, in write_multiscale
group.create_dataset(str(path), data=data, chunks=chunks_opt, **options)
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/hierarchy.py", line 821, in create_dataset
return self._write_op(self._create_dataset_nosync, name, **kwargs)
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/hierarchy.py", line 673, in _write_op
return f(*args, **kwargs)
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/hierarchy.py", line 838, in _create_dataset_nosync
a = array(data, store=self._store, path=path, chunk_store=self._chunk_store,
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/creation.py", line 368, in array
z[...] = data
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/core.py", line 1285, in __setitem__
self.set_basic_selection(pure_selection, value, fields=fields)
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/core.py", line 1380, in set_basic_selection
return self._set_basic_selection_nd(selection, value, fields=fields)
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/core.py", line 1678, in _set_basic_selection_nd
indexer = BasicIndexer(selection, self)
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/indexing.py", line 342, in __init__
dim_indexer = SliceDimIndexer(dim_sel, dim_len, dim_chunk_len)
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/indexing.py", line 176, in __init__
self.nchunks = ceildiv(self.dim_len, self.dim_chunk_len)
File "/home/tommaso/miniconda3/envs/fractal/lib/python3.8/site-packages/zarr/indexing.py", line 160, in ceildiv
return math.ceil(a / b)
TypeError: unsupported operand type(s) for /: 'int' and 'list'
@tcompa Ah, makes sense! The whole OME-Zarr library isn't really optimized for out-of-core writing, which is why we have a lot of our own parsing to OME-Zarrs for the images as well. When we were still thinking about saving per original FOV, using the OME-Zarr library would have made sense.
But in the current framework of single-FOV wells, I think it makes more sense to create our own writer for labels now as well. Getting the OME-Zarr library to be out-of-core writing compatible is interesting, but a larger undertaking.
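A minimal sketch of what such a custom, dask-friendly label writer could look like (the function name, the level-by-level `to_zarr` calls, and the subsampling-based coarsening are illustrative assumptions, not a final implementation; the multiscales metadata in `.zattrs` would still have to be written separately, as discussed above):

```python
import dask.array as da


def write_label_pyramid(labels: da.Array, zarr_url: str,
                        num_levels: int = 5, coarsening_xy: int = 2) -> None:
    """Write a ZYX dask label array as a pyramid of zarr arrays at <zarr_url>/0, /1, ..."""
    level_data = labels
    for level in range(num_levels):
        level_data.to_zarr(zarr_url, component=str(level), overwrite=True)
        # Plain subsampling in YX behaves like nearest-neighbor downscaling and never
        # mixes or invents label values; rechunk to keep chunks regular for to_zarr
        level_data = level_data[:, ::coarsening_xy, ::coarsening_xy].rechunk()
```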
A question about re-labeling: did I understand correctly that we could expect up to ~1000 labels in a single 3D column (say `10x2160x2560`)?

In that case, the total number of labels in a large well (say 72 sites or more) after re-labeling will be larger than what is allowed by `uint16`, and we'll need to switch to `uint32`. Is that OK, or should we discuss it further?
For our current test cases, most wells contain ~50k cells, which would often still be below the limit. But there may be wells in the 23-well test case that actually go above this, and other use cases where people have more than 65535 objects in a well.

If we can test `uint32` label images and they work with the current pipeline, that should avoid the issue for any reasonable use cases we run with label images in a well. I haven't worked with `uint32` label images in napari or ome-zarr before though, so I don't yet know whether they would cause any issues.
Size-wise, label images are typically very small, i.e. a few kbs for a chunk that is a few MBs, so even doubling bit depth shouldn't be an issue there. If you have a quick test to see that uint32 label images are still opened in napari, that would be the only check I think will be necessary. Let me know if you have a fast way to test this, otherwise I can test this on some example data this afternoon.
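A quick way to test this could be something like the following (assuming a local napari installation; purely synthetic data):

```python
import numpy as np
import napari

# Synthetic uint32 label image with values well above the uint16 limit of 65535
labels = np.random.randint(0, 200_000, size=(10, 512, 512), dtype=np.uint32)

viewer = napari.Viewer()
viewer.add_labels(labels, name="uint32_test")
napari.run()
```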
Questions keep coming: we are not yet parsing microscope metadata, so for the moment I will set the `anisotropy` default to the one that matters for the current test datasets. From https://user-images.githubusercontent.com/18033446/174331876-c80c780f-95da-4c38-af6e-affa9e8d9c69.png and

> anisotropy: float (optional, default None)
> for 3D segmentation, optional rescaling factor (e.g. set to 2.0 if Z is sampled half as dense as X or Y)

I would say that `anisotropy = 5/0.325 = 15.38`. Does this number sound reasonable?
> If you have a quick test to see that uint32 label images are still opened in napari, that would be the only check I think will be necessary.

I opened some uint32 labels in napari (this is for the MIP of a 9x8 well, with ~40k labels in total) and everything seems to work.
> I would say that `anisotropy = 5/0.325 = 15.38`.

How do we get to the 5 there? For this experiment, Z steps are 1um and the xy dimension at full resolution is 0.1625um; at level 1 (with 2x coarsening), it's 0.325um. So if we work at level 1, 1/0.325 ~= 3 is a reasonable choice (which is why I also had 3 as my original default value). Would be good if that's something the user can overwrite if needed (sometimes models work better by not actually using the physical anisotropy exactly).
> I opened some uint32 labels in napari

This looks awesome! `uint32` label images sound good to me in that case! :)
> How do we get to the 5 there?

I got it from the `pixel_size_z` column in the linked table, but then it was probably something else.
> For this experiment, Z steps are 1um and the xy dimension at full resolution is 0.1625um; at level 1 (with 2x coarsening), it's 0.325um. So if we work at level 1, 1/0.325 ~= 3 is a reasonable choice (which is why I also had 3 as my original default value).

Since I'm working at level 0, I should set `anisotropy~6`, right?
> Would be good if that's something the user can overwrite if needed (sometimes models work better by not actually using the physical anisotropy exactly)

Sure, that's the plan. More generally, we aim at having very few hardcoded parameters (apart from very basic things like the image size).
> I got it from the `pixel_size_z` column in the linked table, but then it was probably something else.

Ah, that was from the FMI dataset, not the UZH Cardiomyocyte dataset we're currently training on.

Yes, `anisotropy~6` seems reasonable for this case.
> Sure, that's the plan. More generally, we aim at having very few hardcoded parameters (apart from very basic things like the image size).

Sounds great. Also, the image size will not always be exactly those dimensions once we start adding more microscopes. Not that we need to work on that just yet, just so you're aware :)
As of https://github.com/fractal-analytics-platform/fractal/commit/8e2998ad616941614962440002210617b39028da, there is a (very preliminary) version of Fractal integration. In principle it works (see `examples/example_uzh_1_well_2x2_sites.sh`), but there are ongoing discussions concerning scalability (both for memory and GPU resources).

I would not test this on the 23-well dataset yet, but probably one well with 9x8 sites should work. In principle the task works both for 3D and 2D (MIP) data, but at the moment Fractal always runs it on the 3D images.

More updates later.
Since memory usage for the labeling task is going to be an important issue, let's keep some reference info here. A line which is quite memory-expensive in cellpose is https://github.com/MouseLand/cellpose/blob/main/cellpose/core.py#L599:
yf = np.zeros((3, self.nclasses, imgs.shape[0], imgs.shape[1], imgs.shape[2]), np.float32)
As a reference, this could look like this:
$ python
Python 3.9.12 (main, Apr 5 2022, 06:56:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> x = np.random.randint(0, 100000, size=(3, 3, 10, 2160, 2560)).astype(np.float32)
>>> print(f"dype={x.dtype}, size={x.size}, size*itemsize={x.size*x.itemsize/1000000} GB")
dype=float32, size=497664000, size*itemsize=1990.656 GB
>>> x = np.random.randint(0, 100000, size=(3, 3, 15, 2160, 2560)).astype(np.float32)
>>> print(f"dype={x.dtype}, size={x.size}, size*itemsize={x.size*x.itemsize/1000000} GB")
dype=float32, size=746496000, size*itemsize=2985.984 GB
---- EDIT: more info
While running a single-well 9x8 case, I'm observing a moderate use of GPU memory and an intensive use of CPU memory. This is a bit confusing, and requires more thought.
CPU memory:
GPU memory:
@tcompa Are you processing a single original FOV (i.e. a single 2160x2560 chunk) and getting those results? That is indeed interesting. It's not too worrying if cellpose also needs CPU memory though. I think it does quite a bit of computation to process different network outputs like flow maps & probability maps into label images. If that requires memory on the CPU side, so be it :)
Plus, the memory usage may be quite cellpose specific. So as long as it runs, that should do the trick for the moment I'd think.
If we heavily rely on it afterwards in production, we can think about optimizing it further.
Quick update on labeling/parallelization/memory (probably best to discuss this tomorrow)
We can probably live with point 1, but point 2 doesn't look good. More work is needed.
We are testing a single well with 9x8 sites in a single-FOV scheme. [Note: the single-FOV approach is the only one that we are currently supporting, as of recent decisions, and things won't change unless we explicitly change our mind (that is, we are not running anything multi-FOV).]
For labeling, at the moment we go through the 72 sites sequentially, i.e. there are never two cellpose calculations running at the same time.
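Schematically, the sequential scheme looks like the sketch below (hypothetical variable names and hard-coded site size; the whole-well in-memory `mask` array is exactly the memory concern discussed just below):

```python
import numpy as np
import dask.array as da
from cellpose import core, models

data_zyx = da.from_zarr(zarrurl + "/0")[0]        # ZYX data for channel 0 (zarrurl defined elsewhere)
model = models.Cellpose(gpu=core.use_gpu(), model_type="nuclei")

mask = np.zeros(data_zyx.shape, dtype=np.uint32)  # whole-well label array, held in memory
num_labels = 0
site_ny, site_nx = 2160, 2560                     # size of one original FOV

for iy in range(0, data_zyx.shape[1], site_ny):
    for ix in range(0, data_zyx.shape[2], site_nx):
        img = data_zyx[:, iy:iy + site_ny, ix:ix + site_nx].compute()
        site_mask, flows, styles, diams = model.eval(
            img, channels=[0, 0], do_3D=True, anisotropy=6.0,
            net_avg=False, augment=False,
        )
        site_mask = site_mask.astype(np.uint32)
        site_mask[site_mask > 0] += num_labels    # relabeling, as in the sketch further above
        if site_mask.max() > 0:
            num_labels = int(site_mask.max())
        mask[:, iy:iy + site_ny, ix:ix + site_nx] = site_mask
```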
Early experiments with more flexible approaches (letting dask organize the 72 parallel executions, through `map_blocks` or similar, including https://github.com/dask/dask/issues/2768) led to memory errors within cellpose (for instance in the line mentioned above), and our obvious guess is that this was due to simultaneous execution of more than one cellpose calculation.
At the moment, we are not able to generate the array of labels of a whole well in a lazy way, and that's the reason for the large CPU-memory usage. For a well of 72 sites with 19 Z planes, this array takes around 30 GB (for uint32 labels). Which is still doable on our current RAM (64 G), but not really scalable. A quick mitigation is to switch back to uint16 labels, and only use uint32 when the number of labels becomes larger than 65k. But we would be just pushing the problem a bit further, not solving it.
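For reference, the ~30 GB figure is just the size of a `uint32` array covering the whole well:

```python
>>> nz, ny, nx = 19, 9 * 2160, 8 * 2560   # 72 sites (9x8), 19 Z planes
>>> nz * ny * nx * 4 / 1e9                # uint32 = 4 bytes per voxel
30.2579712
```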
If we dropped relabeling (which is not an option, in our understanding), we could try to construct this array lazily, by just:
This would still have two problems (1. concurrent execution of cellpose calculations can lead to memory errors, 2. lack of relabeling), but at least we would not be building the whole `9x8x19x2160x2560` array.
Great summary @tcompa, let's discuss in detail tomorrow. Some short notes:
> If we dropped relabeling (which is not an option, in our understanding)
Dropping relabeling would be a pity; the relabeling makes this quite useful. But maybe it's still worth it to save non-relabeled label values first per chunk, potentially running on multiple nodes, and then have a collection task that relabels the label images afterwards? That would be IO overhead, but typically label images are fairly small on disk.
In that case, we can then either run this job on CPU nodes with high memory (much cheaper than blocking GPU nodes and we would only have high memory demand for a short time) or eventually figure out ways to do lazy relabeling, e.g. based on the logging information about counts of objects per fov.
Let's discuss tomorrow whether that's worth the complexity.
Some thoughts as of yesterday's Fractal meeting:
- Missing: we still need to limit the number of simultaneous Cellpose calls on the same GPU, but how to do it from dask is currently unclear.
Probably useful: https://docs.dask.org/en/stable/scheduler-overview.html#configuring-the-schedulers
Here's a minimal working example on a CPU:
import time

import numpy as np
import dask
import dask.array as da
from concurrent.futures import ThreadPoolExecutor

n_threads = 4


def heavy_cellpose_function(block, block_info=None):
    print(f"Now running heavy cellpose function for chunk {block_info[None]['chunk-location']}")
    time.sleep(10)
    return np.ones_like(block, dtype=np.uint16) * np.random.randint(10)


x = da.zeros((8, 12), chunks=(4, 3), dtype=np.uint16)

# Extract total number of chunks
n_chunks = (x.shape[0] // x.chunksize[0]) * (x.shape[1] // x.chunksize[1])
print(f"Number of chunks: {n_chunks}")
print(f"Number of threads: {n_threads}")
print()

t0 = time.perf_counter()
with dask.config.set(pool=ThreadPoolExecutor(n_threads)):
    y = x.map_blocks(heavy_cellpose_function, chunks=(4, 3), meta=np.array((), dtype=np.uint16))
    y.to_zarr("data.zarr")
t1 = time.perf_counter()

print()
print(f"Elapsed time: {t1 - t0}")
print()
with output:
Number of chunks: 8
Number of threads: 4
Now running heavy cellpose function for chunk (0, 0)
Now running heavy cellpose function for chunk (0, 1)
Now running heavy cellpose function for chunk (0, 2)
Now running heavy cellpose function for chunk (0, 3)
Now running heavy cellpose function for chunk (1, 0)
Now running heavy cellpose function for chunk (1, 1)
Now running heavy cellpose function for chunk (1, 2)
Now running heavy cellpose function for chunk (1, 3)
Elapsed time: 20.044721081001626
Now we should check whether it also works on the GPU.
Still on "running cellpose for several sites at the same time on the GPU":
We consider sites of shape (10,2160,2560)
, and we limit the number of concurrent cellpose executions as in https://github.com/fractal-analytics-platform/fractal/issues/64#issuecomment-1177229164.
We run the test on a 2x2 well, but we only count the sites which are actually run in parallel (i.e. if we have 4=3+1 sites and run them in batches of 3, we only look at the timing for the first three and we ignore the last one). This is the only relevant measure when scaling to large wells.
| batch_size | time_per_site | time_per_site / batch_size | Speed-up factor |
| --- | --- | --- | --- |
| 1 | ~4 min | ~4 min | 1 |
| 2 | ~5 min | ~2.5 min | ~1.6 |
| 3 | ~6 min | ~2 min | ~2 |
| 4 | MemoryError | - | - |
This suggests that:
Hi @gusqgm and @jluethi, let's discuss here the details of the image labeling task.