API discussion: provide channels separately rather than using number key

jni commented 3 years ago

Working through the docs, I found the API for working with channels confusing:

https://cellpose.readthedocs.io/en/latest/settings.html#channels

The cytoplasm model in cellpose is trained on two-channel images, where the first channel is the channel to segment, and the second channel is an optional nuclear channel. Here are the options for each:

1. 0=grayscale, 1=red, 2=green, 3=blue
2. 0=None (will set to zero), 1=red, 2=green, 3=blue

Set channels to a list with each of these elements, e.g. channels = [0,0] if you want to segment cells in grayscale or for single channel images, or channels = [2,3] if you have green cells with blue nuclei.

My issues with this:

- It's weird to have to specify channels=[0, 0] for an image with no channels.
- It's weird to have to specify channels=[2, 3] when the channel indices in the array are actually [1, 2].
- It's weird to have things labeled as "RGB" when you might only have a 2-channel image. It's unclear to me from the docs whether 2-channel images are even supported.

I have two alternate proposals:

(1) Follow the napari/now-skimage convention of channel_axis=None for a grayscale (no channels) image, channel_axis=-1 for an image having the channels in the last axis, channel_axis=0 for an image having the channels in the 1st axis (might match acquisition), etc. Then, in the case of channel_axis is not None, specify cytoplasm_channel=<int> and nucleus_channel=<int>.
(2) Don't allow multichannel input at all. Instead, input images should be specified as cytoplasm= and nucleus= input arrays. Users can specify one or the other or both.

Both of these methods can be made backwards compatible, which is nice. =)
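A minimal sketch of what proposal (1) could look like on the caller's side. `split_channels` and its return shape are hypothetical and not part of cellpose; only the parameter names `channel_axis`, `cytoplasm_channel` and `nucleus_channel` come from the proposal above, and the indices are ordinary zero-based array indices.

```python
import numpy as np


def split_channels(image, channel_axis=None, cytoplasm_channel=None, nucleus_channel=None):
    """Hypothetical helper (not cellpose API): return (cytoplasm, nucleus) planes."""
    image = np.asarray(image)
    if channel_axis is None:
        # No channel axis: the whole array is the single image to segment.
        return image, None

    def pick(index):
        # np.take uses plain zero-based indices along the chosen axis.
        return None if index is None else np.take(image, index, axis=channel_axis)

    return pick(cytoplasm_channel), pick(nucleus_channel)


# Channels-first stack of shape (2, 4, 5): plane 0 is all zeros, plane 1 is all ones.
stack = np.stack([np.zeros((4, 5)), np.ones((4, 5))], axis=0)
cyto, nuc = split_channels(stack, channel_axis=0, cytoplasm_channel=1, nucleus_channel=0)
```

With this shape of API the indices match what the user sees in the array, and a grayscale image needs no channel arguments at all.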

Nelson-Gon commented 2 years ago

Following up on this. I found that setting channel_axis and channels at the same time messes up the segmentation. My understanding was that channel_axis specifies whether the images are channel-first or channel-last, so I was a bit surprised at the difference in results.

ElpadoCan commented 1 year ago

I fully agree on this and since I found this issue through a Google search I thought I might ask here.

I don't understand how to use the channels. Images are not colored per se. If I have a 2D image (suppose 512x512, DAPI staining) of the nuclei and another 2D image for the cytoplasm staining, they are both grayscale images. What does it mean that they are red or green? Do I need to construct an array with shape (3,512,512) or (512,512,3)? And what do I put in the unused channel? Can I build a (2,512,512) array and pass channel=[0, 1] even though the cytoplasm staining used a green staining?

Thank you very much!

VolkerH commented 1 year ago

Another +1 here. I just spent 2 hours with a colleague wondering about this, and we are still not sure whether our code is correct. I also have the same question as @ElpadoCan:

Can I build a (2,512,512) array and pass channel=[0, 1] even though the cytoplasm staining used a green staining?

I think implementing the suggestions by @jni would be great, but at this stage would break many applications that build on the existing API. However, it would be nice to clarify the documentation section here: https://cellpose.readthedocs.io/en/latest/settings.html#channels. Most microscopists will not use RGB images but have images for individual channels. It would be great if the documentation could explain that use case. I'd be happy to help, but I admittedly don't understand the current approach.

jni commented 1 year ago

@VolkerH the suggestions can all be implemented in a backward-compatible way. The old API would still work, but if channel_axis= is provided then the array is interpreted in the new way. Or if cytoplasm_image= or nucleus_image= are not None, then you are using the new API. But the old API can keep working indefinitely.
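To make the backward-compatibility argument concrete, here is a rough sketch of the dispatch logic. `detect_api_style` and the returned labels are hypothetical; only the keyword names are taken from the comment above, and the precedence order is an assumption for illustration.

```python
def detect_api_style(channels=None, channel_axis=None,
                     cytoplasm_image=None, nucleus_image=None):
    """Hypothetical dispatcher (not cellpose code): decide which API a call uses."""
    if cytoplasm_image is not None or nucleus_image is not None:
        # Separate arrays were passed, so the call uses the new per-image API.
        return "separate_images"
    if channel_axis is not None:
        # channel_axis was passed, so channel indices are read in the new way.
        return "channel_axis"
    # Nothing new was passed: keep interpreting `channels` as the old number key.
    return "legacy_number_key"


print(detect_api_style(channels=[2, 3]))
print(detect_api_style(channel_axis=-1))
print(detect_api_style(cytoplasm_image=object()))
```

Existing calls that only pass `channels` fall through to the last branch, which is why the old behaviour would not need to change.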

gdreiman-insitro commented 1 year ago

I was also very confused about how channel indexing worked.

TL;DR: You need to consider your image channels to be one-indexed, and if you pass 0, cellpose will average all of the channels of your image.

Here's why I think that happens: In models.CellposeModel when you call the .eval method, the channels argument is passed to cellpose.transforms.convert_image at line 543. Within convert_image there's a codeblock here:

    # use grayscale image
    if data.shape[-1]==1:
        data = np.concatenate((data, np.zeros_like(data)), axis=-1)
    else:
        if channels[0]==0:
            data = data.mean(axis=-1, keepdims=True)
            data = np.concatenate((data, np.zeros_like(data)), axis=-1)
        else:
            chanid = [channels[0]-1]
            if channels[1] > 0:
                chanid.append(channels[1]-1)
            data = data[...,chanid]
            for i in range(data.shape[-1]):
                if np.ptp(data[...,i]) == 0.0:
                    if i==0:
                        warnings.warn("chan to seg' has value range of ZERO")
                    else:
                        warnings.warn("'chan2 (opt)' has value range of ZERO, can instead set chan2 to 0")
            if data.shape[-1]==1:
                data = np.concatenate((data, np.zeros_like(data)), axis=-1)

So I think passing channels = [0,0] into model.eval causes the channels of your image to be averaged by cellpose.transforms.convert_image:

        if channels[0]==0:
            data = data.mean(axis=-1, keepdims=True)
            data = np.concatenate((data, np.zeros_like(data)), axis=-1)

If you pass values greater than 0, i.e. channels = [1,2], the code just subtracts one from the channel values so that they become zero indexed and can be used properly to index into the image you provided:

        else:
            chanid = [channels[0]-1]
            if channels[1] > 0:
                chanid.append(channels[1]-1)

@carsen-stringer maybe a small clarification in the docs would resolve a lot of this confusion.
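The behaviour described above can be checked on a toy array. The sketch below is a simplified, standalone re-statement of the quoted branch (warnings omitted), not the library function itself; `select_channels` is a hypothetical name, and it assumes the image has its channels in the last axis.

```python
import numpy as np


def select_channels(data, channels):
    """Simplified copy of the quoted logic; expects channels-last data."""
    if data.shape[-1] == 1:
        # Single-channel input: pad with an all-zero second channel.
        return np.concatenate((data, np.zeros_like(data)), axis=-1)
    if channels[0] == 0:
        # 0 in the first slot: average every channel into one grayscale plane.
        data = data.mean(axis=-1, keepdims=True)
        return np.concatenate((data, np.zeros_like(data)), axis=-1)
    # Non-zero values are one-indexed, so subtract 1 to index the array.
    chanid = [channels[0] - 1]
    if channels[1] > 0:
        chanid.append(channels[1] - 1)
    data = data[..., chanid]
    if data.shape[-1] == 1:
        data = np.concatenate((data, np.zeros_like(data)), axis=-1)
    return data


# Three constant planes with values 10, 20 and 60, stacked channels-last.
image = np.stack([np.full((2, 2), v) for v in (10.0, 20.0, 60.0)], axis=-1)
averaged = select_channels(image, [0, 0])   # plane 0 is the mean of 10, 20, 60
indexed = select_channels(image, [2, 3])    # planes are array channels 1 and 2
single = select_channels(image, [1, 0])     # array channel 0 plus a zero plane
```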

jaspreetishar commented 1 month ago

Hello Team,

First of all, thank you for the Cellpose tool and the documentation!

I’d like to revisit this API discussion thread as I still have some queries about using different channel combinations and the results they produce. For context, in my preprocessing script for creating a multi-channel stacked image, I pass two single-channel images: the first is a DAPI stain, which is assigned to the 0th index in the stacked image, and the second is a PolyT stain, assigned to the 1st index.

Previously, I passed the combination [1,0] to Cellpose because I wanted to segment the PolyT channel and use the DAPI channel as the optional one. However, I tried using [2,1] based on the documentation and this Colab Notebook, thinking that perhaps Cellpose treats the zeroth channel as the first (starting its channel count from 1 rather than the usual 0) and assigns RGB in that order—assigning red to the zeroth channel, green to the first, and so on. The authors of the notebook used [2,1] for their two-channel image, which led me to explore this possibility for a two-channel setup.

The results between the two combinations are quite different, and I’d appreciate some clarification on how the channel array should be used.

Thank you in advance for your help!

carsen-stringer commented 1 month ago

The documentation on channel choosing is available here: https://cellpose.readthedocs.io/en/latest/settings.html#channels

0 in the first channel input means grayscale, and 0 in the second channel input means to use zeros for the nuclear channel.
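Applied to the DAPI/PolyT stack from the question above, and assuming the one-indexed selection logic quoted earlier in this thread holds for that cellpose version, the two settings would pick different planes. `describe_selection` is a hypothetical helper written only for this illustration.

```python
def describe_selection(stain_names, channels):
    """Hypothetical helper: name the stain each slot of `channels` would pick."""

    def lookup(value, zero_meaning):
        # 0 has a special meaning per slot; other values are one-indexed.
        return zero_meaning if value == 0 else stain_names[value - 1]

    return {
        "to_segment": lookup(channels[0], "mean of all channels"),
        "nuclear": lookup(channels[1], "zeros"),
    }


# Array index 0 holds DAPI and array index 1 holds PolyT, as in the question.
stains = ["DAPI", "PolyT"]
print(describe_selection(stains, [1, 0]))
print(describe_selection(stains, [2, 1]))
```

Under that assumption, [1,0] segments the DAPI plane with an empty nuclear channel, while [2,1] segments PolyT with DAPI as the nuclear channel, which would explain the different results.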

Teranis commented 1 month ago

Hello Team,

I also found the API for cellpose.train.train_seg very confusing, since channels defaults to None. This means that most people will ignore it if they think they do not need it (as in my case, where I work with greyscale images). I would highly recommend linking the guide you mentioned in the training tutorial, changing the default from None to [0,0], and maybe catching errors in the channel configuration at the beginning of the whole process, so that users don't start wondering why resize is expecting 3D data when they only provide 2D data.
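As a sketch of the kind of early check suggested here: `validate_channels` is a hypothetical function, not part of cellpose, and the rules it enforces (two entries, each between 0 and the number of image channels) are assumptions based on the number key described in this thread.

```python
def validate_channels(channels, n_image_channels):
    """Hypothetical early check (not cellpose code) for the two-element number key."""
    if channels is None:
        raise ValueError("channels must be given explicitly, e.g. [0, 0] for grayscale")
    if len(channels) != 2:
        raise ValueError(f"channels needs exactly two entries, got {len(channels)}")
    for slot, value in zip(("first", "second"), channels):
        # 0 is the special value; 1..n_image_channels are one-indexed channels.
        if not 0 <= value <= n_image_channels:
            raise ValueError(
                f"{slot} entry {value} is outside 0..{n_image_channels} "
                f"for an image with {n_image_channels} channel(s)"
            )
    return list(channels)


print(validate_channels([0, 0], n_image_channels=1))  # grayscale image passes
```

Running such a check before any resizing would surface a missing or out-of-range `channels` value with a message that points at the actual cause.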

Thank you for all the great work you are doing and for any help in advance!

MouseLand / cellpose

API discussion: provide channels separately rather than using number key #275