angelolab / ark-analysis

Integrated pipeline for multiplexed image analysis
https://ark-analysis.readthedocs.io/en/latest/
MIT License

Cell mask generation assumes all FOVs have the same maximum cluster number #1010

Closed: cliu72 closed this issue 1 year ago

cliu72 commented 1 year ago

Describe the bug

Cell mask generation (in both the cell clustering and neighborhood notebooks) assumes that all FOVs have the same maximum cluster number, which can produce incorrect cell masks.

This is the code in question (in data_utils.py):

def relabel_segmentation(labeled_image, labels_dict):
    """Takes a labeled image and translates its labels according to a dictionary.

    Returns the relabeled array (according to the dictionary).

    Args:
        labeled_image (numpy.ndarray):
            2D numpy array of labeled cell objects.
        labels_dict (dict):
            a mapping between labeled cells and their clusters.

    Returns:
        numpy.ndarray:
            The relabeled array.
    """

    img = np.copy(labeled_image)
    unique_cell_ids = np.unique(labeled_image)
    unique_cell_ids = unique_cell_ids[np.nonzero(unique_cell_ids)]

    default_label = max(labels_dict.values()) + 1

    # cast to int16 to allow for Photoshop loading
    relabeled_img = np.vectorize(
        lambda x: labels_dict.get(x, default_label) if x != 0 else 0
    )(img).astype('int16')

    return relabeled_img

This function is called by label_cells_by_cluster, which runs once per FOV. The problem is this line:

default_label = max(labels_dict.values()) + 1

assumes that labels_dict, which is generated independently for each FOV, has the same maximum value in every FOV. That is not always true. For example, say T cells are cluster 19 and B cells are cluster 20. fov1 contains both T cells and B cells (clusters 19 and 20), but fov2 contains T cells and no B cells (cluster 19 only). For fov2, default_label is set to 20 (because the maximum for that FOV is 19), so any cells in fov2 that are missing from labels_dict are assigned label 20 and look like B cells, even though the image contains no B cells.
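A minimal reproduction of the example above, assuming a trimmed copy of the quoted function (the unused unique_cell_ids lines are dropped) and tiny hypothetical 2x2 label images. Both FOVs contain cell 1 (a T cell) and cell 2; only fov1 has a cluster label for cell 2.

```python
import numpy as np


def relabel_segmentation(labeled_image, labels_dict):
    # Trimmed copy of the excerpt above: the fallback label is computed
    # from THIS FOV's clusters only, which is the source of the bug.
    default_label = max(labels_dict.values()) + 1
    relabeled_img = np.vectorize(
        lambda x: labels_dict.get(x, default_label) if x != 0 else 0
    )(np.copy(labeled_image)).astype('int16')
    return relabeled_img


T_CELL, B_CELL = 19, 20  # hypothetical cluster numbers from the example

# fov1: cell 1 is a T cell, cell 2 is a B cell
fov1 = np.array([[1, 2], [0, 0]])
fov1_labels = {1: T_CELL, 2: B_CELL}

# fov2: cell 1 is a T cell, cell 2 was never clustered (absent from the dict)
fov2 = np.array([[1, 2], [0, 0]])
fov2_labels = {1: T_CELL}

mask1 = relabel_segmentation(fov1, fov1_labels)
mask2 = relabel_segmentation(fov2, fov2_labels)
print(mask1.tolist())  # [[19, 20], [0, 0]]  -> cell 2 is a real B cell
print(mask2.tolist())  # [[19, 20], [0, 0]]  -> unclustered cell also gets 20
```

The two masks are identical, so nothing downstream can tell the unclustered cell in fov2 apart from the B cell in fov1.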

I believe default_label was set to max(labels_dict.values()) + 1 because of my original mask-generation code. There, every cell without a cluster label (because it was excluded from clustering for whatever reason) was assigned to an "unassigned" category, whose cluster number was one greater than the total number of clusters. That number was the same for every FOV; taking the maximum per FOV breaks this.
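One possible direction, sketched under assumptions: compute the "unassigned" label once, from the cluster numbers of all FOVs, and pass it into the relabeling function. The extra default_label parameter and the fov_labels dictionary below are hypothetical and are not part of the current ark-analysis API.

```python
import numpy as np


def relabel_segmentation(labeled_image, labels_dict, default_label):
    """Map cell IDs to cluster numbers with a caller-supplied fallback.

    `default_label` is a hypothetical extra parameter: the caller computes it
    once for the whole dataset, so it is identical for every FOV.
    """
    relabeled_img = np.vectorize(
        lambda x: labels_dict.get(x, default_label) if x != 0 else 0
    )(labeled_image).astype('int16')
    return relabeled_img


# Hypothetical per-FOV mappings of cell ID -> cluster number
fov_labels = {
    'fov1': {1: 19, 2: 20},  # a T cell and a B cell
    'fov2': {1: 19},         # cell 2 was not clustered
}

# "Unassigned" is one greater than the largest cluster number across ALL FOVs
all_clusters = {c for labels in fov_labels.values() for c in labels.values()}
unassigned_label = max(all_clusters) + 1  # 21

fov2 = np.array([[1, 2], [0, 0]])
mask2 = relabel_segmentation(fov2, fov_labels['fov2'], unassigned_label)
print(mask2.tolist())  # [[19, 21], [0, 0]]
```

Because the fallback no longer depends on which clusters happen to be present in a given FOV, an unclustered cell gets the same label everywhere and cannot collide with a real cluster.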

Not directly related to this issue, but also in this function: I'm not sure why these lines are included, since unique_cell_ids is never used afterward:

unique_cell_ids = np.unique(labeled_image)
unique_cell_ids = unique_cell_ids[np.nonzero(unique_cell_ids)]

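Expected behavior

Cells without a cluster label should receive the same "unassigned" label in every FOV, one greater than the largest cluster number across all FOVs, so that this label never collides with a real cluster.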

ngreenwald commented 1 year ago

I think this can be wrapped up with the changes in #857. It might need two separate PRs, but it would be good to coordinate either way.