Describe the bug
Cell mask generation (in both the cell clustering and neighborhood notebooks) assumes that all FOVs have the same maximum cluster number, which can produce incorrect cell masks.
This is the code in question (in data_utils.py):
```python
import numpy as np


def relabel_segmentation(labeled_image, labels_dict):
    """Takes a labeled image and translates its labels according to a dictionary.

    Returns the relabeled array (according to the dictionary).

    Args:
        labeled_image (numpy.ndarray):
            2D numpy array of labeled cell objects.
        labels_dict (dict):
            a mapping between labeled cells and their clusters.

    Returns:
        numpy.ndarray:
            The relabeled array.
    """
    img = np.copy(labeled_image)
    unique_cell_ids = np.unique(labeled_image)
    unique_cell_ids = unique_cell_ids[np.nonzero(unique_cell_ids)]

    default_label = max(labels_dict.values()) + 1

    # cast to int16 to allow for Photoshop loading
    relabeled_img = np.vectorize(
        lambda x: labels_dict.get(x, default_label) if x != 0 else 0
    )(img).astype('int16')

    return relabeled_img
```
This function is called in the label_cells_by_cluster function, which is run for each FOV. The issue is that this line:
```python
default_label = max(labels_dict.values()) + 1
```
assumes that the labels_dict (which is generated for each FOV independently) has the same maximum value for all FOVs. This is not always true. For example, say T cells are cluster 19 and B cells are cluster 20. If fov1 contains both T cells and B cells (clusters 19 and 20) but fov2 contains T cells and no B cells (cluster 19 but no 20), then for fov2 default_label is set to 20, because the maximum for this FOV is 19. Any unassigned cells in fov2 are therefore labeled 20 and show up as B cells even though there are no B cells in the image, while unassigned cells in fov1 are labeled 21.
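A minimal reproduction of the example above, assuming toy 2×3 segmentation arrays and the cluster numbers from this issue (19 = T cells, 20 = B cells). The FOV arrays and label dicts are made up for illustration; the function body keeps only the per-FOV default_label logic from data_utils.py.

```python
import numpy as np


def relabel_segmentation(labeled_image, labels_dict):
    """Current logic: default_label is derived from this FOV's labels_dict only."""
    default_label = max(labels_dict.values()) + 1
    return np.vectorize(
        lambda x: labels_dict.get(x, default_label) if x != 0 else 0
    )(labeled_image).astype('int16')


# Hypothetical toy data: cluster 19 = T cells, cluster 20 = B cells.
# Cell 3 in each FOV was not included in clustering (no entry in labels_dict).
fov1_seg = np.array([[0, 1, 2],
                     [0, 3, 3]])
fov1_labels = {1: 19, 2: 20}  # one T cell, one B cell

fov2_seg = np.array([[0, 1, 1],
                     [0, 3, 3]])
fov2_labels = {1: 19}  # T cells only

fov1_mask = relabel_segmentation(fov1_seg, fov1_labels)
fov2_mask = relabel_segmentation(fov2_seg, fov2_labels)

print(fov1_mask)  # unassigned cell 3 -> 21
print(fov2_mask)  # unassigned cell 3 -> 20, the same label as B cells
```

The same kind of cell (unassigned) ends up with a different label in each FOV, and in fov2 it collides with a real cluster.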
I believe default_label was set to max(labels_dict.values()) + 1 because in my original code for mask generation, I assigned all cells without a cluster label (e.g. because the cell wasn't included in clustering for whatever reason) to an "unassigned" category, and that category was given a cluster number one greater than the total number of clusters. That convention only works if the "+ 1" is computed over all clusters, not per FOV.
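Not directly related to this issue, but I'm also not sure why these lines are included in the function (unique_cell_ids doesn't seem to be used anywhere): `unique_cell_ids = np.unique(labeled_image)` and `unique_cell_ids = unique_cell_ids[np.nonzero(unique_cell_ids)]`.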
Expected behavior
Calculate default_label using the entire dataset, then pass that value into label_cells_by_cluster and relabel_segmentation, so that every FOV uses the same default_label. default_label could be calculated in generate_cell_cluster_mask after the cell data is read in.
Remove unique_cell_ids from the function (unless there is a reason for it that I'm not seeing).
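A sketch of the expected behavior described above, assuming a hypothetical compute_default_label helper and an extra default_label argument on relabel_segmentation (neither exists in the codebase as quoted here). In practice all_cluster_ids would come from the cell table read in generate_cell_cluster_mask; here it is a toy array. The unused unique_cell_ids lines are dropped.

```python
import numpy as np


def compute_default_label(all_cluster_ids):
    """Hypothetical helper: one "unassigned" label shared by every FOV.

    all_cluster_ids: cluster numbers for the full cell table (all FOVs).
    """
    return int(np.max(all_cluster_ids)) + 1


def relabel_segmentation(labeled_image, labels_dict, default_label):
    """Translate cell IDs in labeled_image to cluster numbers.

    Cells missing from labels_dict get the dataset-wide default_label;
    background (0) stays 0. np.vectorize already returns a new array,
    so the input is not modified and no np.copy is needed.
    """
    # cast to int16 to allow for Photoshop loading
    return np.vectorize(
        lambda x: labels_dict.get(x, default_label) if x != 0 else 0
    )(labeled_image).astype('int16')


# Hypothetical toy data: cell 3 in each FOV has no cluster assignment.
fov1_seg = np.array([[0, 1, 2], [0, 3, 3]])
fov2_seg = np.array([[0, 1, 1], [0, 3, 3]])
all_cluster_ids = np.array([19, 20, 19])  # assigned cells across all FOVs

default_label = compute_default_label(all_cluster_ids)  # 21 for this data
fov1_mask = relabel_segmentation(fov1_seg, {1: 19, 2: 20}, default_label)
fov2_mask = relabel_segmentation(fov2_seg, {1: 19}, default_label)
print(fov2_mask)  # unassigned cell 3 -> 21, same as in fov1
```

Passing default_label in also means a FOV whose labels_dict happened to be empty would no longer hit max() on an empty sequence (which raises ValueError).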
I think it would be helpful to be able to display the default_label cells as an "Unassigned" cluster during visualization. Right now, in assign_metacluster_cmap, there are the following lines:
This sets every value that is NOT in the list of clusters to 0, so the cells with default_label are also set to 0. I think it would be helpful to keep those cells as a separate cluster and display it as such in the visualization. We will need to manually add the "Unassigned" label to the list of metaclusters, and I think it'd be good to fix the "Unassigned" category to a gray color (like #5A5A5A). We will also need to make sure that the Mantis masks and CSVs are generated so that the "Unassigned" category is included.
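One possible shape for the "Unassigned" handling, as a sketch only: the relevant lines of assign_metacluster_cmap aren't reproduced in this issue, so add_unassigned_category, mask_for_display, and mapping_csv are hypothetical helpers, and the cluster_id/cluster_name CSV columns are placeholders rather than the real Mantis mapping format. The idea is to register default_label as a known cluster (with the fixed gray #5A5A5A) before zeroing out unknown values, so Unassigned cells survive into the colormap and the mapping CSV.

```python
import csv
import io

import numpy as np

UNASSIGNED_NAME = 'Unassigned'
UNASSIGNED_COLOR = '#5A5A5A'  # fixed gray suggested in this issue


def add_unassigned_category(cluster_names, cluster_colors, default_label):
    """Hypothetical: copy the name/color lookups and add an Unassigned entry."""
    names = {**cluster_names, default_label: UNASSIGNED_NAME}
    colors = {**cluster_colors, default_label: UNASSIGNED_COLOR}
    return names, colors


def mask_for_display(relabeled_img, names):
    """Zero out labels that are not known clusters; default_label is now kept."""
    known = np.array(sorted(names))
    return np.where(np.isin(relabeled_img, known), relabeled_img, 0)


def mapping_csv(names):
    """Write a cluster_id -> name table (placeholder columns) as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerow(['cluster_id', 'cluster_name'])
    for cluster_id in sorted(names):
        writer.writerow([cluster_id, names[cluster_id]])
    return buf.getvalue()


# Hypothetical example: clusters 19 and 20, dataset-wide default_label 21
names, colors = add_unassigned_category(
    {19: 'Tcells', 20: 'Bcells'}, {19: '#1f77b4', 20: '#ff7f0e'}, 21)
mask = mask_for_display(np.array([[0, 19, 21], [20, 21, 99]]), names)
print(mask)  # 21 is kept as Unassigned; the stray 99 becomes 0
print(mapping_csv(names))
```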