angelolab / ark-analysis

Integrated pipeline for multiplexed image analysis
https://ark-analysis.readthedocs.io/en/latest/
MIT License

Generic Segmentation Masks Notebook and Continuous Cmaps #1037

Closed srivarra closed 12 months ago

srivarra commented 1 year ago

This is for internal use only; if you'd like to open an issue or request a new feature, please open a bug or enhancement issue

Instructions

This document should be filled out prior to embarking on any project that will take more than a couple of hours to complete. The goal is to make sure that everyone is on the same page for the functionality and requirements of new features. Therefore, it's important that this is detailed enough to catch any misunderstandings beforehand. For larger projects, it can be useful to first give a high-level sketch, and then go back and fill in the details. For smaller ones, filling the entire thing out at once can be sufficient.

Relevant background

With #1030 there are now multiple methods of creating, visualizing and saving segmentation masks. It'll be beneficial for a notebook to contain the available methods.

  1. Color a pixel / cell segmentation mask by cell type from Pixie, or any clustering method.
  2. Plot with and without the color bar and save as a TIFF or PNG.

In addition, some objects and their associated characteristics are not categorical, so we should be able to supply a continuous colormap for those metrics.

Design overview

Sequential Colormaps

Plotting behavior will be similar to create_cmap and plot_cluster in #1030. However, it will not use colors.ListedColormap, which is primarily intended for discrete colormaps. Instead, the user can pass any sequential colormap from Matplotlib, or create their own and use that.
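As a rough illustration of the "create their own" option, the sketch below builds a custom sequential colormap with Matplotlib's LinearSegmentedColormap.from_list and applies it through a norm. The colormap name, the color stops, and the value range are illustrative assumptions, not part of the proposal.

```python
import numpy as np
from matplotlib import colors

# Hypothetical custom colormap: the name and color stops are illustrative only.
custom_cmap = colors.LinearSegmentedColormap.from_list(
    "example_sequential", ["black", "purple", "yellow"], N=256
)

# A norm rescales raw values into [0, 1] before the colormap is applied.
norm = colors.Normalize(vmin=0.0, vmax=10.0)

# Toy continuous measurements (for example, a per-object metric).
values = np.array([[0.0, 5.0], [7.5, 10.0]])

# Float RGBA array with shape (2, 2, 4); vmin maps to black, vmax to yellow.
rgba = custom_cmap(norm(values))
```

The same cmap and norm pair could then be handed to a plotting function, or swapped for a built-in colormap such as "viridis".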

Optional Erosion Step

For segmentation mask boundaries, add an optional step for creating some empty space around each cell to make visualizations easier to understand.

Notebook

The notebook will consist of the following components:

  1. Load in segmentation files / masks and cell table
    1. Example dataset or user data.
  2. Cluster and color the clusters with Pixie
    1. Convert it to the cell type label
    2. Color the clusters if the colors are provided, otherwise use a default colormap
      1. Add a step to construct a colormap if the user would like to.
    3. Plot the image with the color bar / cluster labels
    4. Save the image:
      1. PNG with color bar
      2. PNG without color bar
      3. colorized TIFF - 3 channels, RGB
  3. Continuous Colormaps
    1. Add a step to create a custom continuous colormap

Code mockup

Sequential Colormaps

import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm, colors, gridspec
from matplotlib.axes import Axes
from matplotlib.figure import Figure


def plot_continuous(
        image: np.ndarray,
        fov: str,
        cmap: str | colors.Colormap,
        # Any Normalize subclass works here (LogNorm, AsinhNorm, PowerNorm, ...)
        norm: colors.Normalize | None = None,
        cbar_visible: bool = True,
        dpi: int = 300,
        figsize: tuple[int, int] = (10, 10)
) -> Figure:
    # Default to a linear normalization when no norm is supplied
    # (vmin=0 is an assumed lower bound)
    if norm is None:
        norm = colors.Normalize(vmin=0, vmax=1)
    else:
        assert isinstance(norm, colors.Normalize)

    fig: Figure = plt.figure(figsize=figsize, dpi=dpi)
    fig.set_layout_engine(layout="tight")

    if cbar_visible:
        gs = gridspec.GridSpec(nrows=1, ncols=2, figure=fig, width_ratios=[60, 1])
        # Colorbar Axis
        cax: Axes = fig.add_subplot(gs[0, 1])
        # Draw the colorbar on its own axis; tick positions are derived from
        # the norm, so no categorical tick labels are set for continuous data
        cbar = fig.colorbar(cm.ScalarMappable(norm=norm, cmap=cmap), cax=cax,
                            orientation="vertical")
        cbar.minorticks_off()
    else:
        gs = gridspec.GridSpec(nrows=1, ncols=1, figure=fig)

    fig.suptitle(f"{fov}")

    # Image axis
    ax: Axes = fig.add_subplot(gs[0, 0])
    ax.axis("off")
    ax.grid(visible=False)

    ax.imshow(
        X=image,
        cmap=cmap,
        norm=norm,
        origin="upper",
        aspect="equal",
        interpolation=None,
    )

    return fig

Optional Erosion Step

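from skimage.segmentation import find_boundaries

# In the cluster plot function (other plotting arguments elided)
def cluster_plot(seg_mask, *args, erode=False, **kwargs):
    ...

    # Erode the mask: zero out each cell's inner boundary pixels so that
    # neighboring cells are separated by background
    if erode:
        edges = find_boundaries(seg_mask, mode='inner')
        seg_mask = np.where(edges == 0, seg_mask, 0)
    ...
    return fig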
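
To make the effect of the erosion step concrete, here is a small NumPy-only sketch on a toy mask. It is a stand-in intended to mirror what find_boundaries(seg_mask, mode='inner') marks with default connectivity (a labeled pixel with a differing 4-neighbor); the helper name erode_mask and the toy mask are assumptions for illustration, and the real implementation would call scikit-image as in the mockup above.

```python
import numpy as np


def erode_mask(seg_mask: np.ndarray) -> np.ndarray:
    """Zero out labeled pixels that touch a different label (4-connectivity).

    Hypothetical helper: a NumPy approximation of the inner-boundary erosion,
    not the scikit-image implementation.
    """
    # Pad with edge values so the image border itself is not treated as a boundary.
    padded = np.pad(seg_mask, 1, mode="edge")
    center = padded[1:-1, 1:-1]
    differs = (
        (padded[:-2, 1:-1] != center)    # neighbor above
        | (padded[2:, 1:-1] != center)   # neighbor below
        | (padded[1:-1, :-2] != center)  # neighbor to the left
        | (padded[1:-1, 2:] != center)   # neighbor to the right
    )
    # Only labeled (non-background) pixels count as inner boundaries.
    edges = differs & (seg_mask != 0)
    return np.where(edges, 0, seg_mask)


# Toy mask: two cells that touch along a vertical line.
seg_mask = np.zeros((6, 6), dtype=np.int32)
seg_mask[:, :3] = 1
seg_mask[:, 3:] = 2

eroded = erode_mask(seg_mask)
# Each row becomes [1, 1, 0, 0, 2, 2]: a two-pixel background gap separates the cells.
```

The gap is one pixel from each cell, which is what makes adjacent cells of the same cluster visually distinguishable in the overlay.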

Notebook

# Import dependencies
...

# Load in the data
example_dataset(...)
# Pixie masking
# Load in the segmentation to cluster file
seg_to_cluster = pd.read_csv(...)
# Load in the segmentation mask
seg_mask = load_utils.load_imgs_from_dir(...)
metacluster_colors = load_raw_cmap(...)
cell_cluster_mask = load_utils.load_imgs_from_dir(
    data_dir=os.path.join(base_dir, "pixie", cell_output_dir, "cell_masks"),
    files=[cell_fov + "_cell_mask.tiff"],
    trim_suffix="_cell_mask",
    match_substring="_cell_mask",
    xr_dim_name="cell_mask",
    xr_channel_names=None,
)
colors_array = np.array(["blue", "green", "orange", ...])
cluster_cmap, cluster_norm = create_cmap(colors_array)

fig = plot_pixel_cell_cluster_overlay(
    seg_mask,
    fov=fov_name,
    metacluster_colors=metacluster_colors,
    cmap=cluster_cmap,
    norm=cluster_norm,
    ...
)

fig.show()
# Save the image with the colorbar
fig.savefig(...)

# Save the image without the colorbar, while reading in the assigned metacluster colors.

fig = plot_pixel_cell_cluster_overlay(
    seg_mask,
    fov=fov_name,
    metacluster_colors=metacluster_colors,
    cmap=cluster_cmap,
    norm=cluster_norm,
    cbar=False,
    ...
)
fig.savefig(...)
# Save the image as a colorized TIFF
save_colored_masks(
    seg_mask,
    cell_cluster_mask,
    metacluster_colors,
    fov_name,
    ...
)

# Load a fiber image

plot_continuous(
    image = fiber_seg_mask,
    fov = fov_name,
    cmap = "viridis",
    norm = colors.LogNorm(vmin=1, vmax=1000),
    cbar_visible = True,
    ...
)
# Save the fiber image as a colorized TIFF
save_colored_masks(
    fiber_seg_mask,
    cell_cluster_mask,
    metacluster_colors,
    fov_name,
    ...
)

Required inputs

Segmentation masks from Notebook 1 and clustering results from Notebook 2, specifically the raw-cmap and the mapping of the clusters to the metaclusters. The mapping may have to be saved to disk, perhaps as a JSON file or something similar.

Plotting a cohort's segmentation masks requires a csv file containing at least two columns, the first being the original cluster value, and the second being the remapped value.

For example:

pixel_som_cluster,pixel_meta_cluster,pixel_meta_cluster_rename
1,9,9
2,9,9
3,5,5
4,5,5
5,5,5
6,5,5
7,20,20

which is the pixel_meta_cluster_mapping.csv file from Notebook 2 in the Pixie pipeline

or

cell_som_cluster,cell_meta_cluster,cell_meta_cluster_rename
1,5,5
2,5,5
3,5,5
4,4,4
5,4,4
6,4,4

which is the cell_meta_cluster_mapping.csv file from Notebook 3 in the Pixie pipeline.

We can generalize this to accept any csv file consisting of at least two columns, as long as there exists an original cluster value column and a remapped cluster value column. This is to account for clusters which may not exist in a particular FOV, but do in another.
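
A minimal sketch of that generalization, assuming the caller names the original and remapped columns: the mapping is turned into a lookup array built from the whole CSV, so a cluster missing from one FOV is still defined for the others. The helper name build_lookup and the toy cluster image are hypothetical; the CSV rows are copied from the cell example above.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for pd.read_csv on the mapping file; rows copied from the example above.
csv_text = """cell_som_cluster,cell_meta_cluster,cell_meta_cluster_rename
1,5,5
2,5,5
3,5,5
4,4,4
5,4,4
6,4,4
"""
mapping_df = pd.read_csv(io.StringIO(csv_text))


def build_lookup(df: pd.DataFrame, original_col: str, remapped_col: str) -> np.ndarray:
    """Hypothetical helper: lookup[original_value] == remapped_value, 0 stays background."""
    lookup = np.zeros(int(df[original_col].max()) + 1, dtype=np.int64)
    lookup[df[original_col].to_numpy()] = df[remapped_col].to_numpy()
    return lookup


lookup = build_lookup(mapping_df, "cell_som_cluster", "cell_meta_cluster")

# Toy cluster image for one FOV; cluster 6 is absent here but still in the lookup.
cluster_img = np.array([[0, 1, 2], [3, 4, 5]])
remapped_img = lookup[cluster_img]
```

Passing the column names explicitly is what would let the same code handle the pixel mapping file or any user-generated CSV.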

Output files

For example, we should get similar outputs to the following in #1030 with fov0:

[Image: cell_clustering_fov1]

In addition, we should get a colorized TIFF with the segmentation mask and the cell cluster mask, each saved in its respective directory under pixie/cell_output_dir/pixel_mask_colored, or in any user-specified directory.

Timeline

Give a rough estimate for how long you think the project will take. In general, it's better to be too conservative rather than too optimistic.

Estimated date when a fully implemented version will be ready for review: 8/17

Estimated date when the finalized project will be merged in: 8/22

srivarra commented 1 year ago

@ngreenwald

ngreenwald commented 1 year ago

Overall looks good. The point of this notebook is to showcase how people can take a generic set of labels and a generic segmentation mask, and generate an overlay. So using the Pixie labels and Mesmer segmentation as an example is fine, but it needs to be clear how people can take any random CSV that they've generated, and any random mask, and get the same output.

i.e., what columns does the df need to have? Or a step to make a compatible df out of the one they loaded.

srivarra commented 1 year ago

@ngreenwald Added a section in Required Inputs which describes the columns of the DataFrame / csv necessary.

ngreenwald commented 1 year ago

Here's what the notebook needs to be able to do:

  1. Take a segmentation mask with labels [1, 2, 3, ...N].
  2. Take a .csv where one column is segmentation IDs (same as above), and a second column is some categorical variable of interest.
  3. Generate an overlay in which each cell is colored by the categorical column.

The remapping from cluster to meta cluster is not the crucial part; that's specific to the Pixie workflow. What matters is that given a segmentation mask and a generic list of categorical assignments, those can be visualized.
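
One way the three steps above could look on toy data is sketched below. The column names "label" and "cell_type", the category values, and the colors are all hypothetical, and this is not the ark-analysis API; it only shows that a mask plus a two-column table is enough to produce an RGB overlay.

```python
import io

import numpy as np
import pandas as pd
from matplotlib import colors

# Step 1: toy segmentation mask; 0 is background, cells are labeled 1..N.
seg_mask = np.array([
    [1, 1, 0, 2],
    [1, 1, 0, 2],
    [0, 0, 0, 0],
    [3, 3, 3, 0],
])

# Step 2: stand-in for a user CSV with a segmentation ID and a categorical column.
cell_table = pd.read_csv(
    io.StringIO("label,cell_type\n1,tumor\n2,immune\n3,tumor\n")
)

# Encode each category as an integer code starting at 1; 0 is kept for background.
categories = sorted(cell_table["cell_type"].unique())  # ['immune', 'tumor']
code_of = {name: i + 1 for i, name in enumerate(categories)}

# Lookup table indexed by segmentation ID, then applied to the whole mask.
lookup = np.zeros(seg_mask.max() + 1, dtype=np.int64)
lookup[cell_table["label"].to_numpy()] = cell_table["cell_type"].map(code_of).to_numpy()
category_img = lookup[seg_mask]

# Step 3: one color per code (background first); integer input indexes the colormap.
cmap = colors.ListedColormap(["black", "blue", "orange"])
rgb = np.round(cmap(category_img)[..., :3] * 255).astype(np.uint8)  # (H, W, 3) RGB
```

The resulting rgb array has the 3-channel layout described for the colorized TIFF, and category_img together with cmap could be passed to imshow for the version with a color bar.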