Marker Intensity vs. Image Location Analysis Script

srivarra commented 1 year ago

This is for internal use only. If you'd like to report a problem or request a new feature, please open a bug or enhancement issue instead.

Instructions

This document should be filled out prior to embarking on any project that will take more than a couple hours to complete. The goal is to make sure that everyone is on the same page for the functionality and requirements of new features. Therefore, it's important that this is detailed enough to catch any misunderstandings beforehand. For larger projects, it can be useful to first give a high-level sketch, and then go back and fill in the details. For smaller ones, filling the entire thing out at once can be sufficient.

Relevant background

The location of the image on a slide is a source of variation for image acquisition. We need a reproducible way to analyze if and how image location correlates with signal intensity. Refer to angelolab/ark-analysis#277.

Design overview

The notebook will come after Notebook 5 - Rename and Reorganize.

Use the existing QC metrics to analyze signal intensity on a per-channel, per-image basis for each Row $N$, Column $M$:
- The qc_metric should support "Total Intensity", "99.9% intensity value", and "Non-zero Mean Intensity".
- Note that not all cores in a TMA are used; go by the Row/Column indicators in the FOV filename (e.g. fov-15-R6C4.tiff).
Use the violin plots in the Batch Effects notebook to visualize the QC metrics.
Use a heatmap, where the $x$ axis is the Row and the $y$ axis is the Column, with support for all variants of signal intensity (those listed above, plus all the same functions in qc_comp.py).
It might be reasonable to refactor some QC metrics to take advantage of dataclasses, but let's see if it's necessary first.
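The Row/Column lookup from the FOV filename could be sketched roughly as follows. This is a minimal illustration, assuming every reorganized FOV name ends in an `R<row>C<col>` token as in `fov-15-R6C4.tiff`; the helper name `parse_row_col` and the regular expression are hypothetical and not part of toffy.

```python
import re
from typing import Optional, Tuple

# Assumed naming pattern: an "R<row>C<col>" token at the end of the name,
# optionally followed by a file extension such as ".tiff".
_RC_PATTERN = re.compile(r"R(\d+)C(\d+)(?:\.\w+)?$")


def parse_row_col(fov_name: str) -> Optional[Tuple[int, int]]:
    """Return (row, column) parsed from a FOV name, or None if no R/C token is found."""
    match = _RC_PATTERN.search(fov_name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Example names, for illustration only
fov_names = ["fov-15-R6C4.tiff", "fov-2-R1C5", "fov-3"]
parsed = {name: parse_row_col(name) for name in fov_names}
```

Returning `None` for names without an R/C token makes it possible to skip or flag FOVs that are not part of the TMA grid rather than failing outright.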

Code mockup

Provide a more detailed description of how the project will implement the above functionality. This should describe the logical flow of the program, detailing how the different parts interact with each other. It should list the specific helper functions that will be created, either with a description of what they'll do, or with pseudo-code for how they'll be implemented.

Add the following functions to qc_comp.py:

def compute_rc_qc_metrics(reorg_bin_file_paths: Union[str, os.PathLike],
                          reorg_extracted_imgs_path: Union[str, os.PathLike],
                          qc_metrics_path: Union[str, os.PathLike],
                          fov_names: List[str]):
    """Compute the QC metrics for a given set of bin files with respect to the Row and Column number, post reorganization."""

    # Path validation
    ...
    # Load the bin file(s)
    ...

    # Collect the Rows and Columns from the FOV names
    Rs, Cs = row_column_from_fov_names(fov_names)  # can just be a simple anonymous function

    # Check whether the QC metrics already exist for all the given FOVs
    validate_data(qc_metrics_path)
    fov_missing_qc_metrics = set(fov_names) - set(get_fovs(qc_metrics_path))

    # If there are any missing FOVs, compute the metrics for those FOVs
    for fov in fov_missing_qc_metrics:
        # Load the corresponding bin
        ...
        # Compute the metrics
        qc_comp.compute_qc_metrics(reorg_bin_file_paths, reorg_extracted_imgs_path, fov)

        # Reorganize QC data and add the Row, Col information to the data.
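The "which FOVs still need metrics" check in the mockup could look something like the sketch below. It assumes the existing QC metrics are stored as CSV with a `fov` column; that column name and the helper names `fovs_in_metrics_csv` and `find_missing_fovs` are assumptions made for illustration, not toffy functions.

```python
import csv
import io
from typing import Iterable, List, Set


def fovs_in_metrics_csv(csv_text: str, fov_column: str = "fov") -> Set[str]:
    """Collect the FOV names already present in QC metrics CSV text (column name is assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[fov_column] for row in reader}


def find_missing_fovs(requested: Iterable[str], existing: Set[str]) -> List[str]:
    """Return the requested FOVs that have no QC metrics yet, in sorted order."""
    return sorted(set(requested) - existing)


# Toy CSV contents, for illustration only
existing_csv = "fov,channel,value\nR1C5,chan0,0.5\nR5C2,chan0,0.7\n"
missing = find_missing_fovs(["R1C5", "R5C2", "R6C4"], fovs_in_metrics_csv(existing_csv))
```

Sorting the result keeps the order in which metrics are recomputed deterministic, which a plain set difference does not guarantee.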

Add the following functions to qc_metrics_plots.py:

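def heatmap_rc_qc_metrics(plotting_dfs: Union[pd.DataFrame, str, os.PathLike],
                          fovs: List[str],
                          metric: str,
                          ):

    # Input validation for plotting_dfs, metric, etc...
    if isinstance(plotting_dfs, (str, os.PathLike)):
        plotting_dfs = pd.read_csv(plotting_dfs)

    # Plot one Row x Column heatmap of the metric per channel
    # (assumes plotting_dfs has "fov", "channel", "row", and "column" columns)
    plotting_dfs = plotting_dfs[plotting_dfs["fov"].isin(fovs)]
    for channel, channel_df in plotting_dfs.groupby("channel"):
        grid = channel_df.pivot(index="column", columns="row", values=metric)
        sns.heatmap(grid, annot=False, fmt=".2f", cmap="some_cmap")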

def violin_rc_qc_metrics(plotting_dfs: Union[pd.DataFrame, os.PathLike],
                         fovs: List[str],
                         metric: str,
                         ):
    # Input Validation
    # Get rows, columns from fovs
    Rs, Cs = row_column_from_fov_names(fovs) # can be a lambda or from the dataframe
    # Call the `qc_metrics.call_violin_swarm_plot` function to plot the metric for each FOV per Row Column.
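To show the data shaping that both plots rely on, here is a small standard-library sketch. It assumes one metric value per (row, column) position, 1-based Row/Column labels, and a non-empty input; the helper names `to_row_col_grid` and `group_by_row` are hypothetical. Cores that were not acquired are left as NaN so they can show up as gaps in a heatmap.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

RowCol = Tuple[int, int]


def to_row_col_grid(values: Dict[RowCol, float]) -> List[List[float]]:
    """Arrange per-FOV values into a Row x Column grid; unused cores stay NaN."""
    n_rows = max(row for row, _ in values)
    n_cols = max(col for _, col in values)
    grid = [[math.nan] * n_cols for _ in range(n_rows)]
    for (row, col), value in values.items():
        grid[row - 1][col - 1] = value  # R/C labels are assumed to be 1-based
    return grid


def group_by_row(values: Dict[RowCol, float]) -> Dict[int, List[float]]:
    """Group metric values by Row, e.g. to draw one violin per Row."""
    grouped = defaultdict(list)
    for (row, _), value in values.items():
        grouped[row].append(value)
    return dict(grouped)


# Toy values keyed by (row, column), for illustration only; core R2C1 is unused
values = {(1, 1): 2.0, (1, 2): 4.0, (2, 2): 6.0}
grid = to_row_col_grid(values)
by_row = group_by_row(values)
```

The grid here is indexed as `grid[row][column]`; it would need to be transposed before plotting if Rows should run along the $x$ axis as described in the design overview.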

Required inputs

The inputs are the bin files, the tiff files generated from the bin files (which contain the Row and Column info), the existing QC metric files, and the paths to the reorganized files.

Output files

Outputs should be contained within the Jupyter notebook: both heatmaps and violin plots showing how the Row and Column numbers affect any one of the QC metric variations.

Timeline

Give a rough estimate for how long you think the project will take. In general, it's better to be too conservative rather than too optimistic.

A week

Estimated date when a fully implemented version will be ready for review: Friday, Dec 16, 2022.

Estimated date when the finalized project will be merged in: Monday, Dec 19, 2022.

srivarra commented 1 year ago

@ngreenwald

ngreenwald commented 1 year ago

This looks good. Only thing about using the previously generated QC metrics: those metrics were generated using the original FOV names. So they would be FOV1, FOV2, etc. The renamed FOVs will be R1C5, R5C2, etc, without the original name. You can use the logic in notebook 5 to reverse engineer the naming scheme and find the original name, and hence the original QC metric, but given that this computation doesn't take very long, it may be okay to just regenerate those metric calculations with the new name.

Maybe for a first pass assume there isn't already generated QC data, and if computation ends up being slow, add the option to find the original name? Up to you.

Good to proceed!

srivarra commented 1 year ago

@ngreenwald

Good catch. For now I'll just assume that none of the QC metrics are pre-calculated, and I'll add that feature in the second pass.

angelolab / toffy

Marker Intensity vs. Image Location Analysis Script #304
