KHChenLab / FISHnCHIPs

Software for FISHnCHIPs. See https://www.nature.com/articles/s41467-024-46669-y

FISHnCHIPs: Fluorescence In Situ Hybridization of Cellular HeterogeneIty and Gene Expression Programs

About

We present FISHnCHIPs for highly sensitive in situ profiling of cell types and gene expression programs. FISHnCHIPs achieves this by simultaneously imaging ∼2-35 co-expressed genes that are spatially co-localized in tissues, resulting in similar spatial information as single-gene FISH, but at ∼2-20-fold higher sensitivity. See https://www.nature.com/articles/s41467-024-46669-y. This software guides users to design and evaluate their own gene panel using input from scRNA-seq. We also provide the FISHnCHIPs data analysis pipeline that we used to process the raw FISHnCHIPs data and cluster the module-cell matrix to define cell types.

Workflow of panel design

Prerequisites

System requirements:

A computer that can run Python and/or R, with at least 16 GB of RAM. No non-standard hardware is required.

Software requirements:

For Gene Panel Design:

For Image Processing:

Tested dependencies

Packages common to both gene panel design and data processing:

For Gene Panel Design:

For Image Processing:

For manuscript figures (R packages):

Getting started

Installation

  1. Download and install Anaconda.
  2. Spyder and the common packages will be installed with Anaconda. Installation of Anaconda typically takes less than 30 minutes.
  3. For image processing, install cellpose and scikit-image using either pip or conda:
    • Cellpose:
      • pip install cellpose
      • conda install -c conda-forge cellpose scikit-image
    • scikit-image:
      • pip install scikit-image
      • conda install -c anaconda scikit-image
  4. For gene panel design, install igraph and leidenalg using either pip or conda:
    • igraph:
      • pip install igraph
      • conda install -c conda-forge python-igraph leidenalg
    • leidenalg:
      • pip install leidenalg
      • conda install -c conda-forge leidenalg
  5. Code has been tested to run on Spyder and Jupyter Notebook but can also be run from any other Python 3 compatible IDE.
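After installation, a quick check that the extra packages can be imported may save time later. The snippet below is a minimal sketch that uses only the Python standard library. The helper name `missing_packages` is hypothetical and not part of the FISHnCHIPs scripts. Note that scikit-image is imported under the name `skimage`.

```python
from importlib.util import find_spec

# Import names of the extra packages listed in the installation steps.
# scikit-image is imported as "skimage".
REQUIRED = {
    "image processing": ["cellpose", "skimage"],
    "gene panel design": ["igraph", "leidenalg"],
}


def missing_packages(required=REQUIRED):
    """Return {task: [import names that cannot be found]} (hypothetical helper)."""
    return {
        task: [name for name in names if find_spec(name) is None]
        for task, names in required.items()
    }


if __name__ == "__main__":
    for task, missing in missing_packages().items():
        status = "OK" if not missing else "missing: " + ", ".join(missing)
        print(f"{task}: {status}")
```

An empty list for a task means every package for that task was found in the active environment.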

Gene Panel Design Demo

The tutorial for the workflow of gene panel design and evaluation is provided as a Jupyter notebook file Gene panel design tutorial.ipynb in the FISHnCHIPs_GenePanelDesign_Tutorial folder. In the tutorial, various functions are called from the FISHnCHIPS_0.1.0.py package in the scripts folder.

Input files provided (csv format):

Functions:

Generation of panel design

  1. get_panel (used for both cell-centric and gene-centric panel design)

    • Inputs:
      • correlation matrix
      • reference markers
    • Hyperparameters:
      • correlation threshold
      • minimum number of genes required from each marker

        Function returns a DataFrame containing the genes selected for the panel based on the hyperparameters, the correlation of each gene with the reference marker gene and the cluster that it belongs to.

  2. get_filtered_genes (only for gene-centric panel design)

    • Inputs:
      • Gene-cell expression matrix
    • Hyperparameters:
      • Minimum gene expression level
      • Minimum number of cells that gene is present in
      • Maximum number of cells that gene is present in
      • Name pattern of genes to exclude from panel

        Function prefilters the genes so that only adequately expressed genes remain. It removes genes with a low expression level, genes whose names match the exclusion pattern, and genes present in fewer than the minimum or more than the maximum number of cells.

        Returns the list of genes that pass the filtering requirements.

  3. leiden_corr_matrix (only for gene-centric panel design)

    • Inputs:
      • Post-filtered correlation matrix
    • Hyperparameters:
      • Edge correlation threshold
      • Partition type (Default: ModularityVertexPartition)

        As a reference marker file is not available for gene-centric panel design, the genes are instead clustered based on the gene-gene correlation.

        Function uses the Leiden algorithm to cluster the genes and removes genes whose correlation with all other genes is lower than the threshold. It returns a tuple containing a DataFrame of the cluster that each selected gene belongs to and the cluster network graph of the selected genes.
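The panel design steps above can be illustrated on toy data. The following is a minimal sketch, not the code in FISHnCHIPS_0.1.0.py: the function names, signatures and toy values are hypothetical. It assumes that markers with fewer than `min_ngenes` correlated genes are dropped and that expression is binarized with `>=`. The Leiden clustering itself is not shown; only the edge thresholding that precedes it is sketched.

```python
import pandas as pd


def select_panel(corr, markers, min_corr=0.7, min_ngenes=2):
    """Cell-centric sketch: keep genes correlated with each reference marker."""
    rows = []
    for cluster, marker in markers.items():
        hits = corr[marker][corr[marker] >= min_corr].sort_values(ascending=False)
        if len(hits) < min_ngenes:  # assumption: drop markers with too few genes
            continue
        rows += [
            {"gene": gene, "marker": marker, "cluster": cluster, "correlation": r}
            for gene, r in hits.items()
        ]
    return pd.DataFrame(rows, columns=["gene", "marker", "cluster", "correlation"])


def prefilter_genes(expr, min_expression, min_cells, max_cells, name_pattern):
    """Gene-centric sketch: expr is a genes x cells DataFrame."""
    n_cells = (expr >= min_expression).sum(axis=1)  # cells expressing each gene
    names = pd.Series(expr.index, index=expr.index).astype(str)
    keep = (
        (n_cells >= min_cells)
        & (n_cells <= max_cells)
        & ~names.str.contains(name_pattern, regex=True)
    )
    return keep[keep].index.tolist()


def correlated_edges(corr, min_corr):
    """Edges above the threshold, plus the genes that keep at least one edge."""
    genes = list(corr.index)
    edges = [
        (g1, g2, corr.loc[g1, g2])
        for i, g1 in enumerate(genes)
        for g2 in genes[i + 1:]
        if corr.loc[g1, g2] >= min_corr
    ]
    connected = {g for g1, g2, _ in edges for g in (g1, g2)}
    return edges, [g for g in genes if g in connected]


# Toy inputs (invented values, for illustration only).
genes = ["A", "B", "C", "D"]
corr = pd.DataFrame(
    [[1.0, 0.8, 0.1, 0.2],
     [0.8, 1.0, 0.0, 0.1],
     [0.1, 0.0, 1.0, 0.3],
     [0.2, 0.1, 0.3, 1.0]],
    index=genes, columns=genes,
)
panel = select_panel(corr, {"cluster_1": "A", "cluster_2": "C"}, min_corr=0.7, min_ngenes=2)

expr = pd.DataFrame(
    [[2, 3, 0, 1], [1, 2, 0, 0], [0, 0, 0, 5], [4, 4, 4, 4]],
    index=["A", "B", "C", "MT-X"], columns=["c1", "c2", "c3", "c4"],
)
kept = prefilter_genes(expr, min_expression=1, min_cells=2, max_cells=3, name_pattern=r"^MT-")
edges, connected = correlated_edges(corr, min_corr=0.7)
```

In a gene-centric design, the thresholded edge list would then be turned into a graph and passed to a Leiden implementation such as leidenalg.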

Panel Evaluation

  1. get_cumulative_signals (only for cell-centric panel evaluation)

    • Inputs:
      • Gene-cell expression matrix
      • Panel design information
      • scRNA celltype
    • Hyperparameters:
      • Usage of multiple probes

        Function calculates the signal gain, conserved signal gain and Signal Specificity Ratio (SSR) of each gene, using the gene-cell expression matrix restricted to the genes selected during the panel design, and returns the results as a DataFrame.

  2. get_cell_bit_matrix (only for gene-centric panel evaluation)

    • Inputs:
      • Gene-cell expression matrix
      • Panel design information
    • Hyperparameters:
      • Name of column indicating gene cluster
      • Usage of multiple probes

        Function returns a DataFrame that contains the expression level of each cluster of genes in each cell (cell-bit matrix).

  3. evaluate_gene_centric_panel (only for gene-centric panel evaluation)

    • Inputs:
      • Gene-cell expression matrix
      • Panel design information
      • Cell-bit matrix
    • Hyperparameters:
      • Signal threshold

        Function returns a DataFrame containing the signal gain for each gene and the cumulative signal gain for genes in the same cluster. The signal threshold hyperparameter is used to distinguish signal cells from background noise.
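As an illustration of the cell-bit matrix used in gene-centric evaluation, the sketch below sums the expression of all panel genes in a cluster for each cell and then applies a signal threshold. It is not the repository's implementation: the function name, the column names `gene` and `cluster`, the use of a plain sum and the toy values are all assumptions.

```python
import pandas as pd


def cell_bit_matrix(expr, panel, cluster_col="cluster"):
    """Sum expression per gene cluster; expr is genes x cells (hypothetical sketch)."""
    gene_to_cluster = panel.set_index("gene")[cluster_col]
    shared = expr.index.intersection(gene_to_cluster.index)
    summed = expr.loc[shared].groupby(gene_to_cluster.loc[shared]).sum()
    return summed.T  # cells x clusters


# Toy inputs (invented values, for illustration only).
expr = pd.DataFrame(
    [[1, 0], [2, 0], [0, 3]],
    index=["A", "B", "C"], columns=["c1", "c2"],
)
panel = pd.DataFrame({"gene": ["A", "B", "C"], "cluster": ["m1", "m1", "m2"]})

bits = cell_bit_matrix(expr, panel)

# Cells at or above the threshold are treated as signal, the rest as background.
signal_threshold = 2
signal_cells = bits >= signal_threshold
print(bits)
print(signal_cells.sum())  # number of signal cells per cluster
```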

Data Analysis Demo

The tutorial for the analysis of FISHnCHIPs image data is provided as a Jupyter notebook file FISHnCHIPs data analysis tutorial.ipynb in the FISHnCHIPs_DataAnalysis_Tutorial folder. In the tutorial, various functions are called from the FISHnCHIPS_DataAnalysis.py package (to be packaged, containing FISHnCHIPsImages.py, registerFunction.py and segmentationFunction.py) in the scripts folder.

Input files provided:

Functions:

Generation of cell mask images (FISHnCHIPsImages.py)

  1. segment_dapi_one_fov

    • Inputs:
      • DAPI Image (TIF format)
    • Hyperparameters:
      • Parameters from yaml file
      • Output path
      • List of DAPI FOVs to segment

        Function saves a DAPI image with the overlapping border removed (TIF format), a dilated cell mask image (TIF format) and an overlay of the cell masks onto the DAPI image (JPG format) in the output path folder.

  2. subtract_one_image

    • Inputs:
      • Hyb Images (TIF format)
      • Bleached Hyb Images (TIF format)

    Function returns the bleach-subtracted Hyb images in TIF format.

  3. segment_one_fov

    • Inputs:
      • DAPI Image (TIF format)
      • Hyb Images (TIF format)
      • Bleached Hyb Images (TIF format)

    An all-in-one function that performs the functions of segment_dapi_one_fov and subtract_one_image when all DAPI, Hyb and bleached Hyb images are ready. It also returns a list of all cell masks for spatial and mask intensity analysis.
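The bleach subtraction step can be sketched with NumPy as follows. This is an assumption-laden illustration rather than the code in FISHnCHIPsImages.py: the function name is hypothetical, images are taken to be unsigned-integer arrays of equal shape, and negative differences are clipped to zero.

```python
import numpy as np


def subtract_bleach(hyb, bleached):
    """Subtract the bleached image from the Hyb image, clipping at zero."""
    # Cast to a signed type first so that unsigned subtraction cannot wrap around.
    diff = hyb.astype(np.int32) - bleached.astype(np.int32)
    return np.clip(diff, 0, None).astype(hyb.dtype)


# Toy 2 x 2 images (invented values, for illustration only).
hyb = np.array([[100, 50], [10, 0]], dtype=np.uint16)
bleached = np.array([[30, 60], [10, 5]], dtype=np.uint16)
subtracted = subtract_bleach(hyb, bleached)
print(subtracted)
```

Casting before subtracting matters because subtracting a larger unsigned value from a smaller one would otherwise wrap around to a large positive number.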

Calculate cell positions & brightness intensity (segmentationFunctions.py)

  1. get_centroids

    • Inputs:
      • Dilated cell mask image (TIF format)
      • DAPI Image after border removal (TIF format)
    • Hyperparameters:
      • Output path
      • Prefix of dilated cell mask image
      • Prefix of DAPI image after border removal
      • List of FOVs to segment

        Function returns the coordinates of the cell masks of each FOV as a DataFrame.

  2. get_mask_positions_inFuse

    • Inputs:
      • List of FOVs that cell masks belong to
      • List of x-coordinates of corresponding cell masks
      • List of y-coordinates of corresponding cell masks
      • Number of FOVs along x-axis
      • Number of FOVs along y-axis

    Function returns the coordinates of the cell masks in the context where all FOVs are fused as a DataFrame.

  3. get_mask_intensity_matrix

    • Inputs:
      • List of all cell masks

    Function takes in the list of all cell masks from the segment_one_fov function as input and returns 4 separate DataFrames with the mean, median, maximum and summation of mask intensity of each cell for each Hyb.

  4. get_mask_info

    • Inputs:
      • List of all cell masks

    Function takes in the list of all cell masks from the segment_one_fov function as input and returns a DataFrame of information about the cell masks, including the list of FOVs analysed, cell types identified, mask intensity, area of masks and mask spatial positions.

  5. get_mask_positions_inFOV

    • Inputs:
      • Cell mask information

    Function takes in the cell mask information from the get_mask_info function as input and returns a tuple of the FOV, x-coordinate and y-coordinate of each cell position within its respective FOV. This tuple can be fed into the get_mask_positions_inFuse function to obtain the spatial position of the cell masks in the context where all FOVs are fused.
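The position and intensity calculations above can be sketched on a toy label mask. This is a simplified illustration, not the repository's code. It assumes that label 0 is background, that FOVs are numbered from 0 in row-major order, and that each FOV is a square tile of `tile_px` pixels once the overlapping border is removed. The function names and `tile_px` are hypothetical, and a single Hyb image is used instead of one per Hyb.

```python
import numpy as np
import pandas as pd


def mask_table(mask, image, fov):
    """Centroid and intensity statistics for every labelled mask in one FOV."""
    rows = []
    for label in np.unique(mask):
        if label == 0:  # 0 is assumed to be background
            continue
        inside = mask == label
        ys, xs = np.nonzero(inside)  # row index is y, column index is x
        pixels = image[inside]
        rows.append({
            "fov": fov, "mask": int(label),
            "x": xs.mean(), "y": ys.mean(),
            "mean": pixels.mean(), "median": float(np.median(pixels)),
            "max": pixels.max(), "sum": pixels.sum(),
        })
    return pd.DataFrame(rows)


def to_fused(table, fov_x, tile_px):
    """Shift in-FOV coordinates into the fused image (row-major FOV layout assumed)."""
    col = table["fov"] % fov_x
    row = table["fov"] // fov_x
    return table.assign(
        x_fused=col * tile_px + table["x"],
        y_fused=row * tile_px + table["y"],
    )


# Toy 4 x 4 FOV with two cell masks (invented values, for illustration only).
mask = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 0, 0],
])
image = np.arange(16).reshape(4, 4)

table = mask_table(mask, image, fov=3)
fused = to_fused(table, fov_x=2, tile_px=4)
print(fused)
```

If the microscope acquires FOVs in a different order (for example a snake pattern), the row and column calculation in `to_fused` would need to change accordingly.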

FISHnCHIPs manuscript figure

Figures 3 and 4 are produced using R data visualization packages, while Figures 5 and 6 are produced with Python visualization tools. For Figures 5 and 6, the packages used are the same as in the tutorial, so simply run the Jupyter notebook to reproduce the figures. For Figures 3 and 4, functions from the standalone package capFISHImage were used. Please install the package provided in the package folder using the following command and run the script provided accordingly:

install.packages('./package/capFISHImage_0.1.0.zip', repos=NULL, type='source')

Parameter settings

The parameters used in Gene panel design tutorial.ipynb and FISHnCHIPs data analysis tutorial.ipynb are explained below.

Gene panel design tutorial

  • min_corr: Minimum correlation for genes to be considered as highly correlated
  • min_ngenes: Minimum number of genes to include in the panel design from each reference marker celltype
  • min_expression: Expression threshold of genes in cells to binarize gene expression level
  • min_cells: Minimum number of cells that the genes are expressed in
  • max_cells: Maximum number of cells that the genes are expressed in
  • filt_name_pattern: Naming pattern of mitochondrial or pseudo genes to remove

FISHnCHIPs data analysis tutorial

Parameters in yaml file:

  • mainpath: The file path where the DAPI and hyb image TIF files are stored
  • schema: The file path where the schema csv file is located. The schema file should contain the cell type to analyse and its corresponding dye
  • dapiname: Prefix name of the DAPI file
  • fovs: Number of fields of view (FOVs) to process
  • fov_x: Number of FOVs along x-axis
  • fov_y: Number of FOVs along y-axis
  • overlap_in_px: Number of pixels that overlap between each FOV
  • background: Default "bleach"; Type of background image used to offset background noise
  • remove_background: Default 'subtract'; How background noise is removed
  • show_dapiseg: Boolean; Whether to run segmentation function
  • segmentation: Select segmentation method; 'cellpose' or 'watershed'
  • segment_mode: 'Cytoplasm', 'Cytoplasm2', 'Cytoplasm2_Omnipose', 'Nuclei' for cellpose
  • cellsize: Cell size in μm
  • flow_threshold: Default 0.4; Increase this threshold if cellpose is not returning as many ROIs as you'd expect.
  • cellprob_threshold: Default 0.0; Decrease this threshold if cellpose is not returning as many ROIs as you'd expect.
  • mask_dilate_factor: Amount of dilation applied to cell mask
  • filtermask: Boolean; Whether to remove small masks
  • filtermask_size: Minimum mask size
  • anchor_name: Prefix of hyb image to segment, e.g. 'Cy7_3_'
  • anchor_celltype: Celltype label of the hyb image, e.g. 'CAF_1'

Configuration for watershed segmentation:

  • fusedpath: File path of fused image to determine the minimum and maximum intensity at a certain percentile
  • cutoff_threshold: Threshold value used to classify the pixel values into foreground and background classes, creating a binary image
  • opening_threshold: Number of iterations of erosion and dilation that the image goes through
  • kernel_size: Size of the kernel that slides through the image, which determines how much the image is eroded or dilated

Parameters in tutorial:

  • segmentDAPI: Boolean input. True if segmentation step is desired.
  • dapi_prefix: DAPI filename prefix
  • prefix: Post-segmentation image file prefix
  • yml_file: yaml filename

Citation and code dependencies

The functions provided in the tutorials use the following packages, most of which are included in the standard Anaconda distribution:

Python

  • [Pandas](https://pandas.pydata.org/)
  • [Numpy](https://numpy.org/)
  • [Matplotlib](https://matplotlib.org/)
  • [Numba](https://numba.pydata.org/)
  • [Seaborn](https://seaborn.pydata.org/)
  • [Scipy](https://scipy.org/)
  • [Scanpy](https://scanpy.readthedocs.io/en/stable/)
  • [igraph](https://igraph.org/)
  • [Leidenalg](https://github.com/vtraag/leidenalg)
  • [cellpose](https://www.cellpose.org/)
  • [scikit-image](https://scikit-image.org/)

R

  • [Tidyverse](https://www.tidyverse.org/)
  • [Dittoseq](https://github.com/dtm2451/dittoSeq)
  • [ComplexHeatmap](https://github.com/jokergoo/ComplexHeatmap)

Authors

License

See full license here.

Acknowledgements

Questions