RESEPT

Define and visualize pathological architectures of human tissues from spatially resolved transcriptomics using deep learning

RESEPT is a deep-learning framework for characterizing and visualizing tissue architecture from spatially resolved transcriptomics.

Given gene expression or RNA velocity as input, RESEPT learns a three-dimensional embedding of the spatial transcriptomics data with a spatially retained graph neural network. The three embedding dimensions are then mapped to the color channels of an RGB image, which is segmented with a supervised convolutional neural network to infer the tissue architecture accurately.
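The embedding-to-image step can be pictured with a minimal sketch (assumptions for illustration: a spots-by-3 embedding array, per-channel min-max scaling, and one pixel per spot; RESEPT's own rasterization details may differ):

import numpy as np

def embedding_to_rgb(embedding, positions, height, width):
    # embedding: (n_spots, 3) array; positions: (n_spots, 2) pixel coordinates.
    # Min-max scale each embedding dimension into [0, 255] (assumed scaling).
    lo, hi = embedding.min(axis=0), embedding.max(axis=0)
    colors = ((embedding - lo) / (hi - lo + 1e-12) * 255).astype(np.uint8)
    image = np.zeros((height, width, 3), dtype=np.uint8)
    for (row, col), rgb in zip(positions, colors):
        image[row, col] = rgb  # one pixel per spot
    return image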

Documentation: https://resept.readthedocs.io/

System Requirements

Hardware Requirements

RESEPT was trained on a workstation with a 64-core CPU, 20 GB of RAM, and a GPU with 11 GB of VRAM. Customizing the segmentation model (Function 5) currently requires a GPU; all other RESEPT functions run on a CPU with at least 8 cores and 8 GB of RAM.

Software Requirements

OS Requirements

RESEPT can run on Linux; the package has been tested on Linux systems.

Python Dependencies

RESEPT mainly depends on the Python (3.6+) scientific stack.

scipy==1.6.2
networkx==2.5.1
opencv_contrib_python==4.5.1.48
tqdm==4.60.0
scikit_image==0.18.1
numpy==1.19.2
umap_learn==0.5.1
six==1.15.0
matplotlib==3.3.4
terminaltables==3.1.0
torch==1.5.0
scanpy==1.7.2
statsmodels==0.12.2
requests==2.25.1
munkres==1.1.4
mmcv_full==1.3.0
rpy2==3.1.0
pandas==1.2.3
numba==0.53.1
seaborn==0.11.1
anndata==0.7.6
cityscapesscripts==2.2.0
leidenalg==0.8.7
Pillow==8.3.1
python_igraph==0.9.6
scikit_learn==0.24.2
umap==0.1.1

Installation Guide

Install dependency packages

  1. Install PyTorch 1.5.0 following the official guide.

  2. Install mmcv-full 1.3.0 by running the following command:

    pip install mmcv-full==1.3.0 -f https://download.openmmlab.com/mmcv/dist/${CUDA}/torch1.5.0/index.html

    where ${CUDA} should be replaced by the specific CUDA version (cpu, cu92, cu101, cu102).
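    For example, on a machine with CUDA 10.1 the command becomes:

    pip install mmcv-full==1.3.0 -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.5.0/index.html

    On a CPU-only machine, use cpu in place of cu101.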

  3. Install other dependencies:

    pip install -r requirements.txt

    The above steps take 20-25 mins to install all dependencies.

Install RESEPT from GitHub

git clone https://github.com/OSU-BMBL/RESEPT
cd RESEPT
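Before preparing data, a quick environment check (optional, not part of RESEPT) confirms that the core dependencies import with the expected versions:

import cv2
import mmcv
import scanpy
import torch

# Versions should match the pinned requirements above.
print(torch.__version__, mmcv.__version__, scanpy.__version__, cv2.__version__)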

Data preparation

10x Visium data

Annotation file (optional)

An annotation file should include spot barcodes and their corresponding annotations. It is used for evaluating predicted tissue architectures (e.g., computing ARI) and for training a user's own segmentation model. The file should be named [sample_name]_annotation.csv. [example]
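A plausible layout is shown below (the column names are assumptions for illustration; the linked example defines the exact schema):

barcode,annotation
AAACAAGTATCTCCCA-1,Layer_1
AAACAATCTACTAGCA-1,Layer_3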

Segmentation model file (optional)

A pre-trained segmentation model file in .pth format, which must be provided when predicting tissue architecture on the generated images.

Data structure

The data schema to run our code is as follows:

[sample_name]/
 |__spatial/
 |    |__tissue_positions_list file
 |    |__scalefactors_json file
 |__gene expression file
 |__annotation file: [sample_name]_annotation.csv (optional)

model/ (optional)
 |__segmentation model file 

The data schema to customize our segmentation model is as follows:

[training_data_folder]/
|__[sample_name_1]/
|    |__spatial/
|    |    |__tissue_positions_list file
|    |    |__scalefactors_json file
|    |__gene expression file
|    |__annotation file: [sample_name_1]_annotation.csv
|__[sample_name_2]/
|    |__spatial/
|    |    |__tissue_positions_list file
|    |    |__scalefactors_json file
|    |__gene expression file
|    |__annotation file: [sample_name_2]_annotation.csv
|__ ...
|__[sample_name_n]/
|    |__spatial/
|    |    |__tissue_positions_list file
|    |    |__scalefactors_json file
|    |__gene expression file
|    |__annotation file: [sample_name_n]_annotation.csv

Demo

Function 1: visualize tissue architecture

Run the following commands to construct RGB images based on gene expression under different embedding parameters. For demonstration, download the example data with the wget command below and put the unzipped folder '151669' in the source code folder.

wget https://bmblx.bmi.osumc.edu/downloadFiles/GitHub_files/151669.zip 
unzip 151669.zip
python RGB_images_pipeline.py -expression 151669/151669_filtered_feature_bc_matrix.h5  -meta 151669/spatial/tissue_positions_list.csv  -scaler 151669/spatial/scalefactors_json.json -output Demo_result  -embedding scGNN  -transform logcpm 

Command Line Arguments:

-expression: path to the gene expression file (.h5)
-meta: path to the spatial/tissue_positions_list.csv file
-scaler: path to the spatial/scalefactors_json.json file
-output: name of the output folder
-embedding: embedding method (scGNN in this demo)
-transform: transformation applied to the expression data (logcpm in this demo)

Results

RESEPT stores the generated results in the following structure:

      Demo_result/
      |__RGB_images/

This demo takes 25-30 mins to generate all results on a machine with a 64-core CPU.
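To browse the generated images (an optional snippet; the file names inside RGB_images/ depend on the embedding parameters, and PNG output is assumed here):

from pathlib import Path
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

# Display the first few generated RGB images.
for png in sorted(Path("Demo_result/RGB_images").glob("*.png"))[:3]:
    plt.figure()
    plt.title(png.name)
    plt.imshow(mpimg.imread(png))
    plt.axis("off")
plt.show()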

Function 2: evaluate predicted tissue architectures with annotation

Run the following commands to construct RGB images based on gene expression under different embedding parameters, segment the constructed RGB images into tissue architectures (keeping the top five ranked by Moran's I), and evaluate the predicted tissue architectures against the annotation (e.g., with ARI). For demonstration, download the example data and the pre-trained model with the wget commands below, then put the unzipped folders '151669' and 'model_151669' in the source code folder.

wget https://bmblx.bmi.osumc.edu/downloadFiles/GitHub_files/151669.zip 
wget https://bmblx.bmi.osumc.edu/downloadFiles/GitHub_files/model_151669.zip
unzip 151669.zip
unzip model_151669.zip
python evaluation_pipeline.py -expression 151669/151669_filtered_feature_bc_matrix.h5  -meta 151669/spatial/tissue_positions_list.csv  -scaler 151669/spatial/scalefactors_json.json -k 7 -label 151669/151669_annotation.csv -model model_151669/151669_scGNN.pth -output Demo_result_evaluation  -embedding scGNN  -transform logcpm  -device cpu

Command Line Arguments:

In addition to the arguments described under Function 1:

-k: number of tissue architecture categories to segment (7 in this demo)
-label: path to the annotation file used for evaluation
-model: path to the pre-trained segmentation model (.pth)
-device: computing device (cpu in this demo)

Results

RESEPT stores the generated results in the following structure:

      Demo_result_evaluation/
      |__RGB_images/
      |__segmentation_evaluation/
            |__segmentation_map/
            |__top5_evaluation.csv
            |__predicted_tissue_architecture.csv

This demo takes 30-35 mins to generate all results on a machine with a 64-core CPU.
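For reference, the ARI evaluation can be reproduced conceptually as follows (a sketch only: the column names of both CSV files are assumptions, and RESEPT's own evaluation may align spots differently):

import pandas as pd
from sklearn.metrics import adjusted_rand_score

# Assumed column names -- inspect the real files before running.
pred = pd.read_csv("Demo_result_evaluation/segmentation_evaluation/predicted_tissue_architecture.csv")
anno = pd.read_csv("151669/151669_annotation.csv")

merged = pred.merge(anno, on="barcode")  # join on the shared spot barcode
print(adjusted_rand_score(merged["annotation"], merged["predicted_label"]))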

Function 3: predict tissue architecture without annotation

Run the following commands to generate RGB images based on gene expression under different embedding parameters and predict tissue architectures (keeping the top five ranked by Moran's I). For demonstration, download the example data and the pre-trained model with the wget commands below, then put the unzipped folders '151669' and 'model_151669' in the source code folder.

wget https://bmblx.bmi.osumc.edu/downloadFiles/GitHub_files/151669.zip 
wget https://bmblx.bmi.osumc.edu/downloadFiles/GitHub_files/model_151669.zip 
unzip model_151669.zip
unzip 151669.zip
python test_pipeline.py -expression 151669/151669_filtered_feature_bc_matrix.h5  -meta 151669/spatial/tissue_positions_list.csv  -scaler 151669/spatial/scalefactors_json.json -k 7 -model model_151669/151669_scGNN.pth -output Demo_result_tissue_architecture  -embedding scGNN  -transform logcpm -device cpu

Command Line Arguments:

Same as Function 2, except that no annotation file (-label) is needed.

Results

RESEPT stores the generated results in the following structure:

   Demo_result_tissue_architecture/
   |__RGB_images/
   |__segmentation_test/
         |__segmentation_map/
         |__top5_MI_value.csv
         |__predicted_tissue_architecture.csv

This demo takes 30-35 mins to generate all the results on a machine with a 64-core CPU.
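Moran's I, the statistic used above to rank candidate segmentations, measures spatial autocorrelation. A minimal implementation of the standard definition is given below (how RESEPT builds its spatial weight matrix is not specified here):

import numpy as np

def morans_i(values, weights):
    # values: (n,) per-spot values; weights: (n, n) spatial weight matrix.
    x = values - values.mean()
    n = len(values)
    return (n / weights.sum()) * (x @ weights @ x) / (x @ x)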

Function 4: segment histological images

RESEPT can segment a histological image according to the predicted tissue architecture, which may help pathologists focus on specific functional zonation. Run the following commands to predict tissue architectures (keeping the top five ranked by Moran's I) and segment the histological image accordingly. For demonstration, download the example data and the pre-trained model with the wget commands below, then put the unzipped folders 'cancer' and 'model_cancer' in the source code folder.

wget https://bmblx.bmi.osumc.edu/downloadFiles/GitHub_files/cancer.zip
wget https://bmblx.bmi.osumc.edu/downloadFiles/GitHub_files/model_cancer.zip
unzip cancer.zip
unzip model_cancer.zip
python histological_segmentation_pipeline.py -expression ./cancer/Parent_Visium_Human_Glioblas_filtered_feature_bc_matrix.h5 -meta ./cancer/spatial/tissue_positions_list.csv -scaler ./cancer/spatial/scalefactors_json.json -k 7 -model ./model_cancer/cancer_model.pth -histological ./cancer/Parent_Visium_Human_Glioblast.tif -output Demo_result_HistoImage -embedding spaGCN -transform logcpm -device cpu

Command Line Arguments:

In addition to the arguments described in Functions 1 and 2 (no annotation file is needed):

-histological: path to the histological image (.tif)

Note that this demo uses the spaGCN embedding (-embedding spaGCN).

This demo takes 30-35 mins to generate all results on a machine with a multi-core CPU.
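The idea of isolating one predicted zone can be sketched as follows (illustrative only: region_mask is a placeholder for a mask derived from the predicted tissue architecture, which the pipeline produces itself, and an 8-bit RGB image is assumed):

import numpy as np
from PIL import Image

# Dim everything outside one zone of the histological image.
img = np.array(Image.open("cancer/Parent_Visium_Human_Glioblast.tif"))
region_mask = np.zeros(img.shape[:2], dtype=bool)
region_mask[1000:2000, 1000:2000] = True  # placeholder rectangle, not a prediction

highlight = img.copy()
highlight[~region_mask] //= 4  # darken pixels outside the zone
Image.fromarray(highlight).save("zone_preview.png")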

Function 5: customize segmentation model (GPU required)

RESEPT supports fine-tuning its segmentation model on users' own 10x Visium data. Organize all samples and their annotations according to the pre-defined data schema (see Data structure above) and download our pre-trained model as a training starting point. Each training sample should be placed in an individual folder with the required format, and all the individual folders gathered into one main folder (e.g., named 'training_data_folder'). For demonstration, download the example training data with the wget commands below, then run the following command to generate the RGB images of your own data and the customized model.

wget https://bmblx.bmi.osumc.edu/downloadFiles/GitHub_files/model_151669.zip
wget https://bmblx.bmi.osumc.edu/downloadFiles/GitHub_files/training_data_folder.zip
unzip model_151669.zip
unzip training_data_folder.zip
python training_pipeline.py -data_folder training_data_folder -output Demo_result_model -embedding scGNN  -transform logcpm -model model_151669/151669_scGNN.pth

Command Line Arguments:

-data_folder: path to the training data folder organized as described above
-output: name of the output folder
-embedding: embedding method (scGNN in this demo)
-transform: transformation applied to the expression data (logcpm in this demo)
-model: path to the pre-trained model used as the training starting point

Results

RESEPT stores the generated results in the following structure:

   Demo_result_model/
   |__RGB_images/

   work_dirs/
   |__config/
         |__fine_tune_model.pth

This demo takes about 3-5 hours to generate the model on a machine with an 11 GB VRAM GPU.
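To confirm the fine-tuned checkpoint was written correctly, it can be loaded back with PyTorch (the checkpoint contents depend on the underlying segmentation config, so this only verifies that the file deserializes):

import torch

ckpt = torch.load("work_dirs/config/fine_tune_model.pth", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt)[:5])  # peek at the top-level keys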

Built With

PyTorch
mmcv (OpenMMLab)
scanpy

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Citation

If you use RESEPT, please cite our paper:

@article{Chang2021.07.08.451210,
    author = {Chang, Yuzhou and He, Fei and Wang, Juexin and Chen, Shuo and Li, Jingyi and Liu, Jixin and Yu, Yang and Su, Li and Ma, Anjun and Allen, Carter and Lin, Yu and Sun, Shaoli and Liu, Bingqiang and Otero, Jose and Chung, Dongjun and Fu, Hongjun and Li, Zihai and Xu, Dong and Ma, Qin},
    title = {Define and visualize pathological architectures of human tissues from spatially resolved transcriptomics using deep learning},
    elocation-id = {2021.07.08.451210},
    year = {2021},
    doi = {10.1101/2021.07.08.451210},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2021/07/16/2021.07.08.451210},
    eprint = {https://www.biorxiv.org/content/early/2021/07/16/2021.07.08.451210.full.pdf},
    journal = {bioRxiv}
}