YangLabHKUST / SpatialScope

A unified approach for integrating spatial and single-cell transcriptomics data by leveraging deep generative models
https://spatialscope-tutorial.readthedocs.io/en/latest/
GNU General Public License v3.0

SpatialScope

Visit our documentation for installation, tutorials, examples and more.

Installation

$ git clone https://github.com/YangLabHKUST/SpatialScope.git
$ cd SpatialScope
$ conda env create -f environment.yml
$ conda activate SpatialScope
# Patch a squidpy bug. If your environment lives elsewhere, run `which python` to locate its lib directory and adjust the target path.
$ rsync ./src/_feature_mixin.py ~/.conda/envs/SpatialScope/lib/python3.9/site-packages/squidpy/im/_feature_mixin.py
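
The target path in the rsync command depends on where conda created the environment and on its Python version. As a small convenience sketch (not part of SpatialScope; the helper name `installed_file` is hypothetical), the following prints where the file to be overwritten lives in the currently active environment:

```python
import importlib.util
from pathlib import Path


def installed_file(package, relative_path):
    """Return the path of relative_path inside an installed top-level package.

    Returns None if the package cannot be found in the active environment.
    """
    spec = importlib.util.find_spec(package)  # does not import the package
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = Path(list(spec.submodule_search_locations)[0])
    return package_dir / relative_path


# Path of the squidpy file that the rsync command above replaces.
target = installed_file("squidpy", "im/_feature_mixin.py")
if target is None:
    print("squidpy is not installed in this environment")
else:
    print(target)
```

Run it with the SpatialScope environment activated and use the printed path as the rsync destination.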

Check the installation status:

$ python ./src/Cell_Type_Identification.py -h

Installation using Docker

If the installation above fails, you can use Docker instead. Make sure Docker and nvidia-container-toolkit are installed first, then pull the SpatialScope image from Docker Hub.

$ docker pull xiaojs95/spatialscope
$ docker images

Usage

$ docker run -it --gpus all --ipc=host xiaojs95/spatialscope /bin/bash

Update the repository if necessary:

$ git pull

Check the installation status:

$ python ./src/Cell_Type_Identification.py -h

Reproducibility

We provide source code for reproducing the SpatialScope analyses from the main text in the demos directory.

All relevant materials used by the reproduction code are available here.

Quick start for Visium data

We illustrate the usage of SpatialScope using a single slice of 10x Visium human heart data.

All relevant materials used in the following example are available here.

Step1: Nuclei segmentation

python ./src/Nuclei_Segmentation.py --tissue heart --out_dir  ./output  --ST_Data ./demo_data/V1_Human_Heart_spatial.h5ad --Img_Data  ./demo_data/V1_Human_Heart_image.tif

Input:

This step takes about 5 minutes, creates the ./output/heart directory, and generates two files:

Step2: Cell type identification

python ./src/Cell_Type_Identification.py --tissue heart --out_dir  ./output  --ST_Data ./output/heart/sp_adata_ns.h5ad --SC_Data ./Ckpts_scRefs/Heart_D2/Ref_Heart_sanger_D2.h5ad --cell_class_column cell_type

Input:

This step takes about 10 minutes and generates three files:

Now we can use the sp_adata.h5ad to visualize the single-cell resolution spatial distribution of different cell types:

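import scanpy as sc
import matplotlib.pyplot as plt
# Assumption: PlotVisiumCells comes from this repository's ./src/utils.py (e.g. `from utils import *` with ./src on sys.path)
ad_sp = sc.read('./output/heart/sp_adata.h5ad')
fig, ax = plt.subplots(1, 1, figsize=(12, 8), dpi=100)
PlotVisiumCells(ad_sp, "discrete_label_ct", size=0.3, alpha_img=0.3, lw=0.8, ax=ax)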

More details are available in the Jupyter notebook Human Heart (Visium, a single slice).

Step3: Gene expression decomposition

In Step3, by conditioning on the inferred cell type labels from Step2, SpatialScope performs gene expression decomposition, transforming the spot-level gene expression profile into single-cell resolution. To do this, we first learn a score-based generative model to approximate the expression distribution of different cell types from the single-cell reference data. Then we use the learned model to decompose gene expression from the spot level to the single-cell level, while accounting for the batch effect between single-cell reference and ST data.

python ./src/Decomposition.py --tissue heart --out_dir  ./output --SC_Data ./Ckpts_scRefs/Heart_D2/Ref_Heart_sanger_D2.h5ad --cell_class_column cell_type  --ckpt_path ./Ckpts_scRefs/Heart_D2/model_5000.pt --spot_range 0,100 --gpu 0,1,2,3

Input:

This step takes about 10 minutes and generates one file:

Learning the gene expression distribution of scRNA-seq reference using score-based model

The scRNA-seq reference ./Ckpts_scRefs/Heart_D2/Ref_Heart_sanger_D2.h5ad was preprocessed following standard procedures; more details are available in the Jupyter notebook Human Heart (Visium, a single slice). To make distribution learning more efficient, we learned the gene expression distributions of only 2,000 selected highly variable genes. We also subsampled the cells of each cell type to a maximum of 3,000.

We use four RTX 2080 Ti GPUs in parallel to train the model on the scRNA-seq reference.
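
The per-cell-type subsampling described above can be illustrated with a minimal standard-library sketch. This is not the authors' preprocessing code: the helper name `subsample_per_type`, the fixed seed, and the toy cell type names are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def subsample_per_type(labels, max_per_type=3000, seed=0):
    """Return sorted cell indices, keeping at most max_per_type cells per label."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_type = defaultdict(list)
    for idx, label in enumerate(labels):
        by_type[label].append(idx)

    kept = []
    for indices in by_type.values():
        # Only cell types above the cap are subsampled; smaller ones are kept whole.
        if len(indices) > max_per_type:
            indices = rng.sample(indices, max_per_type)
        kept.extend(indices)
    return sorted(kept)


# Toy example: 5,000 cells of one type and 1,200 of another.
labels = ["cardiomyocyte"] * 5000 + ["fibroblast"] * 1200
kept = subsample_per_type(labels)
print(len(kept))  # 3000 + 1200 = 4200
```

With an AnnData reference, the returned indices could be used to slice the object (for example `adata[kept]`), with `labels` taken from the column passed as --cell_class_column.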

python ./src/Train_scRef.py \
--ckpt_path ./Ckpts_scRefs/Heart_D2 \
--scRef ./Ckpts_scRefs/Heart_D2/Ref_Heart_sanger_D2.h5ad \
--cell_class_column cell_type \
--gpus 0,1,2,3 \
--sigma_begin 50 --sigma_end 0.002 --step_lr 3e-7 

The checkpoints and sampled pseudo-cells will be saved in ./Ckpts_scRefs/Heart_D2, e.g., model_5000.pt and model_5000.h5ad. The pre-trained checkpoint can be used for any spatial data from the same tissue.

Due to the low sequencing depth (~2000 UMIs per cell) of this Human Heart scRNA-seq reference, we changed the default parameters of sigma_begin, sigma_end and step_lr.
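
To give a feel for what these three parameters control, here is a minimal sketch. It assumes they follow the common convention of noise-conditional score networks sampled with annealed Langevin dynamics: a geometric sequence of noise levels from sigma_begin down to sigma_end, and a per-level step size of step_lr * (sigma_i / sigma_end)^2. The source does not confirm that SpatialScope uses exactly this schedule, and the number of noise levels (10 here) is a hypothetical value chosen for readability.

```python
import math


def geometric_sigmas(sigma_begin, sigma_end, num_levels):
    """Noise levels decreasing geometrically from sigma_begin to sigma_end."""
    ratio = math.exp(math.log(sigma_end / sigma_begin) / (num_levels - 1))
    return [sigma_begin * ratio**i for i in range(num_levels)]


def langevin_step_sizes(sigmas, step_lr):
    """Step size per noise level under the assumed convention:
    step_lr scaled by (sigma_i / sigma_last) squared."""
    sigma_last = sigmas[-1]
    return [step_lr * (s / sigma_last) ** 2 for s in sigmas]


# Values from the training command above; num_levels is hypothetical.
sigmas = geometric_sigmas(sigma_begin=50, sigma_end=0.002, num_levels=10)
steps = langevin_step_sizes(sigmas, step_lr=3e-7)
for sigma, step in zip(sigmas, steps):
    print(f"sigma={sigma:10.4f}  step={step:.3e}")
```

Under this convention, sigma_begin and sigma_end bound the range of noise scales the model is exposed to, and step_lr sets the step size at the smallest noise level.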

Because sampling from diffusion/score-based models requires hundreds to thousands of network evaluations to emulate a continuous process, the entire training process takes approximately 40 hours on four RTX 2080 Ti GPUs. We are therefore trying to accelerate training with recent techniques from the diffusion model field, such as Stable Diffusion.

For convenience, we provide the pre-trained checkpoint (Ckpts_scRefs/Heart_D2/model_5000.pt) here, so you can skip this part.

Frequently Asked Questions

  1. I have access to a 3090 or, alternatively, 2x V100-SXM2. Will that work for imputing onto a 200,000-cell MERFISH dataset?

    Answer: The minimum GPU requirement for SpatialScope is a 2080 Ti. However, because GPU memory is limited, we recommend imputing 1,000 cells at a time. More details are available in the demo notebook Mouse MOp (MERFISH).
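
As a rough sketch of the batching suggested in the answer, the helper below splits a dataset into consecutive index ranges of at most 1,000 cells. The function name `batch_ranges` is hypothetical, and the half-open (start, end) convention is an assumption; check the demo notebook for how SpatialScope itself expects ranges to be specified.

```python
def batch_ranges(n_items, batch_size):
    """Split range(n_items) into consecutive half-open (start, end) batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        (start, min(start + batch_size, n_items))
        for start in range(0, n_items, batch_size)
    ]


# 200,000 cells processed 1,000 at a time gives 200 batches.
batches = batch_ranges(200_000, 1000)
print(len(batches), batches[0], batches[-1])
```

Each (start, end) pair can then drive one imputation run, so that only one batch of cells has to fit in GPU memory at a time.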

Contact information

Please contact Xiaomeng Wan (xwanaf@connect.ust.hk), Jiashun Xiao (jxiaoae@connect.ust.hk) or Prof. Can Yang (macyang@ust.hk) with any enquiries.