A unified approach for integrating spatial and single-cell transcriptomics data by leveraging deep generative models
Visit our documentation for installation, tutorials, examples and more.
$ git clone https://github.com/YangLabHKUST/SpatialScope.git
$ cd SpatialScope
$ conda env create -f environment.yml
$ conda activate SpatialScope
# patch a bug in squidpy; adjust the destination to your environment's site-packages (locate it with `which python`)
$ rsync ./src/_feature_mixin.py ~/.conda/envs/SpatialScope/lib/python3.9/site-packages/squidpy/im/_feature_mixin.py
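Rather than hard-coding the `~/.conda/envs/SpatialScope/lib/python3.9/...` path, you can compute the patch destination from the active interpreter. The sketch below is an illustration, not part of SpatialScope; it assumes squidpy is installed in the interpreter's standard `purelib` site-packages directory, which is the usual case inside a conda environment.

```python
import os
import sysconfig


def squidpy_patch_target() -> str:
    """Return where src/_feature_mixin.py should be copied for this interpreter.

    Assumption: squidpy lives in the interpreter's purelib site-packages
    directory (typical for a conda env); verify before overwriting anything.
    """
    site_packages = sysconfig.get_paths()["purelib"]
    return os.path.join(site_packages, "squidpy", "im", "_feature_mixin.py")


if __name__ == "__main__":
    # Run with the SpatialScope env activated; pass the printed path to rsync.
    print(squidpy_patch_target())
```

For example, saved as a hypothetical `patch_target.py`, it could be used as `rsync ./src/_feature_mixin.py "$(python patch_target.py)"`.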
Check the installation status:
$ python ./src/Cell_Type_Identification.py -h
If the installation fails, consider using Docker instead. Make sure Docker and nvidia-container-toolkit are installed first, then pull the SpatialScope image from Docker Hub:
$ docker pull xiaojs95/spatialscope
$ docker images
Start an interactive container with GPU access:
$ docker run -it --gpus all --ipc=host xiaojs95/spatialscope /bin/bash
Update the repository if necessary:
$ git pull
Check the installation status:
$ python ./src/Cell_Type_Identification.py -h
The demos directory contains source code for reproducing the SpatialScope analyses in the main text.
All relevant materials used by the reproduction code are available from here.
We illustrate the usage of SpatialScope using a single slice of 10x Visium human heart data:
All relevant materials used in the following example are available from here.
python ./src/Nuclei_Segmentation.py --tissue heart --out_dir ./output --ST_Data ./demo_data/V1_Human_Heart_spatial.h5ad --Img_Data ./demo_data/V1_Human_Heart_image.tif
Input:
This step takes about 5 minutes, creates the ./output/heart directory, and generates two files:
python ./src/Cell_Type_Identification.py --tissue heart --out_dir ./output --ST_Data ./output/heart/sp_adata_ns.h5ad --SC_Data ./Ckpts_scRefs/Heart_D2/Ref_Heart_sanger_D2.h5ad --cell_class_column cell_type
Input:
This step takes about 10 minutes and generates three files:
Now we can use sp_adata.h5ad to visualize the spatial distribution of cell types at single-cell resolution:
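import scanpy as sc
import matplotlib.pyplot as plt
from utils import PlotVisiumCells  # assumed location: SpatialScope's src/utils.py (add ./src to sys.path)
ad_sp = sc.read('./output/heart/sp_adata.h5ad')
fig, ax = plt.subplots(1, 1, figsize=(12, 8), dpi=100)
PlotVisiumCells(ad_sp, "discrete_label_ct", size=0.3, alpha_img=0.3, lw=0.8, ax=ax)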
More details are available in the Jupyter notebook Human Heart (Visium, a single slice).
In Step3, by conditioning on the inferred cell type labels from Step2, SpatialScope performs gene expression decomposition, transforming the spot-level gene expression profile into single-cell resolution. To do this, we first learn a score-based generative model to approximate the expression distribution of different cell types from the single-cell reference data. Then we use the learned model to decompose gene expression from the spot level to the single-cell level, while accounting for the batch effect between single-cell reference and ST data.
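To make the conditioning idea concrete, here is a toy sketch of posterior sampling that splits one spot's expression across cells of known types. It is not SpatialScope's implementation: the learned score network is replaced by analytic Gaussian priors per cell type (so the prior score is exact), the spot is assumed to be observed as the sum of cell-level expression plus Gaussian noise, batch effects are ignored, and the cell-type names and values in the example are hypothetical.

```python
import numpy as np


def decompose_spot(y, cell_types, type_means, prior_sd=1.0, noise_sd=0.1,
                   step=0.005, n_steps=5000, seed=0):
    """Toy Langevin sampler for p(x_1..x_n | y, cell types).

    Assumptions (stand-ins, not SpatialScope's model):
      prior:      x_i ~ N(type_means[type_i], prior_sd^2 I)
      likelihood: y   ~ N(sum_i x_i, noise_sd^2 I)
    For stability, step should stay below ~4 / (1/prior_sd^2 + n_cells/noise_sd^2).
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray([type_means[k] for k in cell_types], dtype=float)  # (n_cells, n_genes)
    y = np.asarray(y, dtype=float)
    x = mu.copy()  # start every cell at its cell-type mean
    for _ in range(n_steps):
        prior_score = -(x - mu) / prior_sd**2           # grad of log prior, per cell
        lik_score = (y - x.sum(axis=0)) / noise_sd**2   # grad of log likelihood, shared by all cells
        grad = prior_score + lik_score                  # broadcast lik_score onto each row
        # Unadjusted Langevin update on the posterior
        x = x + 0.5 * step * grad + np.sqrt(step) * rng.standard_normal(x.shape)
    return x


# Hypothetical two-gene spot containing two cells of type "A" and one of type "B".
cells = decompose_spot(y=[12.0, 6.0], cell_types=["A", "A", "B"],
                       type_means={"A": [5.0, 1.0], "B": [1.0, 4.0]})
print(cells.round(2))
print(cells.sum(axis=0).round(2))  # should lie close to the observed spot total
```

The shared likelihood term is what couples the cells: each cell is pulled toward its own cell-type prior, while all of them are jointly pushed so that their sum matches the observed spot.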
python ./src/Decomposition.py --tissue heart --out_dir ./output --SC_Data ./Ckpts_scRefs/Heart_D2/Ref_Heart_sanger_D2.h5ad --cell_class_column cell_type --ckpt_path ./Ckpts_scRefs/Heart_D2/model_5000.pt --spot_range 0,100 --gpu 0,1,2,3
Input:
This step takes about 10 minutes and generates one file:
The scRNA-seq reference ./Ckpts_scRefs/Heart_D2/Ref_Heart_sanger_D2.h5ad was preprocessed following standard procedures; more details are available in the Jupyter notebook Human Heart (Visium, a single slice). To make distribution learning more efficient, we learned the expression distributions of only 2,000 selected highly variable genes. In addition, we subsampled each cell type to at most 3,000 cells.
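The two reduction steps above (keep a fixed number of variable genes, cap cells per type) can be sketched with NumPy alone. This is an illustration under stated assumptions, not the notebook's code: the gene ranking here is a simplified stand-in (variance of log1p counts-per-10k), not scanpy's `highly_variable_genes`, and the random subsampling scheme is assumed.

```python
import numpy as np


def select_top_variable_genes(counts, n_top=2000):
    """Simplified HVG stand-in: rank genes by variance of log1p(CP10k).

    NOT scanpy's highly_variable_genes (which uses binned, normalized
    dispersion); it only illustrates keeping a fixed number of genes.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)
    lib_size[lib_size == 0] = 1.0  # avoid division by zero for empty cells
    logexpr = np.log1p(counts / lib_size * 1e4)
    n_top = min(n_top, counts.shape[1])
    order = np.argsort(logexpr.var(axis=0))[::-1]  # most variable first
    return np.sort(order[:n_top])  # column indices of retained genes


def subsample_per_type(labels, max_per_type=3000, seed=0):
    """Return sorted row indices keeping at most max_per_type cells of each type."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    keep = []
    for cell_type in np.unique(labels):
        idx = np.flatnonzero(labels == cell_type)
        if idx.size > max_per_type:
            idx = rng.choice(idx, size=max_per_type, replace=False)
        keep.append(idx)
    return np.sort(np.concatenate(keep))
```

Applied to a cells-by-genes count matrix and its cell-type labels, `counts[rows][:, genes]` would give the reduced training matrix.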
We use four RTX 2080 Ti GPUs to train the model on the scRNA-seq reference in parallel:
python ./src/Train_scRef.py \
--ckpt_path ./Ckpts_scRefs/Heart_D2 \
--scRef ./Ckpts_scRefs/Heart_D2/Ref_Heart_sanger_D2.h5ad \
--cell_class_column cell_type \
--gpus 0,1,2,3 \
--sigma_begin 50 --sigma_end 0.002 --step_lr 3e-7
The checkpoints and sampled pseudo-cells are saved in ./Ckpts_scRefs/Heart_D2, e.g., model_5000.pt and model_5000.h5ad. The pre-trained checkpoint can be used for any spatial data from the same tissue.
Due to the low sequencing depth (~2000 UMIs per cell) of this Human Heart scRNA-seq reference, we changed the default parameters of sigma_begin, sigma_end and step_lr.
Because sampling from diffusion/score-based models requires hundreds to thousands of network evaluations to emulate a continuous process, the entire training process takes approximately 40 hours on four RTX 2080 Ti GPUs. We are therefore exploring recent techniques from the diffusion-model field, such as stable diffusion, to accelerate training.
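The sketch below shows how `--sigma_begin`, `--sigma_end` and `--step_lr` would shape sampling if, as we assume here, they follow the annealed Langevin dynamics convention of noise-conditional score networks: a geometric sequence of noise levels, with the step size at each level scaled by (sigma_i / sigma_end)^2. The number of noise levels and steps per level are hypothetical values chosen for illustration, not SpatialScope's settings; the point is that the network-evaluation count is their product.

```python
import numpy as np


def annealed_langevin_schedule(sigma_begin=50.0, sigma_end=0.002, step_lr=3e-7,
                               n_levels=200):
    """Geometric noise levels and per-level step sizes (assumed NCSN-style convention).

    n_levels is a hypothetical value, not taken from SpatialScope.
    """
    sigmas = np.geomspace(sigma_begin, sigma_end, n_levels)  # decreasing noise levels
    step_sizes = step_lr * (sigmas / sigmas[-1]) ** 2         # larger steps at higher noise
    return sigmas, step_sizes


sigmas, step_sizes = annealed_langevin_schedule()
steps_per_level = 5  # hypothetical number of Langevin steps per noise level
print(f"{len(sigmas)} noise levels x {steps_per_level} steps = "
      f"{len(sigmas) * steps_per_level} network evaluations")
# prints: 200 noise levels x 5 steps = 1000 network evaluations
```

Lowering `sigma_end` or `step_lr`, as done for this low-depth reference, changes the finest noise level and the step size used there without changing the evaluation count.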
For convenience, we provide the pre-trained checkpoint (Ckpts_scRefs/Heart_D2/model_5000.pt) here, so you can skip this step.
Question: I have access to an RTX 3090 or, alternatively, 2x V100-SXM2. Will that work for imputing onto a 200,000-cell MERFISH dataset?
Answer: The minimum GPU requirement for SpatialScope is an RTX 2080 Ti. Because GPU memory is limited, we recommend imputing 1,000 cells at a time; more details are available in the demo notebook Mouse MOp (MERFISH).
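One way to process a large dataset in fixed-size batches is to run the Step 3 command repeatedly over consecutive `--spot_range` windows (the human heart example uses `--spot_range 0,100`). The helper below is a hypothetical sketch: it assumes the end index is exclusive (Python-style), which should be checked against Decomposition.py and the Mouse MOp (MERFISH) notebook before use.

```python
def spot_ranges(n_spots, chunk_size=1000):
    """Split [0, n_spots) into consecutive "start,end" strings.

    Assumes --spot_range is half-open (start inclusive, end exclusive).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [f"{start},{min(start + chunk_size, n_spots)}"
            for start in range(0, n_spots, chunk_size)]


if __name__ == "__main__":
    # Hypothetical 2,500-entry dataset split into batches of 1,000.
    for r in spot_ranges(2500):
        print(f"--spot_range {r}")
```

Each printed value would be passed to a separate Decomposition.py run, keeping peak GPU memory bounded by the batch size.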
Please contact Xiaomeng Wan (xwanaf@connect.ust.hk), Jiashun Xiao (jxiaoae@connect.ust.hk) or Prof. Can Yang (macyang@ust.hk) with any enquiries.