SatwikAch / SpaceX


SpaceX overview

[Figure: SpaceX pipeline]

SpaceX (spatially dependent gene co-expression network) is a Bayesian methodology to identify both shared and cluster-specific co-expression networks across genes. These clusters can be cell-type specific or based on spatial regions. SpaceX uses an over-dispersed spatial Poisson model coupled with a high-dimensional factor model, which relies on a dimension reduction technique for computational efficiency.

The figure above shows the overall conceptual flow of our pipeline. Panel A is an image of a tissue section from the region of interest. Panel B shows the spatial gene expression and biomarkers that are recorded from that tissue section with the help of sequencing techniques. Panel C is the resulting data matrix of gene expression along with spatial locations and cluster annotations on the tissue. All of these serve as input for the SpaceX model to obtain the shared (Panel D) and cluster-specific (Panel E) co-expression networks. Finally, we use these networks for downstream analysis to detect gene modules and hub genes across spatial regions (Panel F and Panel G, respectively) for biological interpretation.

Installation

This package requires a Fortran compiler in order to work. Here are the instructions:

The package requires a dependency that is not available on CRAN. Install it with:

remotes::install_github("rdevito/MSFA")

You can install the released version of SpaceX from GitHub (https://github.com/SatwikAch/SpaceX) with:

devtools::install_github("SatwikAch/SpaceX")
library(SpaceX)
#> Loading required package: PQLseq

SpaceX function

Inputs

The first input is Gene_expression_mat, an $N \times G$ dataframe, where $N$ denotes the number of spatial locations and $G$ the number of genes.

The second input is Spatial_locations, a dataframe which contains the spatial coordinates.

The third input is Cluster_annotations, which holds the cluster annotation of each spatial location.

The fourth input is sPMM. If TRUE, the code will return the estimates of sigma1_sq and sigma2_sq from the spatial Poisson mixed model.

The fifth input is Post_process. If FALSE, the code will return only the posterior samples of $\Phi$ and $\Psi^c$ (as defined in equation 1 of the SpaceX paper). The default is TRUE, in which case the code will return all the posterior samples together with the shared and cluster-specific co-expressions.

The final input is numCore, the number of cores requested for parallel computing. The default is 1.
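As a rough illustration of the input shapes described above, the following sketch builds toy inputs in base R. All values are simulated, and the gene names, cluster labels "A", "B", "C", and the two-column coordinate layout are assumptions made for illustration only; the real breast cancer data may be organised differently.

```r
# Toy inputs with the shapes described above (simulated values only).
set.seed(1)
N <- 30   # number of spatial locations
G <- 5    # number of genes

# N x G dataframe of counts, one column per (hypothetical) gene
Gene_expression_mat <- as.data.frame(
  matrix(rpois(N * G, lambda = 5), nrow = N, ncol = G,
         dimnames = list(NULL, paste0("gene", seq_len(G))))
)

# N x 2 dataframe of spatial coordinates (assumed two-dimensional)
Spatial_locations <- data.frame(x = runif(N), y = runif(N))

# One (hypothetical) cluster label per spatial location
Cluster_annotations <- sample(c("A", "B", "C"), N, replace = TRUE)

# Consistency checks worth running before fitting the model
stopifnot(
  nrow(Gene_expression_mat) == nrow(Spatial_locations),
  nrow(Gene_expression_mat) == length(Cluster_annotations)
)
```

Checking that the three inputs agree on the number of locations before fitting can help catch misaligned data early.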

Output

You will obtain a list of objects as output.

Posterior_samples contains all the posterior samples.

Shared_network provides the shared co-expression matrix (transformed correlation matrix of $G_{s} = \Phi \Phi^{T}$).

Cluster_network provides the cluster specific co-expression matrices (transformed correlation matrices of $G_{c} = \Phi \Phi^{T} + \Psi^{c} {\Psi^{c}}^{T}$).
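To make the two formulas above concrete, here is a minimal base R sketch with toy matrices. The dimensions (4 genes, 2 factors) and the random entries are arbitrary, and reading "transformed correlation matrix" as the usual covariance-to-correlation rescaling via cov2cor is an assumption; the package may apply a different transformation internally.

```r
# Toy illustration of G_s = Phi Phi^T and G_c = Phi Phi^T + Psi^c Psi^c^T.
set.seed(2)
G <- 4   # number of genes (arbitrary)
K <- 2   # number of latent factors (arbitrary)

Phi   <- matrix(rnorm(G * K), nrow = G, ncol = K)  # shared loadings (simulated)
Psi_c <- matrix(rnorm(G * K), nrow = G, ncol = K)  # loadings for one cluster c (simulated)

G_s <- Phi %*% t(Phi)            # shared covariance structure
G_c <- G_s + Psi_c %*% t(Psi_c)  # cluster-specific covariance structure

# Assumed transformation: rescale each covariance matrix to a correlation matrix
shared_cor  <- cov2cor(G_s)
cluster_cor <- cov2cor(G_c)
```

Because $G_{c}$ adds a positive semi-definite term to $G_{s}$, the shared structure appears in every cluster-specific matrix, and the cluster-specific term captures the deviation from it.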

Example

The following example uses the breast cancer data to demonstrate how to run the SpaceX function and obtain the shared and cluster-specific networks.

## Reading the Breast cancer data

## Spatial locations
head(BC_loc)

## Gene expression data
head(BC_count)

## Data processing
G <- dim(BC_count)[2] ## number of genes
N <- dim(BC_count)[1] ## number of locations

## Application of the SpaceX algorithm (please make sure to request enough memory to work with the posterior samples)
BC_fit <- SpaceX(BC_count, BC_loc[,1:2], BC_loc[,3], sPMM = FALSE, Post_process = TRUE, numCore = 2)

## Shared_network :: Shared co-expression matrix
## Cluster_network :: Cluster specific co-expression matrices
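The overview mentions using the estimated networks to detect hub genes. One simple way to sketch that step, which is not necessarily the procedure used in the paper, is to threshold a co-expression matrix and rank genes by the number of connections. In the snippet below, coexp is a simulated stand-in for a G x G co-expression matrix such as BC_fit$Shared_network, and the cutoff of 0.1 is a hypothetical value chosen only for illustration.

```r
# Simulated stand-in for a G x G co-expression matrix (e.g. BC_fit$Shared_network)
set.seed(3)
G <- 6
X <- matrix(rnorm(50 * G), nrow = 50, ncol = G,
            dimnames = list(NULL, paste0("gene", seq_len(G))))
coexp <- cor(X)

# Hypothetical cutoff: keep an edge when the absolute co-expression exceeds it
threshold <- 0.1
adj <- abs(coexp) > threshold
diag(adj) <- FALSE   # ignore self-connections

# Degree of each gene = number of retained edges; high-degree genes are hub candidates
degree <- rowSums(adj)
hub_order <- sort(degree, decreasing = TRUE)
print(hub_order)
```

The same ranking could be repeated for each cluster-specific matrix to compare hub candidates across spatial regions.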

Tutorial website

The tutorial website can be found here.

Paper

Satwik Acharyya, Xiang Zhou and Veerabhadran Baladandayuthapani (2022). SpaceX: Gene Co-expression Network Estimation for Spatial Transcriptomics. Bioinformatics, 38(22): 5033–5041.

Supplementary file

Supplementary

Points to note