Warning: the package on Zenodo is no longer updated; please install the newest version.
ENIGMA is a method that accurately deconvolutes bulk tissue RNA-seq data to cell-type resolution, using knowledge gained from scRNA-seq. ENIGMA applies a matrix completion strategy to minimize the distance between the mixture transcriptome and a weighted combination of cell-type-specific expression, allowing quantification of cell-type proportions and reconstruction of cell-type-specific transcriptomes.
The newest version of ENIGMA can be installed with the following steps:
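The mixture model described above can be illustrated with a small simulation. This is a minimal base R sketch, not ENIGMA's algorithm: the matrices `ref`, `theta`, and `bulk` are hypothetical simulated data, and ordinary least squares stands in for the matrix completion step, so it only recovers cell-type proportions and not sample-specific expression.

```r
set.seed(1)
n_genes <- 200
n_types <- 3
n_samples <- 10

# Hypothetical reference profiles (genes x cell types), e.g. from scRNA-seq.
ref <- matrix(rexp(n_genes * n_types, rate = 0.1), n_genes, n_types)

# True cell-type proportions (samples x cell types); each row sums to 1.
theta <- matrix(runif(n_samples * n_types), n_samples, n_types)
theta <- theta / rowSums(theta)

# Bulk expression = weighted combination of cell-type profiles + noise.
noise <- matrix(rnorm(n_genes * n_samples, sd = 0.5), n_genes, n_samples)
bulk <- ref %*% t(theta) + noise

# Least-squares estimate of the proportions (assumption: plain OLS,
# followed by clamping negatives to zero and renormalising each row).
theta_hat <- t(qr.solve(ref, bulk))
theta_hat[theta_hat < 0] <- 0
theta_hat <- theta_hat / rowSums(theta_hat)

# Largest absolute deviation from the simulated truth.
max_abs_err <- max(abs(theta_hat - theta))
print(max_abs_err)
```

In this simplified setting every sample shares the same reference profiles; ENIGMA additionally reconstructs a cell-type-specific expression profile for each sample.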
install.packages(c("Matrix","S4Vectors","corpcor","MASS","e1071","ggplot2","cowplot","magrittr","purrr","tibble","nnls","doParallel","tidyr","plyr","vctrs","matrixStats"))
BiocManager::install(c("SingleCellExperiment","scater","Biobase","SummarizedExperiment","sva","preprocessCore"))
Install the newest version of ENIGMA:
devtools::install_github("WWXKenmo/ENIGMA_test")
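The commands above call `BiocManager::install` and `devtools::install_github`, so both helper packages need to be installed first. A minimal sketch for checking this is shown here; `missing_pkgs` is a hypothetical helper written for this example and is not part of ENIGMA.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_pkgs(c("BiocManager", "devtools"))
print(to_install)

# If anything is listed, install it from CRAN before running the commands above:
# install.packages(to_install)
```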
Updated the stopping criteria
Added the FindCSE_DEG function to perform cell type-specific differentially expressed gene (CTS-DEG) analysis
DEG = FindCSE_DEG(object,y)
# object: an ENIGMA object
# y: a binary phenotype vector coding case (1) and control (0)
Please refer to the CTS-DE document for detailed guidance on CTS-DE analysis with ENIGMA. Link to example datasets
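As an illustration of the expected input, the phenotype vector can be built from sample labels as sketched below. The `condition` labels are hypothetical, and the sketch assumes that `y` must follow the same sample order as the bulk data stored in the ENIGMA object.

```r
# Hypothetical sample annotations, one entry per bulk sample, in the same
# order as the samples in the ENIGMA object (an assumption).
condition <- c("case", "control", "case", "control", "control")

# Code cases as 1 and controls as 0.
y <- as.integer(condition == "case")

# Sanity checks: only 0/1 values, and both groups are present.
stopifnot(all(y %in% c(0L, 1L)), length(unique(y)) == 2)
print(table(condition, y))
```

The resulting `y` can then be passed as the second argument of `FindCSE_DEG(object, y)`.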
Added the GeneSigTest function to filter genes
ENIGMA now provides a function that helps users identify the genes that can be accurately estimated by the algorithm.
res = GeneSigTest(object,filtering=TRUE)
head(res$call)
head(res$pval)
egm = res$egm # the filtered ENIGMA object
We have implemented the ENIGMA algorithm in Python for users who prefer to work in Python.
Please refer to the ENIGMA documentation for detailed guidance on using ENIGMA as an R package. Link to example datasets
Which model should users choose, and why?
In summary, the trace norm and maximum L2 norm models each perform better in different respects, so the choice depends on the kind of analysis users want to conduct.

The trace norm model imposes a trace norm regularizer on the inferred CSE profiles and uses a low-rank matrix to approximate cell type-specific gene expression, which may help the model better capture gene variation across samples. The trace norm model can also outperform the maximum L2 norm model in CTS-DEG identification. Users who want to analyze cell type-specific differentially expressed genes can therefore choose the trace norm model.

The maximum L2 norm model assumes that bulk samples contain unknown variables (expression of rare cell types or technical variation), and it recovers cell type-specific correlation structure better, even when the observed bulk expression matrix is very noisy. Users who want to define patient or sample subtypes from cell type-specific gene expression profiles (e.g. of malignant cells) can choose the maximum L2 norm model.

The maximum L2 norm model is also preferable for large cohorts of bulk samples. Its training involves no matrix inversion or singular value decomposition, so it scales well to large sample sizes. We suggest the maximum L2 norm model when fast deconvolution of a large bulk expression dataset is needed.
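To make the low-rank idea behind the trace norm model concrete, the sketch below computes the trace norm (the sum of singular values) of a matrix and applies singular value soft-thresholding, the standard proximal step for a trace norm penalty. It is a base R illustration on simulated data, not ENIGMA's implementation; `trace_norm`, `svt`, and the threshold `tau` are hypothetical names chosen for this example.

```r
set.seed(2)

# Trace (nuclear) norm: the sum of the singular values of X.
trace_norm <- function(X) sum(svd(X)$d)

# Singular value soft-thresholding: shrink every singular value by tau
# and drop those that fall below zero, which lowers the rank.
svt <- function(X, tau) {
  s <- svd(X)
  d <- pmax(s$d - tau, 0)
  s$u %*% diag(d, nrow = length(d)) %*% t(s$v)
}

# Simulated genes x samples matrix: a rank-2 signal plus small noise.
n_genes <- 50
n_samples <- 20
low_rank <- matrix(rnorm(n_genes * 2), n_genes, 2) %*%
  matrix(rnorm(2 * n_samples), 2, n_samples)
X <- low_rank + matrix(rnorm(n_genes * n_samples, sd = 0.1), n_genes, n_samples)

# Thresholding removes the noise-level singular values.
X_lr <- svt(X, tau = 2)
rank_after <- sum(svd(X_lr)$d > 1e-8)

print(c(before = trace_norm(X), after = trace_norm(X_lr)))
print(rank_after)
```

Each such step requires a full singular value decomposition, which is the kind of cost the maximum L2 norm model is described as avoiding.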
Authors: Weixu Wang, Xiaolan Zhou, Dr. Jun Yao, Prof. Ting Ni
Report bugs by opening a new issue on this GitHub page.
Send suggestions by email to the maintainer.
Maintainer: Weixu Wang (ken71198@hotmail.com)
Wang W, Yao J, Wang Y, et al. Improved estimation of cell type-specific gene expression through deconvolution of bulk tissues with matrix completion. bioRxiv, 2021.