Closed rocketeer1998 closed 6 months ago
@rocketeer1998 Hi Jiayu, Please check this turorial: https://songdongyuan1994.github.io/scDesign3/docs/articles/scDesign3-DEanalysis-vignette.html
In this paper, another group independently uses scDesign3 for generating spatially variable/invariable genes.
Best, Dongyuan
Thanks for your quick response! How did you pre-select the top spatial variable genes? I see that you chose top 200 genes to save time, on what basis?
I've followed your instructions on above mentioned tutorial. But it seems like there is a memory error when I run in fit copula step. Below is my code and session info.
scDesign3_simulator <- function(h5ad_path,dataset_name){
t1 <- Sys.time()
### read matrix
anndata <- read_h5ad(h5ad_path, backed = 'r')
mx <- anndata$X
mx <- t(mx) # row as gene
mx <- Matrix(mx, sparse = T)
### read coord
coord <- as.data.frame(anndata$obsm$spatial[,1:2])
colnames(coord) <- c('X', 'Y')
### create SCE object
example_sce <- SingleCellExperiment(list(counts = mx),
colData = DataFrame(spatial1 = coord$X, spatial2 = coord$Y))
### read pre-selected SVG and nSVG results to filter genes to speed up
svg_nsvg <- read.csv('Results/8-Simulate_SVG_nSVG/SVG_nSVG.csv') %>% filter(dataset == dataset_name)
example_sce <- example_sce[svg_nsvg$gene,]
### remove mitochondrial genes
mt_idx<- grep("mt-",rownames(example_sce))
if(length(mt_idx)!=0){
example_sce <- example_sce[-mt_idx,]
}
### scDesign3 step-by-step simulation
set.seed(1)
### construct input dataset
print(Sys.time())
print("Step1 construct input dataset")
example_data <- construct_data(
sce = example_sce,
assay_use = "counts",
celltype = NULL,
pseudotime = NULL,
spatial = c("spatial1", "spatial2"),
other_covariates = NULL,
corr_by = "1"
)
### fit regression models for each feature based on specific specification
print(Sys.time())
print("Step2 fit regression models for each feature based on specific specification")
example_marginal <- fit_marginal(
data = example_data,
predictor = "gene",
mu_formula = "s(spatial1, spatial2, bs = 'gp', k= 50)",
sigma_formula = "1",
family_use = "nb",
n_cores = 20,
usebam = FALSE
)
### fit a copulam obtain AIC and BIC
print(Sys.time())
print("Step3 fit a copulam obtain AIC and BIC")
example_copula <- fit_copula(
sce = example_sce,
assay_use = "counts",
marginal_list = example_marginal,
family_use = "nb",
copula = "gaussian",
n_cores = 20,
input_data = example_data$dat
)
### extract parameters
print(Sys.time())
print("Step4 extract parameters")
example_para <- extract_para(
sce = example_sce,
marginal_list = example_marginal,
n_cores = 20,
family_use = "nb",
new_covariate = example_data$newCovariate,
data = example_data$dat
)
### select SVG and non SVG
dev_explain <- sapply(example_marginal, function(x){
sum = summary(x$fit)
return(sum$dev.expl)
})
dev_ordered <- order(dev_explain, decreasing = TRUE)
num_de <- 100
ordered <- dev_explain[dev_ordered]
de_idx <- names(ordered)[1:num_de]
non_de_idx <- names(ordered)[-(1:num_de)]
non_de_mat <- apply(example_para$mean_mat[,non_de_idx], 2, function(x){
avg <- (max(x)+min(x))/2
new_mean <- rep(avg, length(x))
return(new_mean)
})
example_para$mean_mat[,non_de_idx] <- non_de_mat
### simulate new data
set.seed(1)
example_newcount <- simu_new(
sce = example_sce,
mean_mat = example_para$mean_mat,
sigma_mat = example_para$sigma_mat,
zero_mat = example_para$zero_mat,
quantile_mat = NULL,
copula_list = example_copula$copula_list,
n_cores = 1,
family_use = "nb",
input_data = example_data$dat,
new_covariate = example_data$newCovariate,
important_feature = rep(TRUE, dim(example_sce)[1]),
filtered_gene = example_data$filtered_gene
)
simu_sce <- SingleCellExperiment(list(counts = example_newcount), colData = example_data$newCovariate)
logcounts(simu_sce) <- log1p(counts(simu_sce))
t2 <- Sys.time()
return(list(SVG_index = de_idx, nSVG_index = non_de_idx, new_data = simu_sce, time = t2-t1))
}
result1 <- scDesign3_simulator(h5ad_path = MyData.h5ad',
dataset_name = 'MyData')
When I ran fit_copula
, it threw me an error Error in serialize(data, node$con) : 链结书写发生了错误
. I've tested my pipeline in a small datasets and it worked. This dataset has ~2700 spots and ~12000 genes. Is it a computer memory problem?
Hi Jiayu,
That sounds like a run-out-of-memory issue. The copula fitting can be space expensive since it will estimate a gene x gene corr matrix. One solution is to give up the copula part if you only care about marginal SVG, e.g., set corr_by = 'ind'
in construct_data()
.
I've made up my mind to upgrade my PC to a Linux server! Thanks a lot!
Hi @SONGDONGYUAN1994 @qw130 , thanks for your contribution to scDesign3! I have an interesting question to discuss: Do you have any ideas for generating spatially invariant genes when simulating new spatial datasets? This is helpful to benchmark new spatially variable genes (SVGs) algorithms against SVGs detection accuracy after running each software and calculating AUC curve.