Closed LucasSilvaFerreira closed 3 years ago
Hi,
It would be helpful if you included a reproducible example, but I can handle it time time around :)
Please see the code below, where I compute example p-values using run_sceptre_in_memory
and run_sceptre_gRNA_gene_pair
. You will see that the p-values produced by these two functions coincide exactly.
# download sceptre
devtools::install_github("katsevich-lab/sceptre")
# load packages
library(dplyr)
library(sceptre)
# generate a small, random dataset
set.seed(4)
n_genes <- 40
n_gRNAs <- 50
n_cells <- 5000
# gene expression matrix
expression_matrix <- replicate(n = n_genes, rpois(n_cells, 1)) %>% t()
# perturbation matrix
perturbation_matrix <- replicate(n = n_gRNAs, rbinom(n_cells, 1, 0.05)) %>% t()
# assign row names
row.names(expression_matrix) <- paste0("gene", seq(1, nrow(expression_matrix)))
row.names(perturbation_matrix) <- paste0("gRNA", seq(1, nrow(perturbation_matrix)))
# simulate batch
batch <- sample(x = c("batch1", "batch2"), size = n_cells, replace = TRUE, prob = c(0.5, 0.5))
# compute log mRNA library size, construct covariate matrix
lg_mRNA_lib_size <- log(colSums(expression_matrix))
# remove cells with 0 expression (if any exist)
zero_idxs <- lg_mRNA_lib_size == 0
lg_mRNA_lib_size <- lg_mRNA_lib_size[!zero_idxs]
expression_matrix <- expression_matrix[,!zero_idxs]
batch <- batch[!zero_idxs]
# construct covariate matrix
covariate_matrix <- data.frame(lg_mRNA_lib_size = lg_mRNA_lib_size,
batch = factor(batch))
gene_gRNA_pairs <- expand.grid(gene_id = row.names(expression_matrix),
gRNA_id = row.names(perturbation_matrix)) %>% slice_sample(n = 10)
# set a couple additional arguments
storage_dir <- tempdir()
pod_sizes <- c(gene = 2, gRNA = 2, pair = 3)
# run scale function
result <- run_sceptre_in_memory(storage_dir = storage_dir,
expression_matrix = expression_matrix,
perturbation_matrix = perturbation_matrix,
covariate_matrix = covariate_matrix,
gene_gRNA_pairs = gene_gRNA_pairs,
side = "left",
pod_sizes = pod_sizes,
regularization_amount = 0,
seed = 1,
B = 1000)
scale_p_vals <- result$p_value
# call the demo function, looping over the pairs
demo_p_vals <- sapply(seq(1L, nrow(gene_gRNA_pairs)), function(i) {
curr_gene <- as.character(gene_gRNA_pairs[[i, "gene_id"]])
curr_gRNA <- as.character(gene_gRNA_pairs[[i, "gRNA_id"]])
curr_expressions <- expression_matrix[curr_gene,]
curr_indicators <- perturbation_matrix[curr_gRNA,]
res <- run_sceptre_gRNA_gene_pair(expressions = curr_expressions,
gRNA_indicators = curr_indicators,
covariate_matrix = covariate_matrix,
side = "left",
B = 1000,
seed = 1,
reduced_output = TRUE)
return(res$p_value)
})
# verify p-values coincide
identical(scale_p_vals, demo_p_vals)
Are you certain that the arguments that you are passing to these functions are identical? For example, as noted in the documentation,run_sceptre_gRNA_gene_pair
does not apply any regularization, thereby implicitly setting regularization_amount
to zero. By contrast, the regularization_amount
argument in run_sceptre_in_memory
defaults to 0.1.
Hi Tim, I had a nice result when comparing a simulated example using side='left'. I tried to run the same comparison using the side parameter in the run_sceptre_in_memory = "both" , but the results are diverging.
I did notice that your side parameters is hardcoded to 'left' in your run_sceptre_in_memory implementation.
foreach(pod_id = seq(1, dicts[["pairs"]])) %dopar% { run_gRNA_gene_pair_analysis_at_scale(pod_id = pod_id, gene_precomp_dir = dirs[["gene_precomp_dir"]], gRNA_precomp_dir = dirs[["gRNA_precomp_dir"]], results_dir = dirs[["results_dir"]], log_dir = dirs[["log_dir"]], expression_matrix = expression_matrix, perturbation_matrix = perturbation_matrix, covariate_matrix = covariate_matrix, regularization_amount = regularization_amount, side = "left", seed = seed, B = B) }
results <- collect_results(results_dir = dirs[["results_dir"]]) return(results) }
Thanks, Lucas. This indeed was a bug. I fixed it. You should be able to download and install the latest version of the package from Github.
Thanks
Hi, I used some data simulated by the Moderately-sized analysis tutorial to run in the run_sceptre_gRNA_gene_pair. To my surprise the results of the function run_sceptre_in_memory is totally uncorrelated with the results generated by run_sceptre_gRNA_gene_pair. Even using the same seeds.
Should I expect some difference in these results?