Closed: redbybeing closed this issue 8 months ago
Hi Jiseok,
It is good to hear from you, and I hope you had a good break. :)
This is something that I have noticed myself recently. How many negative control / discovery pairs are you trying to test? Unfortunately, fit_parametric_curve = FALSE starts to break down when the number of pairs exceeds ~50,000. You might consider reducing the number of pairs that you are testing and then setting fit_parametric_curve = FALSE.
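To illustrate (a hypothetical sketch, assuming a discovery_pairs data frame from construct_trans_pairs and an existing sceptre_object; the 10,000 cutoff is just an example):

```r
# Hypothetical sketch: subsample the discovery pairs so that the
# nonparametric method (fit_parametric_curve = FALSE) stays tractable.
# Assumes `discovery_pairs` and `sceptre_object` already exist.
discovery_pairs_small <- dplyr::sample_n(discovery_pairs, 10000)
sceptre_object <- set_analysis_parameters(
  sceptre_object = sceptre_object,
  discovery_pairs = discovery_pairs_small,
  fit_parametric_curve = FALSE
)
```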
Hi Tim,
Hope you had a good break too! Best wishes for the new year.
I have 16,064 negative control pairs and 685,353 discovery pairs (after pairwise QC).
I tried running just up to run_calibration_check to see whether fit_parametric_curve = FALSE will improve negative control calibration, but it fails there.
Jiseok
Hm, I see. That is a bit surprising. What happens when you reduce the number of negative control pairs to (say) 100? Note that you can set this parameter in run_calibration_check.
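Concretely, a sketch (assuming the sceptre_object from your script) might look like:

```r
# Hypothetical sketch: run the calibration check on only 100 negative
# control pairs, to see whether the failure depends on the number of pairs.
sceptre_object <- run_calibration_check(
  sceptre_object = sceptre_object,
  n_calibration_pairs = 100
)
```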
I just tried with n_calibration_pairs = 100 (local mac) and it still failed 🤔
I think precisely at this step:
Got it. Thanks for pointing this out. It seems to be a bug. I still have your data and analysis script; I'll try to reproduce the bug and get back to you!
Thanks Tim. Please note that the data I gave you before has fewer than half the cells (~30k) of my current data (~80k).
Hi Jiseok,
I believe that you sent me an updated version of your data. This is what the sceptre_object looks like:
Attributes of the data:
• 88183 cells
• 32285 responses
• High multiplicity-of-infection
• 84 targeting gRNAs (distributed across 42 targets)
• 1 non-targeting gRNAs
• 6 covariates (batch, grna_n_nonzero, grna_n_umis, response_n_nonzero, response_n_umis, response_p_mito)
Is this correct?
Oh that's great! Yes. Note: 'batch' in the covariates might be 'batch_num' instead, but I'm not sure (it's just a name carried over from my original Seurat object).
OK, nice.
I tried running the code that you previously sent to me, setting parallel = FALSE in set_analysis_parameters(). (Hopefully you don't mind my pasting your code here.) Are you able to run the following without issue? It worked for me. Note that data_dir will have to be updated on your machine.
options(stringsAsFactors = F)
library(sceptre)
packageVersion('sceptre') # Latest version 0.9.2
# References:
# https://katsevich-lab.github.io/sceptre/articles/sceptre.html
# https://timothy-barry.github.io/sceptre-book/
# https://timothy-barry.github.io/sceptre-book/import-data.html#import-data-from-a-collection-of-r-objects
# Directory to save files
data_dir <- "/Users/timbarry/research_offsite/external/user-data/"
# 1) Import data from R objects
gex <- readRDS(paste0(data_dir, "A04_gex_SingleR_anno_50k.rds"))
# 1. response_matrix
response_matrix <- gex@assays$RNA$counts
dim(response_matrix)
# 2. grna_matrix
grna_matrix <- gex@assays$crispr$counts
dim(grna_matrix)
# 3. extra_covariates
extra_covariates <- gex@meta.data[,"batch_num", drop = FALSE] |> dplyr::rename("batch" = "batch_num")
head(extra_covariates)
# 4. grna_target_data_frame
gRNAs <- gex@assays$crispr@counts@Dimnames[[1]]
gRNAtargets <- gsub('.{2}$', '', gRNAs)
gRNAtargets[gRNAtargets == "Setd8"] <- "Kmt5a" # Target name should match the name in the response matrix
gRNAtargets[gRNAtargets == "Whsc1l1"] <- "Nsd3" # Target name should match the name in the response matrix
grna_target_data_frame <- data.frame(cbind(gRNAs, gRNAtargets))
colnames(grna_target_data_frame) <- c("grna_id", "grna_target")
grna_target_data_frame[grna_target_data_frame$grna_id=="GFP-1", "grna_target"] <- "non-targeting"
head(grna_target_data_frame)
# 5. moi
moi <- "high"
# 6. clear stuff
rm(gex,gRNAs,gRNAtargets); gc()
# Setup sceptre object
sceptre_object <- import_data(
  response_matrix = response_matrix,
  grna_matrix = grna_matrix,
  grna_target_data_frame = grna_target_data_frame,
  moi = moi,
  extra_covariates = extra_covariates
)
gc()
print(sceptre_object)
# 2) Set analysis parameters
# 1. Positive control pairs
positive_control_pairs <- construct_positive_control_pairs(
  sceptre_object = sceptre_object
)
head(positive_control_pairs)
# 2. Discovery pairs
discovery_pairs <- construct_trans_pairs(
  sceptre_object = sceptre_object,
  pairs_to_exclude = "none"
) |> dplyr::sample_n(5000) # sample 5000 pairs for now
head(discovery_pairs)
# 3. set analysis parameters
sceptre_object <- set_analysis_parameters(
  sceptre_object = sceptre_object,
  positive_control_pairs = positive_control_pairs,
  discovery_pairs = discovery_pairs,
  fit_parametric_curve = FALSE,
  resampling_mechanism = "permutations"
)
print(sceptre_object)
# 3) Assign gRNAs
sceptre_object <- assign_grnas(
  sceptre_object = sceptre_object,
  method = "thresholding",
  threshold = 3
)
print(sceptre_object)
plot(sceptre_object)
# 4) Run quality control
plot_covariates(sceptre_object) # p-mito appears to already be clipped at 0.1; it would be best to do QC within sceptre
sceptre_object <- run_qc(
  sceptre_object = sceptre_object
)
print(sceptre_object)
plot(sceptre_object)
# 5) Calibration check
sceptre_object <- run_calibration_check(
  sceptre_object = sceptre_object,
  n_calibration_pairs = 100, parallel = TRUE, n_processors = 2
)
print(sceptre_object)
(This is a slightly modified version of your code.)
Oh wow, yes it ran quickly without an error on my mac! Then will it run well with the original ~16k negative control pairs and ~685k discovery pairs?
That's great to hear.
But do we have a sense as to why you were encountering an error originally? Can you tell if the code that you were using is different in some important way than the code that I pasted?
I would not expect the fit_parametric_curve = FALSE method to work beyond ~10,000 to 50,000 pairs. Is there a way to prioritize your discovery pairs such that you can analyze the ~10,000 most important pairs?
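One possible prioritization, sketched under the assumption that discovery_pairs has a response_id column and that response_matrix is the count matrix from the import step (ranking by total expression is just an illustrative choice, not a sceptre recommendation):

```r
# Hypothetical prioritization sketch: keep the discovery pairs whose response
# genes have the highest total counts, on the assumption that highly expressed
# responses are the best-powered pairs. The cutoffs (2000 genes, 10000 pairs)
# are arbitrary.
response_totals <- Matrix::rowSums(response_matrix)
top_responses <- names(sort(response_totals, decreasing = TRUE))[1:2000]
discovery_pairs_top <- discovery_pairs |>
  dplyr::filter(response_id %in% top_responses) |>
  dplyr::slice_head(n = 10000)
```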
Ah, I think I figured it out:
Here, only after I added |> dplyr::sample_n(5000) # sample 5000 pairs for now did the code run all the way to the calibration check. So even if I don't actually run discovery analysis, I should reduce the discovery pairs in order for calibration check to run successfully?
discovery_pairs <- construct_trans_pairs(
  sceptre_object = sceptre_object,
  positive_control_pairs = positive_control_pairs,
  pairs_to_exclude = "none"
) |> dplyr::sample_n(5000) # sample 5000 pairs for now
Well, as of now I am doing a sort of large-scale blind trans-analysis, and I don't have a way to prioritize my discovery pairs down to ~10,000. My original concern was severe miscalibration of the negative control pairs:
N negative control pairs called as significant: 1017/17627
So I wanted to try setting fit_parametric_curve = FALSE to improve the calibration. If it won't work for too many pairs, I might have to just accept my data as is..?
So even if I don't actually run discovery analysis, I should reduce the discovery pairs in order for calibration check to run successfully?
Got it. That would be ideal, yes.
If it won't work for too many pairs, I might have to just accept my data as is..?
Unfortunately, I'm not quite sure what is causing the miscalibration that we are observing here. If I recall correctly, the calibration on your smaller dataset was very good. I do think it would be a good idea to continue going through the steps for improving calibration outlined in the book. If calibration does not improve, it may be possible to empirically select the FDR threshold using the negative control p-values.
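For reference, here is a self-contained sketch of what empirically selecting a threshold from negative control p-values could look like. The p-values are simulated stand-ins, and the 0.1 target and threshold grid are arbitrary choices, not sceptre API:

```r
# Sketch: estimate the FDR at a candidate p-value threshold t as
# (expected null calls at t, scaled up from the negative controls) divided by
# (discovery calls at t), then pick the largest t meeting a target FDR.
set.seed(1)
neg_ctrl_p <- runif(16064)                 # stand-in for negative control p-values
disc_p <- c(runif(5000)^3, runif(680353))  # stand-in for discovery p-values
empirical_fdr <- function(t) {
  expected_null <- sum(neg_ctrl_p <= t) / length(neg_ctrl_p) * length(disc_p)
  n_called <- max(sum(disc_p <= t), 1)
  min(expected_null / n_called, 1)
}
thresholds <- 10^seq(-6, -2, length.out = 20)
fdr_hat <- sapply(thresholds, empirical_fdr)
t_star <- max(thresholds[fdr_hat <= 0.1])  # largest threshold meeting the target
```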
Thank you very much Tim.
For now I will keep in mind that fit_parametric_curve = FALSE works for a smaller number of pairs, and I will try the other steps to improve calibration.
Best,
Jiseok
Hello! I'm back again :)
After previous successful runs of sceptre both on my local mac and on the cluster (with parallel = FALSE), I tried changing a parameter to improve my negative control calibration. The only thing I changed is fit_parametric_curve = TRUE to FALSE. resampling_mechanism was set to "permutations" and I didn't change that. After this change, local mac RStudio keeps failing at the run_calibration_check step. So I tried on the cluster (with parallel = FALSE and 100 GB of memory) and it still fails. Does setting fit_parametric_curve = FALSE cause a big difference in running? I kept all other conditions the same and it had worked well a few weeks ago.
Best,
Jiseok