Closed cnk113 closed 2 years ago
Actually I was wondering what to do with the non targeting guides? I think it may be the issue since its in the gene pairs and perturbation matrix, but it's not in the expression matrix?
Hello,
Which tutorial are you looking at? The "Running sceptre on moderately sized data" tutorial?
Yes that is the tutorial I'm following.
OK, thanks. I am trying to understand your question a bit better. The functionrun_sceptre_in_memory
takes two matrices as arguments: expression_matrix
and perturbation_matrix
. expression_matrix
is a gene-by-cell expression matrix, and perturbation_matrix
is a perturbation-by-cell matrix. By default, the function computes p-values for all perturbation-gene pairs, including the negative control perturbations.
So, to answer your question, you just treat the negative control perturbations as you would any other perturbation when you call the function. We recommend using negative control/gene pairs to assess calibration of the method. Does this answer your question?
I see, that makes sense. However I still am running into the initial error.
Could you please provide more information about the matrices that you are passing to the function? For example, could you call str
and class
on each matrix and print the results? It is hard for me to provide a solution without more information.
class(expr_matrix)
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"
> class(perturb_matrix)
[1] "matrix" "array"
> class(covariate_matrix)
[1] "data.frame"
> class(gene_pair)
[1] "data.frame"
str(expr_matrix)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ i : int [1:92466284] 11 22 34 35 48 49 56 57 59 65 ...
..@ p : int [1:15285] 0 6774 13562 20119 27497 34539 41853 48364 55231 61690 ...
..@ Dim : int [1:2] 29506 15284
..@ Dimnames:List of 2
.. ..$ : chr [1:29506] "MIR1302-2HG" "FAM138A" "ENSG00000238009" "ENSG00000239945" ...
.. ..$ : chr [1:15284] "AAACCCAAGGGATCGT_1" "AAACCCAAGTCAGGGT_1" "AAACCCACAATCAGCT_1" "AAACCCAGTACCCGCA_1" ...
..@ x : num [1:92466284] 1 1 1 1 1 1 6 1 5 1 ...
..@ factors : list()
> str(perturb_matrix)
num [1:787, 1:15284] 1 0 0 0 0 0 0 0 0 0 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:787] "C5orf46-3" "PXMP4-2" "COL18A1-3" "CCBE1-3" ...
..$ : chr [1:15284] "AAACCCAAGGGATCGT_1" "AAACCCAAGTCAGGGT_1" "AAACCCACAATCAGCT_1" "AAACCCAGTACCCGCA_1" ...
> str(covariate_matrix)
'data.frame': 15284 obs. of 2 variables:
$ opl33.nCount_RNA: num 70513 52782 85677 37766 43078 ...
$ batch : Factor w/ 2 levels "SL268","SL269": 1 1 1 1 1 1 1 1 1 1 ...
> str(gene_pair)
'data.frame': 15284 obs. of 2 variables:
$ gene_id: chr "C5orf46" "PXMP4" "COL18A1" "CCBE1" ...
$ gRNA_id: chr "C5orf46-3" "PXMP4-2" "COL18A1-3" "CCBE1-3" ...
Thank you. I believe that I have identified and fixed the bug. Could you please re-download package and try again?
As a side note, if you are including the gene or gRNA library size in the covariate matrix (which we recommend), it's best to log-transform these library sizes.
I updated sceptre but I'm still getting the same error.
Would you be able somehow to share the data (privately) so that I can take a closer look?
Yeah I can share it on Google drive or Microsoft onedrive. What's your email?
Best, Chang
Thank you! I think that we will be able to get this sorted out.
Email: tbarry2@andrew.cmu.edu
I received your email, thanks. I will get back to you in a few days.
Hi Chang,
We just released a major update to the software. I'd encourage you to take a look at the new (and much simpler) tutorial and have another go.
That being said, I identified a few issues with the data that you sent me:
gene_gRNA_pairs
are not part of your gene_matrix
.gene_gRNA_pairs
are duplicated.I tried my best to fix these issues and analyze your data. Please see the below script:
library(Matrix)
library(dplyr)
library(magrittr)
library(sceptre)
ex <- readRDS("/Users/timbarry/research_offsite/external/private_screens/sceptre.rds")
gene_matrix <- ex[[1]]
gRNA_matrix <- ex[[2]]
covariate_matrix <- ex[[3]] %>% dplyr::rename("lg_gRNA_ct" = "opl33.nCount_RNA") %>%
mutate(lg_gene_ct = log(colSums(gene_matrix)))
gene_gRNA_pairs <- ex[[4]] %>% dplyr::filter(gene_id %in% row.names(gene_matrix)) %>%
distinct()
result <- run_sceptre_high_moi(gene_matrix = gene_matrix,
gRNA_matrix = gRNA_matrix,
covariate_matrix = covariate_matrix,
gene_gRNA_pairs = gene_gRNA_pairs,
side = "both")
I also added some checks to the function to protect against the issues present in your data. Still, I think the analysis could be improved considerably. I'd suggest that you try the following:
gene_gRNA_pairs
, ensure that the gene IDs in gene_gRNA_pairs
are a subset of the row names of gene_matrix
, etc.)Thank you for being patient as we work to make the software easier to use and improve the documentation. Your feedback has been very helpful. Please let me know how your analysis goes.
Yeah actually I found out my gRNA-gene pair was missing the HLA genes and some others due to messing up my string splitting. Fixing that and removing duplicates (I accidentally just pulled it from Seurat object w/o collapsing) I end up with errors with my non-targeting guides. I removed them and it works, however most of my results are the targets itself being the most significant; I was wondering if your model takes into account the non-targeting guides to act as a control?
Thanks for the in-depth solution!
Best, Chang
Got it, thanks for the update.
The non-targeting controls (NTCs) are not used to compute the results for the targeting guides, no. We recommend using the NTCs to check that the method is correctly calibrated. So when you pair the NTCs to genes, the resulting p-values should be approximately uniformly distributed.
Do I understand correctly that there is still a problem? It sounds like you still are getting warning/error messages? If so, feel free to send your corrected data to me, and I can take another look.
Well it's not really an issue, it just all the results are the guide-gene pair that is getting significant results. I have tried to group the gRNAs that target the same gene but roughly same results. However I think this maybe just the nature of my dataset.
Thanks for the clarification, Chang
I see. Thank you for the update. I don't mean to badger you about this, but I do have a few additional questions, if you don't mind.
Thanks.
Hi Chang, I am going to close this issue, but I want to note that we updated the vignette (again), which might be helpful in answering some of your questions.
Hello,
I'm trying to run it on my dataset however it immediately fails:
Error in { : task 1 failed - "incorrect number of subscripts"
To double check: my count matrix is the count matrix in my seurat object and the perturb matrix is a binary matrix of the individual sgRNAs with the same colnames with the cells.Best, Chang