Katsevich-Lab / sceptre

An R package for single-cell CRISPR screen data analysis emphasizing statistical rigor, massive scalability, and ease of use.
https://katsevich-lab.github.io/sceptre/
GNU General Public License v3.0
26 stars 8 forks source link

Error in task1 #10

Closed cnk113 closed 2 years ago

cnk113 commented 2 years ago

Hello,

I'm trying to run it on my dataset however it immediately fails: Error in { : task 1 failed - "incorrect number of subscripts" To double check: my count matrix is the count matrix in my seurat object and the perturb matrix is a binary matrix of the individual sgRNAs with the same colnames with the cells.

Best, Chang

cnk113 commented 2 years ago

Actually I was wondering what to do with the non targeting guides? I think it may be the issue since its in the gene pairs and perturbation matrix, but it's not in the expression matrix?

timothy-barry commented 2 years ago

Hello,

Which tutorial are you looking at? The "Running sceptre on moderately sized data" tutorial?

cnk113 commented 2 years ago

Yes that is the tutorial I'm following.

timothy-barry commented 2 years ago

OK, thanks. I am trying to understand your question a bit better. The functionrun_sceptre_in_memory takes two matrices as arguments: expression_matrix and perturbation_matrix. expression_matrix is a gene-by-cell expression matrix, and perturbation_matrix is a perturbation-by-cell matrix. By default, the function computes p-values for all perturbation-gene pairs, including the negative control perturbations.

So, to answer your question, you just treat the negative control perturbations as you would any other perturbation when you call the function. We recommend using negative control/gene pairs to assess calibration of the method. Does this answer your question?

cnk113 commented 2 years ago

I see, that makes sense. However I still am running into the initial error.

timothy-barry commented 2 years ago

Could you please provide more information about the matrices that you are passing to the function? For example, could you call str and class on each matrix and print the results? It is hard for me to provide a solution without more information.

cnk113 commented 2 years ago
class(expr_matrix)
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"
> class(perturb_matrix)
[1] "matrix" "array" 
> class(covariate_matrix)
[1] "data.frame"
> class(gene_pair)
[1] "data.frame"
str(expr_matrix)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:92466284] 11 22 34 35 48 49 56 57 59 65 ...
  ..@ p       : int [1:15285] 0 6774 13562 20119 27497 34539 41853 48364 55231 61690 ...
  ..@ Dim     : int [1:2] 29506 15284
  ..@ Dimnames:List of 2
  .. ..$ : chr [1:29506] "MIR1302-2HG" "FAM138A" "ENSG00000238009" "ENSG00000239945" ...
  .. ..$ : chr [1:15284] "AAACCCAAGGGATCGT_1" "AAACCCAAGTCAGGGT_1" "AAACCCACAATCAGCT_1" "AAACCCAGTACCCGCA_1" ...
  ..@ x       : num [1:92466284] 1 1 1 1 1 1 6 1 5 1 ...
  ..@ factors : list()
> str(perturb_matrix)
 num [1:787, 1:15284] 1 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:787] "C5orf46-3" "PXMP4-2" "COL18A1-3" "CCBE1-3" ...
  ..$ : chr [1:15284] "AAACCCAAGGGATCGT_1" "AAACCCAAGTCAGGGT_1" "AAACCCACAATCAGCT_1" "AAACCCAGTACCCGCA_1" ...
> str(covariate_matrix)
'data.frame':   15284 obs. of  2 variables:
 $ opl33.nCount_RNA: num  70513 52782 85677 37766 43078 ...
 $ batch           : Factor w/ 2 levels "SL268","SL269": 1 1 1 1 1 1 1 1 1 1 ...
> str(gene_pair)
'data.frame':   15284 obs. of  2 variables:
 $ gene_id: chr  "C5orf46" "PXMP4" "COL18A1" "CCBE1" ...
 $ gRNA_id: chr  "C5orf46-3" "PXMP4-2" "COL18A1-3" "CCBE1-3" ...
timothy-barry commented 2 years ago

Thank you. I believe that I have identified and fixed the bug. Could you please re-download package and try again?

As a side note, if you are including the gene or gRNA library size in the covariate matrix (which we recommend), it's best to log-transform these library sizes.

cnk113 commented 2 years ago

I updated sceptre but I'm still getting the same error.

timothy-barry commented 2 years ago

Would you be able somehow to share the data (privately) so that I can take a closer look?

cnk113 commented 2 years ago

Yeah I can share it on Google drive or Microsoft onedrive. What's your email?

Best, Chang

timothy-barry commented 2 years ago

Thank you! I think that we will be able to get this sorted out.

Email: tbarry2@andrew.cmu.edu

timothy-barry commented 2 years ago

I received your email, thanks. I will get back to you in a few days.

timothy-barry commented 2 years ago

Hi Chang,

We just released a major update to the software. I'd encourage you to take a look at the new (and much simpler) tutorial and have another go.

That being said, I identified a few issues with the data that you sent me:

  1. Some of the genes in your gene_gRNA_pairs are not part of your gene_matrix.
  2. Some of the entries in your gene_gRNA_pairs are duplicated.
  3. Some of your genes and gRNAs have very low (or zero) expression, making them difficult to analyze.
  4. Your covariate matrix is missing columns for log-transformed gene library size and percent mitochondrial reads.

I tried my best to fix these issues and analyze your data. Please see the below script:

library(Matrix)
library(dplyr)
library(magrittr)
library(sceptre)

ex <- readRDS("/Users/timbarry/research_offsite/external/private_screens/sceptre.rds")
gene_matrix <- ex[[1]]
gRNA_matrix <- ex[[2]]
covariate_matrix <- ex[[3]] %>% dplyr::rename("lg_gRNA_ct" = "opl33.nCount_RNA") %>% 
  mutate(lg_gene_ct = log(colSums(gene_matrix)))
gene_gRNA_pairs <- ex[[4]] %>% dplyr::filter(gene_id %in% row.names(gene_matrix)) %>%
  distinct()

result <- run_sceptre_high_moi(gene_matrix = gene_matrix,
                               gRNA_matrix = gRNA_matrix,
                               covariate_matrix = covariate_matrix,
                               gene_gRNA_pairs = gene_gRNA_pairs,
                               side = "both")

I also added some checks to the function to protect against the issues present in your data. Still, I think the analysis could be improved considerably. I'd suggest that you try the following:

  1. Download the new version of sceptre
  2. Take a look at the new sceptre tutorial
  3. Check your data to ensure that everything makes sense (e.g., remove duplicate rows in gene_gRNA_pairs, ensure that the gene IDs in gene_gRNA_pairs are a subset of the row names of gene_matrix, etc.)
  4. Include log-transformed gene library size and percent mitochondrial reads in your covariate matrix (step 1 of tutorial)
  5. Group together gRNAs that target the same site (step 2 of tutorial)
  6. Determine the appropriate sidedness of the test (step 4 of tutorial)

Thank you for being patient as we work to make the software easier to use and improve the documentation. Your feedback has been very helpful. Please let me know how your analysis goes.

cnk113 commented 2 years ago

Yeah actually I found out my gRNA-gene pair was missing the HLA genes and some others due to messing up my string splitting. Fixing that and removing duplicates (I accidentally just pulled it from Seurat object w/o collapsing) I end up with errors with my non-targeting guides. I removed them and it works, however most of my results are the targets itself being the most significant; I was wondering if your model takes into account the non-targeting guides to act as a control?

Thanks for the in-depth solution!

Best, Chang

timothy-barry commented 2 years ago

Got it, thanks for the update.

The non-targeting controls (NTCs) are not used to compute the results for the targeting guides, no. We recommend using the NTCs to check that the method is correctly calibrated. So when you pair the NTCs to genes, the resulting p-values should be approximately uniformly distributed.

Do I understand correctly that there is still a problem? It sounds like you still are getting warning/error messages? If so, feel free to send your corrected data to me, and I can take another look.

cnk113 commented 2 years ago

Well it's not really an issue, it just all the results are the guide-gene pair that is getting significant results. I have tried to group the gRNAs that target the same gene but roughly same results. However I think this maybe just the nature of my dataset.

Thanks for the clarification, Chang

timothy-barry commented 2 years ago

I see. Thank you for the update. I don't mean to badger you about this, but I do have a few additional questions, if you don't mind.

  1. You are working with high MOI data, right?
  2. What covariates are included in your covariate matrix?
  3. Do the negative control - gene pairs also produce significant results?

Thanks.

timothy-barry commented 2 years ago

Hi Chang, I am going to close this issue, but I want to note that we updated the vignette (again), which might be helpful in answering some of your questions.