Katsevich-Lab / sceptre

An R package for single-cell CRISPR screen data analysis emphasizing statistical rigor, massive scalability, and ease of use.
https://katsevich-lab.github.io/sceptre/
GNU General Public License v3.0
24 stars 7 forks source link

Error in .highmoi_run_gene_precomputation_at_scale_round_1(pod_id = pod_id, : task 1 failed - "ℹ In index: 1. #24

Closed redbybeing closed 1 year ago

redbybeing commented 1 year ago

Hi,

I am Jiseok Lee from Dr. Hyejung Won's lab in UNC. I watched your IGVF seminar about low MOI sceptre and it was very, very helpful. Thank you!

I have CROP-seq data and I tried following your high MOI tutorial to analyze my data. I used high MOI because the number of different guides expressed in a cell ranges from 1 to 80.

image

I get this error when I try to run run_sceptre_highmoi().

Running gene precomputations. Error in .highmoi_run_gene_precomputation_at_scale_round_1(pod_id = pod_id, : task 1 failed - "ℹ In index: 1. Caused by error in contrasts<-: ! contrasts can be applied only to factors with 2 or more levels"

How can I fix this error?

Thank you.

Best, Jiseok

ekatsevi commented 1 year ago

Dear Jiseok,

Thanks for attending the talk! I am very happy to hear you found it helpful. On the other hand, I'm sorry to hear that the high MOI SCEPTRE software is giving you trouble. We'll look into this and get back to you shortly.

Thank you, Gene

ekatsevi commented 1 year ago

The error may be caused by the fact that one of your covariates is a character or factor variable with just one unique value; see here for more.

redbybeing commented 1 year ago

Aha! It's running well after I removed covariates that have just one unique value. I had simply copied all the cell metadata to the covariate matrix. I guess I should leave only the metadata that will be used as covariates in the computation? Also, how long would it take to run ~85 guide-gene pairs? It's been running over an hour and still didn't finish. I'm wondering whether I should submit as a job to slurm.

timothy-barry commented 1 year ago

Hi, I would not expect the software to take more than ~20 minutes for 85 pairs. Do you have parallel set to TRUE?

(Please note that the low-MOI functionality currently is orders of magnitude more performant than the high-MOI functionality; we'll be making updates to improve the performance of the high-MOI functionality soon, likely within the next month.)

redbybeing commented 1 year ago

I didn't set parallele to TRUE but I guess it's TRUE by default? Based on the guide distribution histogram in the original post, should I use high-MOI?

timothy-barry commented 1 year ago

I might try setting parallel to FALSE; setting parallel to TRUE (the default) has caused issues on some machines.

What is the average number of gRNAs per cell? Also, are the gRNAs targeting genes or regulatory elements (or something else)?

redbybeing commented 1 year ago

Average number of unique guides per cell is ~10. Guides are targeting protein-coding genes, coding sequences (for CRISPR knockout). I will try parallel=FALSE.

timothy-barry commented 1 year ago

I see. And these gRNAs are targeting distinct genes?

And just to clarify: you are pairing each CRISPR perturbation to a single gene in your gene_gRNA_group_pairs data frame?

redbybeing commented 1 year ago

I have two guides targeting each gene, but I didn't combine them into one because I didn't want a non-working guide diluting the effect of the other guide (if such case happens). Right now I am just testing positive pairs. For example, I paired Fxr1 guide to Fxr1 gene. This is how my gene_gRNA_group_pairs looks like.

Screenshot 2023-05-22 at 1 06 40 PM
redbybeing commented 1 year ago

Oh, somehow I got this error now

Error in .highmoi_run_gene_precomputation_at_scale_round_1(pod_id = pod_id, : task 1 failed - "ℹ In index: 8. Caused by error in vglm.fitter(): ! vglm() only handles full-rank models (currently)"

timothy-barry commented 1 year ago

Could you please print your covariate matrix? Something seems to be off with that.

redbybeing commented 1 year ago

Not sure how to print all 3717 rows but here's the part of it.

                  batch    sex     p_mito lg_gene_lib_size lg_gRNA_lib_size

AAACCCACATGGCCCA-1_1 batch1 female 2.68795473 8.506739 2.9957323 AAACCCAGTCCAGTTA-1_1 batch1 female 1.44000000 9.328123 4.8828019 AAACGAAAGCGCCTCA-1_1 batch1 female 2.20054446 8.391176 3.2580965 AAACGAAAGGGAGAAT-1_1 batch1 female 1.57238734 8.559294 3.2958369 AAACGAATCCACAGCG-1_1 batch1 female 5.75485800 7.892078 1.7917595 AAACGAATCCTTGACC-1_1 batch1 female 2.12340744 8.294799 2.0794415 AAACGAATCGCTAATG-1_1 batch1 female 2.88054482 9.293394 3.4965076 AAACGCTAGCATAGGC-1_1 batch1 female 0.92526690 7.247793 2.3025851 AAACGCTCACTCATAG-1_1 batch1 female 2.17637724 9.490469 2.9444390 AAACGCTGTAGAGATT-1_1 batch1 female 1.15946718 9.827848 3.1780538 AAACGCTGTCGAGCAA-1_1 batch1 female 1.89221919 8.795734 4.5432948 AAACGCTGTTATGGTC-1_1 batch1 female 1.63748010 8.388678 2.6390573 AAAGGATAGACCATGG-1_1 batch1 female 2.97888386 9.269364 2.7080502 AAAGGATAGATCCCAT-1_1 batch1 female 1.95821231 9.275472 3.2580965 AAAGGATCAACCCTCT-1_1 batch1 female 1.78934526 8.500657 3.1780538 AAAGGATTCTGCCTCA-1_1 batch1 female 2.10675530 9.926129 3.9702919

timothy-barry commented 1 year ago

Thanks. Could you please cross-tabulate batch and sex? Ie, table(batch, sex).

redbybeing commented 1 year ago

Like this?

table(covariate_matrix$batch, covariate_matrix$sex)

     female male

batch1 2693 0 batch2 671 0 batch3 0 114 batch4 0 239

timothy-barry commented 1 year ago

Ah, so all females are sequenced within batches 1 and 2 and all males within batches 3 and 4. This is called a nested design; unfortunately, sceptre does not currently support nested designs.

Could you try removing sex and running again?

ekatsevi commented 1 year ago

Note that sex is still implicitly controlled for even if you remove this covariate, since this information is already contained in the batch variable.

ekatsevi commented 1 year ago

Based on the guide distribution histogram in the original post, should I use high-MOI?

Yes, you should be using high-MOI!

redbybeing commented 1 year ago

Now it ran well. Now it only took 1 min! I got the result table. It looks like sceptre automatically removes guides (or genes?) with low read counts, so now I have 56 rows in the result table instead of original 84 pairs. Is there a way to override this auto-removal? Or should I leave it as it is and add more batches/cells later?

Warning messages: 1: In run_sceptre_highmoi(gene_matrix = gene_matrix, combined_perturbation_matrix = perturbation_matrix, : Removing the following genes with low expressions (UMI count <250): guide-Anp32e-2 guide-Asphd1-2 guide-Eif1ad-2 guide-Kat5-1 guide-Maz-2 guide-Rbm26-2 guide-Rnaseh2c-2 guide-Smyd4-1 2: In run_sceptre_highmoi(gene_matrix = gene_matrix, combined_perturbation_matrix = perturbation_matrix, : Removing the following perturbations with low counts (counts <30): guide-Anp32e-2 guide-Asphd1-2 guide-Eif1ad-2 guide-Kat5-1 guide-Maz-2 guide-Rbm26-2 guide-Rnaseh2c-2 guide-Smyd4-1. Consider grouping together perturbations that target the same site to circumvent this problem.

ekatsevi commented 1 year ago

Now it ran well. Now it only took 1 min!

That's great to hear! Thanks for reaching out to us so we could help you troubleshoot.

It looks like sceptre automatically removes guides (or genes?) with low read counts, so now I have 56 rows in the result table instead of original 84 pairs. Is there a way to override this auto-removal? Or should I leave it as it is and add more batches/cells later?

Indeed, SCEPTRE has some built-in quality control checks. These ensure that the results you get are reliable, and not subject to statistical artifacts. There is not currently a way to override this, so I recommend you leave the results as is. In a future release of SCEPTRE, we may allow users to change the QC thresholds or decrease the default thresholds.

redbybeing commented 1 year ago

Thank you so much Tim and Gene! It was greatly helpful and I am happy to be using sceptre for my CROP-seq data.

timothy-barry commented 1 year ago

Let us know if any other issues arise.