crisprVerse / crisprDesign

Comprehensive design of CRISPR gRNAs for nucleases and base editors
MIT License
16 stars 5 forks source link

`addSpacerAlignments` fails to find off-targets with bowtie and custom genomes #20

Closed m-jahn closed 1 year ago

m-jahn commented 1 year ago

Thanks for maintaining this suite of packages. I just ran into a series of problems when using addSpacerAlignments with bowtie instead of biostrings as aligner. Basically I'm working with bacterial genomes that are well known and annotated in NCBI but often have no BSgenome package available. I therefore need to build those custom-wise. This works only until the point I'm trying to annotate off targets with addSpacerAlignments. The following example illustrates this:

library(tidyverse)
library(Biostrings)
library(GenomicRanges)
library(GenomicFeatures)
library(crisprBase)
library(crisprDesign)

# import genome annotation from GFF
txdb <- makeTxDbFromGFF("results/get_genome/genome.gff")

# import sequence from FASTA
genome_dna <- readDNAStringSet("results/get_genome/genome.fasta")

# find spacers
data(list = c("SpCas9"), package = "crisprBase")
list_pred_guides <- findSpacers(
  x = genome_dna,
  spacer_len = 20,
  crisprNuclease = SpCas9
)

# add off targets using bowtie
library(BSgenomeSalmonellaenterica)
list_pred_guides <- addSpacerAlignments(
  list_pred_guides[1:1000],
  aligner = "bowtie",
  aligner_index = "results/bowtie_index/index",
  bsgenome = BSgenomeSalmonellaenterica,
  addSummary = TRUE,
  n_mismatches = 3
)

The output I get indicates that reads were aligned but they don't show up in the guide set:

Loading required namespace: crisprBwa
[runCrisprBowtie] Using BSgenomeSalmonellaenterica 
[runCrisprBowtie] Searching for SpCas9 protospacers 
# reads processed: 1000
# reads with at least one alignment: 1000 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 2408 alignments

> # but no alignments were added to guide set
> all(list_pred_guides$n0 == 0)
[1] TRUE

I then tried the same using the txdb object as input. Fails with either a time out error or a biomart error related to the genome being non-standard.

addSpacerAlignments(
  list_pred_guides[1:1000],
  aligner = "bowtie",
  aligner_index = "results/bowtie_index/index",
  bsgenome = BSgenomeSalmonellaenterica,
  addSummary = TRUE,
  n_mismatches = 3,
  txObject = txdb
)

Error in .getBiomartData(txdb, organism) :                                                                         
  Organism "NA" not recognized in biomaRt. You can use",
                "organism=NULL as a solution.

Using the conversion function TxDb2GRangesList(txdb) as a fallback throws the same error. I attached files with an example genome, the BSgenome library and the bowtie index --> input.zip

Any ideas for this problem are highly appreciated.

Thanks for your help!

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS/LAPACK: /home/michael/micromamba/envs/snakemake-crispr-guides/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] BSgenomeSalmonellaenterica_1.0.0 BSgenome_1.66.3                 
 [3] rtracklayer_1.58.0               crisprDesign_1.0.0              
 [5] crisprBase_1.2.0                 GenomicFeatures_1.50.4          
 [7] AnnotationDbi_1.60.2             Biobase_2.58.0                  
 [9] GenomicRanges_1.50.2             Biostrings_2.66.0               
[11] GenomeInfoDb_1.34.9              XVector_0.38.0                  
[13] IRanges_2.32.0                   S4Vectors_0.36.2                
[15] BiocGenerics_0.44.0              lubridate_1.9.2                 
[17] forcats_1.0.0                    stringr_1.5.0                   
[19] dplyr_1.1.1                      purrr_1.0.1                     
[21] readr_2.1.4                      tidyr_1.3.0                     
[23] tibble_3.2.1                     ggplot2_3.4.2                   
[25] tidyverse_2.0.0                 

loaded via a namespace (and not attached):
 [1] bitops_1.0-7                  matrixStats_0.63.0           
 [3] bit64_4.0.5                   filelock_1.0.2               
 [5] progress_1.2.2                httr_1.4.5                   
 [7] tools_4.2.2                   utf8_1.2.3                   
 [9] R6_2.5.1                      DBI_1.1.3                    
[11] colorspace_2.1-0              withr_2.5.0                  
[13] tidyselect_1.2.0              prettyunits_1.1.1            
[15] bit_4.0.5                     curl_4.3.3                   
[17] compiler_4.2.2                crisprBowtie_1.2.0           
[19] cli_3.6.1                     basilisk.utils_1.10.0        
[21] crisprScoreData_1.2.0         xml2_1.3.3                   
[23] DelayedArray_0.24.0           scales_1.2.1                 
[25] randomForest_4.7-1.1          rappdirs_0.3.3               
[27] digest_0.6.31                 Rsamtools_2.14.0             
[29] crisprScore_1.2.0             basilisk_1.10.2              
[31] htmltools_0.5.5               pkgconfig_2.0.3              
[33] MatrixGenerics_1.10.0         dbplyr_2.3.2                 
[35] fastmap_1.1.1                 rlang_1.1.0                  
[37] RSQLite_2.3.1                 shiny_1.7.4                  
[39] BiocIO_1.8.0                  generics_0.1.3               
[41] jsonlite_1.8.4                vroom_1.6.1                  
[43] BiocParallel_1.32.6           VariantAnnotation_1.44.1     
[45] RCurl_1.98-1.12               magrittr_2.0.3               
[47] GenomeInfoDbData_1.2.9        Matrix_1.5-4                 
[49] Rcpp_1.0.10                   munsell_0.5.0                
[51] fansi_1.0.4                   reticulate_1.28              
[53] Rbowtie_1.38.0                lifecycle_1.0.3              
[55] stringi_1.7.12                yaml_2.3.7                   
[57] SummarizedExperiment_1.28.0   zlibbioc_1.44.0              
[59] BiocFileCache_2.6.1           AnnotationHub_3.6.0          
[61] grid_4.2.2                    blob_1.2.4                   
[63] promises_1.2.0.1              parallel_4.2.2               
[65] ExperimentHub_2.6.0           crayon_1.5.2                 
[67] crisprBwa_1.2.0               dir.expiry_1.6.0             
[69] lattice_0.21-8                hms_1.1.3                    
[71] KEGGREST_1.38.0               pillar_1.9.0                 
[73] rjson_0.2.21                  codetools_0.2-19             
[75] biomaRt_2.54.1                XML_3.99-0.14                
[77] glue_1.6.2                    BiocVersion_3.16.0           
[79] BiocManager_1.30.20           httpuv_1.6.9                 
[81] png_0.1-8                     vctrs_0.6.1                  
[83] tzdb_0.3.0                    gtable_0.3.3                 
[85] cachem_1.0.7                  mime_0.12                    
[87] Rbwa_1.2.0                    xtable_1.8-4                 
[89] restfulr_0.0.15               later_1.3.0                  
[91] GenomicAlignments_1.34.1      memoise_2.0.1                
[93] timechange_0.2.0              ellipsis_0.3.2               
[95] interactiveDisplayBase_1.36.0
m-jahn commented 1 year ago

Hi, is there probably any update on this issue?

Jfortin1 commented 1 year ago

Hi @m-jahn, thank you for reporting this and sharing your code/data -- going to take a look at it today and get back to you.

Jfortin1 commented 1 year ago

@m-jahn Re. the first issue (n0=0 after calling addSpacerAlignments), this was resolved in a previous fix in BioC devel, so I'd suggest to use the devel version of crisprDesign.

For the second issue (error with txdb object), the problem comes from GenomicFeatures:::fiveUTRsByTranscript not working properly with this custom GFF ; we'll work on a fix for that.

m-jahn commented 1 year ago

Thanks for your update! I will try with BioC devel version, and with this other issue, wait or take a closer look too. Will get back with further info.

m-jahn commented 1 year ago

Just a follow-up: the latest version of crisprDesign actually annotates the off targets correctly. The second issue is probably something for GenomicFeatures, so this issue can be closed from my side.