Bioconductor / pwalign

Perform pairwise sequence alignments

alignedSubject/Pattern takes a long time to run #5

Open LTLA opened 6 years ago

It seems that alignedSubject and alignedPattern take an unexpectedly long time to run:

library(Biostrings)
system.time(aln <- pairwiseAlignment(
      pattern=DNAStringSet(rep("AACGAGGGCCACCTAGGAAGAAT", 1000)),
      subject=DNAString("AAACGATCAGCTACGAACACT")))
##   user  system elapsed 
##  0.208   0.008   0.219 
system.time(X <- alignedPattern(aln))
##   user  system elapsed 
## 16.622   0.008  16.783 
system.time(Y <- alignedSubject(aln))
##   user  system elapsed 
## 15.862   0.008  16.011 

That is roughly 75 times slower than the alignment itself (about 16 s versus 0.2 s elapsed), even though I would have expected the alignment to be the most computationally intensive part of the process! This is a shame, as we've been using the full alignment strings for large-scale processing of Nanopore data. I assume the slowness arises because the "-" characters are appended to the end of each aligned sequence in an lapply loop in get_aligned_pattern, rather than in C.
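To illustrate the kind of change suggested above, here is a minimal base-R sketch that contrasts per-element padding inside an lapply loop with a single vectorized call. It does not reproduce the Biostrings internals: `seqs`, `width`, `pad_loop` and `pad_vectorized` are hypothetical names invented for this example, and plain character vectors stand in for the XStringSet objects that get_aligned_pattern actually handles.

```r
# Hypothetical stand-ins (not from Biostrings): `seqs` mimics aligned
# sequences that are shorter than the full alignment width `width`.
seqs  <- rep(c("AACGA", "AACGAGG", "AAC"), 1000)
width <- 10L

# Per-element padding in an R-level loop, i.e. the pattern suspected
# of causing the slowdown.
pad_loop <- function(x, width) {
    unlist(lapply(x, function(s) {
        paste0(s, strrep("-", width - nchar(s)))
    }))
}

# Vectorized padding: nchar(), strrep() and paste0() each process the
# whole vector in one call, so no R-level loop is needed.
pad_vectorized <- function(x, width) {
    paste0(x, strrep("-", width - nchar(x)))
}

# Both approaches must produce the same padded strings.
stopifnot(identical(pad_loop(seqs, width), pad_vectorized(seqs, width)))

# Compare timings on your own machine.
print(system.time(pad_loop(seqs, width)))
print(system.time(pad_vectorized(seqs, width)))
```

The gap between the two timings here will likely be much smaller than the one reported above, since constructing a sequence object per element costs far more than pasting plain strings. The sketch only shows the shape of a fix, namely moving the per-sequence work out of the R-level loop.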

R version 3.5.0 Patched (2018-04-30 r74679)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /home/cri.camres.org/lun01/Software/R/R-3-5-branch/lib/libRblas.so
LAPACK: /home/cri.camres.org/lun01/Software/R/R-3-5-branch/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] Biostrings_2.49.0   XVector_0.21.1      IRanges_2.15.13    
[4] S4Vectors_0.19.11   BiocGenerics_0.27.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.27.0 compiler_3.5.0