Bioconductor / Biostrings

Efficient manipulation of biological strings
https://bioconductor.org/packages/Biostrings
54 stars 16 forks source link

trimLRPatterns with indels not trimming as intended in case of net insertions or deletions between subject and pattern? #88

Open jan-glx opened 1 year ago

jan-glx commented 1 year ago

In the following examples I'd expect trimLRPatterns to always leave ATCG:

library(Biostrings)
subject <- DNAStringSet(c("AAATCTCTCATCG"))
trimLRPatterns(subject = subject, Lpattern = "AAATCTCTC", Rpattern = "" , with.Lindels = TRUE, max.Lmismatch = 0.3 )
#> DNAStringSet object of length 1:
#>     width seq
#> [1]     4 ATCG
trimLRPatterns(subject = subject, Lpattern = "AAAATCTCTC", Rpattern = "" , with.Lindels = TRUE, max.Lmismatch = 0.3 )
#> DNAStringSet object of length 1:
#>     width seq
#> [1]     3 TCG
trimLRPatterns(subject = subject, Lpattern = "AATCTCTC", Rpattern = "" , with.Lindels = TRUE, max.Lmismatch = 0.3 )
#> DNAStringSet object of length 1:
#>     width seq
#> [1]     5 CATCG

Created on 2022-11-18 by the reprex package (v2.0.1)

acvill commented 1 year ago

I think the issue is that Biostrings does not consider terminal gaps in alignments to be indels. See https://github.com/Bioconductor/pwalign/issues/2.

ATCG is retained if you specify with.Lindels = FALSE.

subject <- DNAStringSet(c("AAATCTCTCATCG"))
trimLRPatterns(subject = subject, Lpattern = "AAATCTCTC", Rpattern = "" , with.Lindels = FALSE, max.Lmismatch = 0.3)
#> DNAStringSet object of length 1:
#>     width seq
#> [1]     4 ATCG
trimLRPatterns(subject = subject, Lpattern = "AAAATCTCTC", Rpattern = "" , with.Lindels = FALSE, max.Lmismatch = 0.3)
#> DNAStringSet object of length 1:
#>     width seq
#> [1]     4 ATCG
trimLRPatterns(subject = subject, Lpattern = "AATCTCTC", Rpattern = "" , with.Lindels = FALSE, max.Lmismatch = 0.3)
#> DNAStringSet object of length 1:
#>     width seq
#> [1]     6 TCATCG
jan-glx commented 1 year ago

Possibly. But while that issue might be considered odd behavior, this here is a bug, right? Though by I am even more confused by your last with.Lindels = FALSE example right now.