Bioconductor / Biostrings

Efficient manipulation of biological strings
https://bioconductor.org/packages/Biostrings

Bug: length of PairwiseAlignments #78

Closed: gevro closed this issue 2 years ago

gevro commented 2 years ago

Hi, per the documentation, length() should return the length of each alignment in a PairwiseAlignments object:

General information methods
In the code snippets below, x is a PairwiseAlignments object, except otherwise noted.

alphabet(x): Equivalent to alphabet(unaligned(subject(x))).

length(x): The common length of alignedPattern(x) and alignedSubject(x). There is a method for PairwiseAlignmentsSingleSubjectSummary as well.

However, the latest version of Biostrings simply returns the total number of alignments within the PairwiseAlignments object.

I believe this is a bug. Also, there does not seem to be a function to do what the documentation indicates.

hpages commented 2 years ago

Hi,

> However, the latest version of Biostrings simply returns the total number of alignments within the PairwiseAlignments object.

This has always been the case and is the expected behavior of length() on a PairwiseAlignments object:

library(Biostrings)
pattern <- AAStringSet(c("PAWHEAE", "PAWHAE", "PAWWHEAE"))
subject <- AAString("HEAGAWGHEE")
x <- pairwiseAlignment(pattern, subject)
x
length(x)
# [1] 3

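Also, I don't see any contradiction with the documentation. alignedPattern(x) and alignedSubject(x) each hold one aligned sequence per alignment, so their common length is the number of alignments: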

alignedPattern(x)
# AAStringSet object of length 3:
#     width seq
# [1]    10 P---AWHEAE
# [2]    10 P---AW-HAE
# [3]    11 P---AWWHEAE

length(alignedPattern(x))
# [1] 3

alignedSubject(x)
# AAStringSet object of length 3:
#     width seq
# [1]    10 HEAGAWGHEE
# [2]    10 HEAGAWGHEE
# [3]    11 HEAGAWGHE-E

length(alignedSubject(x))
# [1] 3
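The same distinction exists for ordinary character vectors in base R: length() counts the elements, while nchar() counts the characters in each element. A minimal sketch, assuming the aligned pattern strings printed above are copied into a plain character vector (the name aligned is only illustrative and is not part of Biostrings):

```r
# Aligned pattern strings from the output above, as a plain character vector.
# `aligned` is an illustrative name, not a Biostrings object.
aligned <- c("P---AWHEAE", "P---AW-HAE", "P---AWWHEAE")

length(aligned)  # number of strings in the vector
# [1] 3

nchar(aligned)   # number of characters in each string, gaps included
# [1] 10 10 11
```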

Use nchar() if you want the length of each alignment in the PairwiseAlignments object:

nchar(x)
# [1] 10 10 11
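As a cross-check, the per-alignment lengths can also be read from the aligned sequences themselves. The following is only a sketch under a few assumptions: it rebuilds the same x as above, it relies on width() returning the per-sequence widths of the AAStringSet objects shown in the printed output, and it assumes pairwiseAlignment() is available after loading Biostrings (in recent Bioconductor releases the pairwise alignment code may instead need the pwalign package). The variable names are illustrative.

```r
library(Biostrings)

# Same example objects as earlier in this thread.
pattern <- AAStringSet(c("PAWHEAE", "PAWHAE", "PAWWHEAE"))
subject <- AAString("HEAGAWGHEE")
x <- pairwiseAlignment(pattern, subject)

n_alignments <- length(x)                    # how many alignments x holds
aln_lengths  <- nchar(x)                     # length of each alignment, gaps included
pat_widths   <- width(alignedPattern(x))     # widths of the gapped patterns
sub_widths   <- width(alignedSubject(x))     # widths of the gapped subjects

# Each alignment's length should match the width of both gapped sequences.
stopifnot(all(aln_lengths == pat_widths),
          all(aln_lengths == sub_widths))
```

If the stopifnot() call passes silently, nchar(x) agrees with the widths of both aligned sequence sets, while length(x) remains the count of alignments.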

gevro commented 2 years ago

Ok thank you.