Bioconductor / Biostrings

Efficient manipulation of biological strings
https://bioconductor.org/packages/Biostrings

pairwiseAlignment: documentation question #81

Closed gevro closed 2 years ago

gevro commented 2 years ago

Hi, the documentation for pairwiseAlignment says:

pattern: a character vector of any length, an XString, or an XStringSet object.
subject: a character vector of length 1, an XString, or an XStringSet object of length 1.

However, the function seems to work even when pattern = character vector of length N, and subject = XStringSet object of length N, where N > 1.

This doesn't match the documentation, yet it seems to be aligning pattern[1] to subject[1], pattern[2] to subject[2], ...

See here for example:

> class(blah1)
[1] "character"
> length(blah1)
[1] 2
> class(blah2)
[1] "DNAStringSet"
attr(,"package")
[1] "Biostrings"
> length(blah2)
[1] 2
> blah3 <- pairwiseAlignment(pattern=blah1,subject=blah2)
> class(blah3)
[1] "PairwiseAlignments"
attr(,"package")
[1] "Biostrings"
> length(blah3)
[1] 2

So just making sure, is matching of pattern[1] to subject[1], pattern[2] to subject[2], pattern[3] to subject[3] ... , the expected behavior when subject is an XStringSet of length > 1?

If so, this would be worth adding to the documentation, and also specifying that pattern and subject must be the same length.

Thanks.

hpages commented 2 years ago

Corrected in Biostrings 2.65.5:

 pattern: a character vector or ‘XStringSet’ derivative of any length,
          or an ‘XString’ derivative.

 subject: a character vector or ‘XStringSet’ derivative of length 1 or
          ‘length(pattern)’, or an ‘XString’ derivative.

H.

gevro commented 2 years ago

Thanks. And just making sure, is this what it is doing? => Matching pattern[1] to subject[1], pattern[2] to subject[2], pattern[3] to subject[3] ...

hpages commented 2 years ago

Just clarified that too:

Value:

     If ‘scoreOnly == FALSE’ (the default), the function returns a
     ‘PairwiseAlignmentsSingleSubject’ object (if a single subject was
     supplied) or a ‘PairwiseAlignments’ object (if more than one
     subject was supplied). In both cases, the returned object contains
     N _optimal pairwise alignments_ where N is the number of supplied
     patterns, that is, N = ‘length(pattern)’ if ‘pattern’ is a
     character vector or ‘XStringSet’ derivative, or N = 1 if it's an
     ‘XString’ derivative. If more than one subject was supplied, the
     alignments in the returned ‘PairwiseAlignments’ object are
     obtained by aligning ‘pattern[[1]]’ to ‘subject[[1]]’,
     ‘pattern[[2]]’ to ‘subject[[2]]’, ‘pattern[[3]]’ to
     ‘subject[[3]]’, etc...

     ...

gevro commented 2 years ago

Thanks so much for confirming.
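
For anyone who wants to check the element-wise pairing described above, here is a minimal sketch. The sequences and the variable names (`patterns`, `subjects`, `per_pair`) are made up for illustration and are not from this thread. It assumes a Biostrings version that still provides `pairwiseAlignment()`; in recent Bioconductor releases the function has, as far as I know, moved to the pwalign package, so `library(pwalign)` may be needed instead.

```r
# Assumption: Biostrings still exports pairwiseAlignment() in this installation.
library(Biostrings)

# Hypothetical example data: two patterns and two subjects of equal length N = 2.
patterns <- c("ACGTAC", "TTGACA")
subjects <- DNAStringSet(c("ACGGTAC", "TTGCA"))

# One call with length(subject) == length(pattern).
aln <- pairwiseAlignment(pattern = patterns, subject = subjects)

# Align each pair separately and collect the scores for comparison.
per_pair <- vapply(seq_along(patterns), function(i) {
  score(pairwiseAlignment(pattern = patterns[i], subject = subjects[i]))
}, numeric(1))

# If pattern[[i]] is aligned to subject[[i]], the scores should agree.
length(aln) == length(patterns)
isTRUE(all.equal(score(aln), per_pair))
```

If both final expressions evaluate to TRUE, the vectorized call is consistent with aligning pattern[[1]] to subject[[1]], pattern[[2]] to subject[[2]], and so on, as stated in the updated Value section.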