Bioconductor / Biostrings

Efficient manipulation of biological strings
https://bioconductor.org/packages/Biostrings
fix: Issue #101 #111

Open ahl27 opened 3 months ago

ahl27 commented 3 months ago

Fixes the bug with zero-length input to consensusMatrix and consensusString reported in #101.

The change was made only to consensusMatrix. Because it now returns a matrix with 0 rows for empty input, consensusString handles that input correctly as well (returning character(0) in all cases).

Examples:

> # Verifying old behavior
> consensusMatrix(DNAStringSet("ATGC"), baseOnly=TRUE)
      [,1] [,2] [,3] [,4]
A        1    0    0    0
C        0    0    0    1
G        0    0    1    0
T        0    1    0    0
other    0    0    0    0
> consensusString(DNAStringSet(c("ATGC", "ATTC")))
[1] "ATKC"

> # New behavior
> consensusMatrix(DNAStringSet())
     A C G T M R W S Y K V H D B N - + .
> consensusMatrix(DNAStringSet(), baseOnly=TRUE)
     A C G T
> consensusMatrix(RNAStringSet())
     A C G U M R W S Y K V H D B N - + .
> consensusMatrix(AAStringSet())
     A R N D C Q E G H I L K M F P S T W Y V U O B J Z X * - + .
> consensusMatrix(BStringSet())
<0 x 0 matrix>
> consensusMatrix(character(0L))
<0 x 0 matrix>

> consensusString(DNAStringSet())
character(0)
> consensusString(RNAStringSet())
character(0)
> consensusString(AAStringSet())
character(0)
> consensusString(BStringSet())
character(0)
> consensusString(character(0L))
character(0)

> is.integer(consensusMatrix(DNAStringSet('ATGC'), as.prob=FALSE))
[1] TRUE
> is.integer(consensusMatrix(DNAStringSet('ATGC'), as.prob=TRUE))
[1] FALSE
> is.integer(consensusMatrix(DNAStringSet(), as.prob=FALSE))
[1] TRUE
> is.integer(consensusMatrix(DNAStringSet(), as.prob=TRUE))
[1] FALSE
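
For readers without a patched Biostrings build, the shape of the empty results shown above can be reproduced in base R. This is only an illustrative sketch, not the code in this PR: the variable names are hypothetical, the column labels are copied from the baseOnly=TRUE output, and the integer storage of the 0 x 0 case is an assumption.

```r
# Base-R sketch (no Biostrings needed): build empty matrices with the same
# shape as the "New behavior" output above. All names here are hypothetical.
dna_bases <- c("A", "C", "G", "T")  # labels copied from the baseOnly=TRUE output

# Count-style result: integer storage, 0 rows, one named column per letter.
empty_counts <- matrix(integer(0), nrow = 0L, ncol = length(dna_bases),
                       dimnames = list(NULL, dna_bases))

# Probability-style result: same shape, but double storage,
# which is why is.integer() is FALSE when as.prob=TRUE.
empty_probs <- matrix(numeric(0), nrow = 0L, ncol = length(dna_bases),
                      dimnames = list(NULL, dna_bases))

# Inputs with no fixed alphabet (BStringSet, character) print as a 0 x 0 matrix.
# Integer storage is assumed here; the PR output does not show it.
empty_plain <- matrix(integer(0), nrow = 0L, ncol = 0L)

print(empty_counts)              # header line only, no rows
print(dim(empty_counts))
print(is.integer(empty_counts))
print(is.integer(empty_probs))
print(empty_plain)
```

The point of the sketch is that a zero-row matrix still carries its column labels and storage type, so callers can inspect both without special-casing empty input.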

hpages commented 3 months ago

Nice set of tests above. Should be easy to turn it into formal unit tests. Thanks!
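
As a rough idea of what those unit tests might look like, here is a sketch using testthat. It assumes a Biostrings build that includes this PR's change and simply restates the console examples above as expectations. The test descriptions and grouping are my own, not part of the PR.

```r
library(testthat)
library(Biostrings)

# Assumes the zero-length behavior from this PR is present in the installed
# Biostrings; the expected values are taken from the console output above.
test_that("consensusMatrix handles zero-length input", {
  m <- consensusMatrix(DNAStringSet())
  expect_equal(nrow(m), 0L)
  expect_true(is.integer(m))
  expect_false(is.integer(consensusMatrix(DNAStringSet(), as.prob = TRUE)))
  expect_identical(
    colnames(consensusMatrix(DNAStringSet(), baseOnly = TRUE)),
    c("A", "C", "G", "T")
  )
  expect_identical(dim(consensusMatrix(BStringSet())), c(0L, 0L))
  expect_identical(dim(consensusMatrix(character(0L))), c(0L, 0L))
})

test_that("consensusString returns character(0) for zero-length input", {
  empty_inputs <- list(DNAStringSet(), RNAStringSet(), AAStringSet(),
                       BStringSet(), character(0L))
  for (x in empty_inputs) {
    expect_identical(consensusString(x), character(0))
  }
})

test_that("behavior for non-empty input is unchanged", {
  expect_identical(consensusString(DNAStringSet(c("ATGC", "ATTC"))), "ATKC")
  m <- consensusMatrix(DNAStringSet("ATGC"), baseOnly = TRUE)
  expect_identical(rownames(m), c("A", "C", "G", "T", "other"))
  expect_equal(unname(m["A", ]), c(1L, 0L, 0L, 0L))
  expect_true(is.integer(m))
})
```

Keeping the non-empty cases in the same file guards against the fix changing the existing behavior.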