BioJulia / BioAlignments.jl

Sequence alignment tools
MIT License
60 stars 24 forks source link

Add more SubstitutionMatrix operations #60

Closed timholy closed 2 years ago

timholy commented 2 years ago

SubstitutionMatrix is not a subtype of AbstractArray. This may or may not be intentional, but currently the number of supported operations is fairly minimal. This provides a few more conveniences for users who might want to manipulate them beyond mere lookup tables.

Types of changes

This PR implements the following changes: (Please tick any or all of the following that are applicable)

:clipboard: Additional detail

:ballot_box_with_check: Checklist

Most of the items in this checklist don't apply, since these are just "missing" Base methods that I'd expect from anything with Matrix in the name.

timholy commented 2 years ago

Alternatively, I'd be happy to make this a subtype of AbstractMatrix. I think it mostly comes down to whether you want to support indexing with integers or not. My guess is "not," and that's the reason these are not subtypes.

codecov[bot] commented 2 years ago

Codecov Report

Merging #60 (a0c70c3) into master (5a3354e) will increase coverage by 0.44%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #60      +/-   ##
==========================================
+ Coverage   87.64%   88.08%   +0.44%     
==========================================
  Files          16       16              
  Lines        1052     1133      +81     
==========================================
+ Hits          922      998      +76     
- Misses        130      135       +5     
Impacted Files Coverage Δ
src/submat.jl 98.44% <100.00%> (+0.37%) :arrow_up:
src/pairwise/algorithms/banded_needleman_wunsch.jl 75.00% <0.00%> (-1.36%) :arrow_down:
src/pairwise/algorithms/needleman_wunsch.jl 83.59% <0.00%> (-0.71%) :arrow_down:
src/anchors.jl 100.00% <0.00%> (ø)
src/pairwise/algorithms/edit_distance.jl 100.00% <0.00%> (ø)
src/pairwise/algorithms/hamming_distance.jl 100.00% <0.00%> (ø)
src/pairwise/alignment.jl 98.26% <0.00%> (+0.04%) :arrow_up:
src/pairwise/algorithms/common.jl 90.62% <0.00%> (+0.30%) :arrow_up:
src/models.jl 85.50% <0.00%> (+0.43%) :arrow_up:
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 5a3354e...a0c70c3. Read the comment docs.

kescobo commented 2 years ago

I didn't write this, so I'm speculating, but I'm guessing the reason the wasn't implemented previously is that substitution matrices tend to be pretty static, and don't require much beyond being able to reference a particular value. Things like BLOSUM have been around for decades more or less unchanged.

That said, I can't think of any reason NOT to do this, other than the up-front effort that you've already expended. So I'm in favor of merging. Additional maintenance burden seems trivial.

timholy commented 2 years ago

Out of curiousity, do you happen to know which version is implemented? https://www.nature.com/articles/nbt0308-274

kescobo commented 2 years ago

We have all of them - I think they're just downloaded direct from NCBI: https://biojulia.net/BioAlignments.jl/dev/pairalign/#Substitution-matrix-types-1

timholy commented 2 years ago

Right, I just meant, does NCBI host the original or corrected versions? But now I see the "corrected" versions are called RBLOSUMxx, so I'm guessing they must be the original ones.

kescobo commented 2 years ago

Oh, I see - I misread that reference :flushed:. I think you're right

CiaranOMara commented 2 years ago

I think the hold up here is that nobody has been able to provide definitive clarity around the types used. Unless the original authors @bicycle1885 or @SabrinaJaye can comment, someone would need to go through the code and rationalise the already made decisions, though I think @timholy has done this and is most familiar with this aspect of the code. So @timholy, I am happy to take your preference, whether to subtype from AbstractMatrix or AbstractSubstitutionMatrix. I also agree with @kescobo that maintenance either way would seem trivial.