Implement substitution matrices for alignment

TimothyStiles commented 1 year ago

Alignment algorithms compare two sequences and produce a similarity metric, the "score", based on how alike they are.

This score usually accounts for characters that match, characters that don't, and any gaps between matching characters.

For nucleotides, scoring can be as simple as +1 for a match and -1 for a mismatch. For protein sequences, these weights can (and probably should) vary depending on the chemical similarities between groups of amino acids.
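
To make the scoring idea concrete, here is a rough Go sketch of how a score could be summed over two sequences that are already aligned. The names `ScoreFunc`, `simpleNucleotide`, and `scoreAligned` are hypothetical and not part of poly or biogo, and the gap penalty of -1 is just an assumed value.

```go
package main

import "fmt"

// ScoreFunc returns the score for pairing residue a with residue b.
// Hypothetical type for this sketch; not an existing poly or biogo API.
type ScoreFunc func(a, b byte) int

// simpleNucleotide is the +1 / -1 scheme: +1 for a match, -1 for a mismatch.
func simpleNucleotide(a, b byte) int {
	if a == b {
		return 1
	}
	return -1
}

// scoreAligned sums per-column scores of two already-aligned sequences of
// equal length. A '-' in either sequence marks a gap and costs gap points.
func scoreAligned(a, b string, score ScoreFunc, gap int) (int, error) {
	if len(a) != len(b) {
		return 0, fmt.Errorf("aligned sequences differ in length: %d vs %d", len(a), len(b))
	}
	total := 0
	for i := 0; i < len(a); i++ {
		if a[i] == '-' || b[i] == '-' {
			total += gap
			continue
		}
		total += score(a[i], b[i])
	}
	return total, nil
}

func main() {
	// Six matching columns and one gap column, with an assumed gap penalty of -1.
	s, err := scoreAligned("GATTACA", "GAT-ACA", simpleNucleotide, -1)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Passing the scoring rule in as a function means a protein substitution matrix lookup could replace `simpleNucleotide` without touching the summing loop.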

I know of the BLOSUM substitution matrices, and know that biogo has a few along with Needleman-Wunsch and Smith-Waterman implementations. My thought would be to cite and use at least the matrices they provide, and maybe implement our own alignment algorithms that would be easier to maintain long term.
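
As a starting point for discussion, a minimal sketch of what an in-house Needleman-Wunsch scorer with a pluggable substitution matrix might look like. Everything here is an assumption: the `SubstitutionMatrix` type, the `globalScore` function, the linear gap penalty, and the matrix values, which are small BLOSUM-style placeholders and have not been checked against a published matrix. It does not reflect biogo's actual API.

```go
package main

import "fmt"

// SubstitutionMatrix maps a residue pair to its score.
// Hypothetical type for this sketch only.
type SubstitutionMatrix map[[2]byte]int

// Score looks up the pair in either order, so only one triangle of the
// matrix has to be stored. Pairs missing from the matrix get def.
func (m SubstitutionMatrix) Score(a, b byte, def int) int {
	if s, ok := m[[2]byte{a, b}]; ok {
		return s
	}
	if s, ok := m[[2]byte{b, a}]; ok {
		return s
	}
	return def
}

// globalScore returns the optimal global (Needleman-Wunsch) alignment score
// using a linear gap penalty. It keeps only two rows of the DP table, so it
// yields the score but not the alignment itself.
func globalScore(a, b string, m SubstitutionMatrix, def, gap int) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j * gap // aligning a prefix of b against nothing
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i * gap // aligning a prefix of a against nothing
		for j := 1; j <= len(b); j++ {
			best := prev[j-1] + m.Score(a[i-1], b[j-1], def) // substitute
			if up := prev[j] + gap; up > best {              // gap in b
				best = up
			}
			if left := curr[j-1] + gap; left > best { // gap in a
				best = left
			}
			curr[j] = best
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

func main() {
	// Placeholder values for illustration; verify against a cited matrix before use.
	toy := SubstitutionMatrix{
		{'L', 'L'}: 4, {'I', 'I'}: 4, {'L', 'I'}: 2,
		{'D', 'D'}: 6, {'E', 'E'}: 5, {'D', 'E'}: 2,
	}
	fmt.Println(globalScore("LD", "IE", toy, -1, -4))
}
```

With these placeholder values, chemically similar pairs such as L/I and D/E still contribute positive scores, which is the behaviour a flat +1/-1 scheme cannot express. A maintained version would presumably also need traceback and affine gap penalties.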

rkrishnasanka commented 1 year ago

Citing and reusing would be the best way to go about it.

TimothyStiles commented 1 year ago

Link to biogo's matrices that should be cited: https://github.com/biogo/biogo/blob/master/align/matrix

bebop / poly

Implement substitution matrices for alignment #290