Open BioTurboNick opened 3 years ago
I haven't worked much with the `AlignedSequence` type, but it seems like there could be an `AbstractAlignedSequence` and a `MultiAlignedSequences <: AbstractAlignedSequence`. This might be another place where explicitly defining and documenting the expected API, à la https://github.com/BioJulia/BioSequences.jl/issues/140, would be useful.
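To make the hierarchy idea concrete, here is a rough, self-contained sketch. Everything in it is hypothetical: `SimpleAlignedSequence`, `ncolumns`, and `nsequences` are made-up names, rows are stored as plain gapped strings, and none of it is the existing BioAlignments.jl API. The point is only to show what "a documented API on an abstract type" could look like.

```julia
# Hypothetical sketch; these names are illustrative, not BioAlignments.jl types.
abstract type AbstractAlignedSequence end

# A single gapped sequence, stored as a plain string with '-' for gaps.
struct SimpleAlignedSequence <: AbstractAlignedSequence
    gapped::String
end

# Several gapped sequences that must all span the same columns.
struct MultiAlignedSequences <: AbstractAlignedSequence
    rows::Vector{String}
    function MultiAlignedSequences(rows::Vector{String})
        all(r -> length(r) == length(first(rows)), rows) ||
            throw(ArgumentError("all rows must have the same number of columns"))
        return new(rows)
    end
end

# The "expected API" (an assumption): every subtype answers these two questions.
ncolumns(a::SimpleAlignedSequence) = length(a.gapped)
nsequences(::SimpleAlignedSequence) = 1
ncolumns(m::MultiAlignedSequences) = isempty(m.rows) ? 0 : length(first(m.rows))
nsequences(m::MultiAlignedSequences) = length(m.rows)

# Generic code can then be written once against the abstract type.
shape(x::AbstractAlignedSequence) = (nsequences(x), ncolumns(x))
```

With this, `shape(MultiAlignedSequences(["AC-T", "ACGT"]))` and `shape(SimpleAlignedSequence("AC-T"))` go through the same generic function.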
I wonder if an MSA could be represented by a vector or Tuple of `AlignedSequence` though.
Good ideas. It could be. I'm wondering though about the strong assumption in AlignedSequence that a sequence is aligned to a single known reference. That makes a lot of sense for aligning sequencing reads to a reference genome. Not as much if you're aligning orthologs.
Maybe AlignedSequence could just be extended to have a single-sequence constructor that just assumes a reference exists that matches in all locations and gaps are all deletions against it.
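As a rough illustration of what such a constructor would have to compute, the sketch below run-length encodes one gapped row into CIGAR-style operations, treating every residue as a match to the implied reference and every gap as a deletion. The function name `implied_operations` and the `(operation, length)` tuple representation are assumptions made for this example; a real constructor would need to produce whatever `AlignedSequence` stores internally.

```julia
# Hypothetical helper: encode a gapped row as (operation, run length) pairs.
# 'M' = residue aligned to the implied reference, 'D' = gap, read as a deletion.
function implied_operations(gapped::AbstractString; gapchar::Char = '-')
    ops = Tuple{Char,Int}[]
    for c in gapped
        op = c == gapchar ? 'D' : 'M'
        if !isempty(ops) && ops[end][1] == op
            # Same operation as the previous column: extend the current run.
            ops[end] = (op, ops[end][2] + 1)
        else
            push!(ops, (op, 1))
        end
    end
    return ops
end

implied_operations("AC--GT-")  # [('M', 2), ('D', 2), ('M', 2), ('D', 1)]
```

Note that this representation has no way to express insertions relative to the reference, which is exactly the limitation of assuming the reference matches everywhere.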
Well, something has to be the reference, right? It could just be a consensus sequence that's never directly observed, but short of actually storing every sequence, you need something that edits are defined against.
Thinking about it some more, I wonder if you could do something like `consensus!(msa)` (or something) that updates the reference to the best consensus and re-calculates the edits against that.
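A minimal sketch of that idea, under several assumptions: the MSA is stored as one reference row plus, for each sequence, only the columns where it differs; "best consensus" is taken to mean a per-column majority vote with gaps counted like any other character; and `ReferenceMSA`, `diff_against`, and `row` are hypothetical names, not existing BioAlignments.jl functions.

```julia
# Hypothetical type: a reference row plus per-sequence differences from it.
mutable struct ReferenceMSA
    reference::Vector{Char}
    edits::Vector{Dict{Int,Char}}  # column => character observed in that row
end

# Columns where `row` differs from `ref`.
diff_against(ref::Vector{Char}, row::Vector{Char}) =
    Dict{Int,Char}(i => row[i] for i in eachindex(ref) if row[i] != ref[i])

# Build from gapped rows, using the first row as the initial reference.
function ReferenceMSA(rows::Vector{String})
    isempty(rows) && throw(ArgumentError("need at least one row"))
    chars = collect.(rows)
    all(r -> length(r) == length(chars[1]), chars) ||
        throw(ArgumentError("all rows must have the same number of columns"))
    ref = copy(chars[1])
    return ReferenceMSA(ref, [diff_against(ref, r) for r in chars])
end

# Reconstruct row k by applying its edits to the reference.
function row(msa::ReferenceMSA, k::Integer)
    chars = copy(msa.reference)
    for (col, c) in msa.edits[k]
        chars[col] = c
    end
    return String(chars)
end

# Replace the reference with the per-column majority and recompute all edits.
function consensus!(msa::ReferenceMSA)
    rows = [collect(row(msa, k)) for k in eachindex(msa.edits)]
    newref = similar(msa.reference)
    for col in eachindex(newref)
        counts = Dict{Char,Int}()
        for r in rows
            counts[r[col]] = get(counts, r[col], 0) + 1
        end
        # Most frequent character; ties broken by character order for determinism.
        newref[col] = first(sort!(collect(counts); by = p -> (-p.second, p.first))).first
    end
    msa.reference = newref
    msa.edits = [diff_against(newref, r) for r in rows]
    return msa
end

msa = ReferenceMSA(["ACGT", "ACGA", "TCGA"])
consensus!(msa)
String(msa.reference)  # "ACGA"
```

The rows themselves are unchanged by `consensus!`; only the reference they are encoded against moves. In this example the total number of stored edits drops from three to two, which is the main practical benefit of re-centering on the consensus.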
We need the capability to manipulate a multiple sequence alignment, and this seems like the right place to put it.
I started working on this but it may need more thought about how it will play nice with the pairwise alignment-oriented code.