BioJulia / BioAlignments.jl

Sequence alignment tools
MIT License

Inconsistent when aligning distinct sequence types #54

Open jakobnissen opened 3 years ago

jakobnissen commented 3 years ago

So I was surprised to find you can align Strings to each other:

julia> x = pairalign(LocalAlignment(), "ACA", "AAA", model)
PairwiseAlignmentResult{Int64, String, String}:
  score: 6
  seq: 1 ACA 3
         | |
  ref: 1 AAA 3

It's kind of cool that all it needs is a sequence whose elements implement convert(T, x) to the right type. But when displaying the alignment of a String against a DNA sequence, it does not recognize that DNA_A matches 'A', so the match markers are missing:

julia> x = pairalign(LocalAlignment(), "ACA", dna"AAA", model)
PairwiseAlignmentResult{Int64, String, LongDNASeq}:
  score: 6
  seq: 1 ACA 3

  ref: 1 AAA 3

We should also think about how to handle alignments of distinct sequence types. For example, how do you align two RNA sequences to each other? There is no RNA substitution model, though obviously the DNA models could work. But since Strings are allowed, we have an inconsistency: pairalign(LocalAlignment(), "AUA", dna"AAA", model) errors, but pairalign(LocalAlignment(), rna"AUA", dna"AAA", model) works, simply because convert(DNA, 'U') is an error, whereas convert(DNA, RNA_U) isn't.
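The asymmetry can be reproduced without running an alignment at all. The snippet below is only a minimal sketch, assuming BioSequences.jl is installed and re-exports the BioSymbols names DNA, RNA and RNA_U. The variable u_fails is made up for illustration, and the exact exception type may differ between versions, so it is not checked here.

```julia
using BioSequences  # assumed to re-export DNA, RNA, RNA_U from BioSymbols

# Symbol-to-symbol conversion is defined, so an RNA symbol can become a DNA symbol.
from_rna = convert(DNA, RNA_U)
println("convert(DNA, RNA_U) gives ", from_rna, " of type ", typeof(from_rna))

# Char-to-symbol conversion works for characters in the DNA alphabet.
from_char = convert(DNA, 'A')
println("convert(DNA, 'A') gives ", from_char)

# 'U' is not a DNA character, so this conversion throws (as reported above).
# u_fails is a hypothetical helper variable, not part of any library.
u_fails = try
    convert(DNA, 'U')
    false
catch
    true
end
println("convert(DNA, 'U') threw an error: ", u_fails)
```

If this matches the behaviour described above, it would explain why the String input fails while the rna"..." input goes through: the aligner only ever sees the result of convert on each element.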

BioTurboNick commented 3 years ago

Aligning RNA is a good question. A DNA model is probably okay for mRNA, but I wonder whether it would work as well for structured RNAs.