Open matchy233 opened 3 years ago
I'm not sure if `*` will ever be directly supported internally. It will always have to be mapped to some character that fits within the scoring matrix, so SIMD lookups can be done. Right now, the amino acid matrix supports alphabetical characters `A-Z`.
There are a couple of ways this could be solved:
- `J` could be used to represent `*`, like what you said. On the Rust side, the scores in the amino acid matrix can be cloned and changed, but this is not yet exposed in the C API. Without changing the scores, matches and mismatches with `J` incur a score of -128.
- `*` can be translated to `X`.
- Map characters to `0-20`, then allow block aligner to align numerical strings.
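For the `*` to `X` translation mentioned above, a minimal sketch of what the caller could do before handing a sequence to the C API might look like this. Both `replace_asterisks` and `is_upper_alpha` are hypothetical helper names, not block-aligner functions, and the `A-Z` check simply assumes the amino acid matrix accepts only uppercase letters, as described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of block-aligner): replace every '*' in
 * buf[0..len) with `replacement`, in place. Returns the number of bytes
 * that were changed. */
size_t replace_asterisks(char *buf, size_t len, char replacement) {
    size_t replaced = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '*') {
            buf[i] = replacement;
            replaced++;
        }
    }
    return replaced;
}

/* Hypothetical helper: returns true only if every byte in buf[0..len) is
 * an uppercase ASCII letter. Assumption: this is the range the amino acid
 * matrix can look up. */
bool is_upper_alpha(const char *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (buf[i] < 'A' || buf[i] > 'Z') {
            return false;
        }
    }
    return true;
}
```

Calling `replace_asterisks(seq, len, 'X')` and then checking `is_upper_alpha(seq, len)` before building the padded sequences would let the caller reject any remaining unsupported bytes instead of passing them through to the aligner.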
I'm using the C API of `block-aligner` to align protein sequences from the UniProt database. There are `*`s in some protein sequences. Currently, using `block-aligner` to align sequences containing `*` causes a segmentation fault. Although users can work around it by mapping `*` to another supported `char`, it would be nice if we could support `*` internally! :)