Closed recursion-ninja closed 5 years ago
I resolved this issue after updating the READ command's grammar to be more expressive. The amino acid sequences in FASTA input files were being interpreted as single large symbols of a custom alphabet rather than as sequences of one-character symbols.
Example:
The input of: CATGAT
was being interpreted as:
["CATGAT"]
instead of
[["C"], ["A"], ["T"], ["G"], ["A"], ["T"]]
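The difference between the two interpretations can be sketched in a few lines of Haskell. This is only an illustration: `asSingleSymbol` and `asSymbolPerCharacter` are hypothetical names, not functions from the project's parser, and the real READ command grammar is not reproduced here.

```haskell
-- Hypothetical helpers for illustration only; not the project's parser.

-- The unintended reading: the whole sequence becomes one large symbol.
asSingleSymbol :: String -> [String]
asSingleSymbol s = [s]

-- The intended reading: every character becomes its own element,
-- and each element holds a single one-character symbol.
asSymbolPerCharacter :: String -> [[String]]
asSymbolPerCharacter = map (\c -> [[c]])
```

In GHCi, `asSingleSymbol "CATGAT"` evaluates to `["CATGAT"]`, while `asSymbolPerCharacter "CATGAT"` evaluates to `[["C"],["A"],["T"],["G"],["A"],["T"]]`.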
After adding the protein data for Sankoff characters to the integration test suite, the L1 norm metric fails by throwing an exception. This is likely because we have special-case logic (for efficiency) that uses the Haskell function
\(i,j) -> max i j - min i j
when the metric is the L1 norm, rather than using the memoized TCM. We apply a similar special-case function for the discrete metric, and that works as expected. We need to track down why the L1 norm (additive) metric doesn't work as expected for Sankoff characters. This may be related to #78.
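One way to start narrowing this down might be to compare the closed-form shortcut against an explicit cost matrix, pair by pair. The sketch below is a standalone illustration, not the project's code: the list-of-lists TCM, `tcmFrom`, `lookupTCM`, and `disagreements` are all hypothetical names, and symbols are assumed to be indexed from 0 by `Int`. Only `l1Norm` mirrors the function quoted above.

```haskell
-- The closed-form L1 (additive) cost quoted in the issue.
l1Norm :: (Int, Int) -> Int
l1Norm (i, j) = max i j - min i j

-- The discrete metric: 0 for identical symbols, 1 otherwise.
discrete :: (Int, Int) -> Int
discrete (i, j) = if i == j then 0 else 1

-- Hypothetical explicit TCM: an n-by-n matrix of costs stored as rows.
tcmFrom :: Int -> ((Int, Int) -> Int) -> [[Int]]
tcmFrom n f = [[f (i, j) | j <- [0 .. n - 1]] | i <- [0 .. n - 1]]

-- Look up the cost of transitioning from symbol i to symbol j.
lookupTCM :: [[Int]] -> (Int, Int) -> Int
lookupTCM m (i, j) = (m !! i) !! j

-- All index pairs where the matrix and the shortcut function disagree.
disagreements :: Int -> [[Int]] -> ((Int, Int) -> Int) -> [(Int, Int)]
disagreements n tcm f =
  [ (i, j)
  | i <- [0 .. n - 1]
  , j <- [0 .. n - 1]
  , lookupTCM tcm (i, j) /= f (i, j)
  ]
```

For a 20-symbol alphabet, `disagreements 20 (tcmFrom 20 l1Norm) l1Norm` is empty by construction. Running the same check with the TCM actually supplied for the protein data would show whether the shortcut and the matrix diverge for any symbol pair.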