amnh / PCG

Phylogenetic Component Graph: Haskell program and libraries for general phylogenetic graph search

Sankoff characters throw runtime exception when given the L1 norm metric #80

Closed by recursion-ninja 5 years ago

recursion-ninja commented 5 years ago

After I added the protein data for Sankoff characters to the integration test suite, the L1 norm metric fails by throwing an exception. This is likely because we have special-case logic (for efficiency) that uses the Haskell function `\(i,j) -> max i j - min i j` when the metric is the L1 norm, rather than using the memoized TCM. We apply the same kind of special-case function for the discrete metric, but that works as expected.
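For readers unfamiliar with the two metrics, here is a minimal sketch of what such special-case functions could look like. The names `l1Norm` and `discreteMetric` and the use of `Natural` for symbol indices are assumptions made for illustration; they are not taken from the PCG code base, and the curried form differs from the tupled lambda quoted above.

```haskell
-- Illustrative sketch only: names and types are hypothetical,
-- not PCG's actual definitions.
import Numeric.Natural (Natural)

-- | L1 norm (additive) metric on symbol indices: |i - j|.
-- Subtracting the smaller from the larger keeps the result
-- non-negative, so it is safe on unsigned types like 'Natural'.
l1Norm :: Natural -> Natural -> Natural
l1Norm i j = max i j - min i j

-- | Discrete metric: 0 when the indices match, 1 otherwise.
discreteMetric :: Natural -> Natural -> Natural
discreteMetric i j
  | i == j    = 0
  | otherwise = 1
```

Loaded into GHCi, `l1Norm 3 7` evaluates to `4` and `discreteMetric 3 7` evaluates to `1`. Both functions compute a cost directly from the two indices, which is what makes them cheaper than a lookup in a memoized matrix.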

We need to track down why the L1 norm (additive) metric doesn't work as expected for Sankoff characters. This may be related to #78.

recursion-ninja commented 5 years ago

After updating the READ command's grammar to be more expressive, I resolved this issue. Each input amino acid sequence in a FASTA file was being interpreted as a single large symbol in a custom alphabet, rather than as a sequence of one-letter symbols.

Example:

The input CATGAT was being interpreted as:

["CATGAT"]

instead of

[["C"], ["A"], ["T"], ["G"], ["A"], ["T"]]
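The difference between the two interpretations above can be sketched in a few lines of Haskell. The helper names `asSingleSymbol` and `asResidues` are hypothetical and exist only to illustrate the shapes involved; this is not PCG's parser.

```haskell
-- Illustrative sketch only: hypothetical helpers, not PCG's parser.

-- | The faulty reading: the whole sequence becomes one symbol.
asSingleSymbol :: String -> [String]
asSingleSymbol sq = [sq]

-- | The intended reading: every character is its own element,
-- and each element holds a single one-letter symbol.
asResidues :: String -> [[String]]
asResidues = map (\c -> [[c]])
```

In GHCi, `asSingleSymbol "CATGAT"` evaluates to `["CATGAT"]`, while `asResidues "CATGAT"` evaluates to `[["C"],["A"],["T"],["G"],["A"],["T"]]`.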