emmanuelparadis / ape

Analysis of Phylogenetics and Evolution
https://emmanuelparadis.github.io/
GNU General Public License v2.0

Glitch in dist.dna #22

Closed: Anthogonyst closed this issue 3 years ago

Anthogonyst commented 3 years ago

Hello, and thank you for your work on ape. I've noticed that when too many sequences are passed to dist.dna(fastas), the N and K80 models don't seem to produce the correct output.

The distances are usually in the expected range of 0 to 30, but when overloaded they range from 0 to 4, and in the worst case they are all zero (with maybe one column of 1's). The problem first appears with 158 of my sequences (each within +/- 5% of 29800 nucleotides in length), around 4.5 million nucleotides in total.

I hope that this isn't too problematic. Thank you.

emmanuelparadis commented 3 years ago

Hi. The maximum sequence length allowed is 2.1 billion nucleotides (and the same limit applies to the number of sequences), so you should be OK. Perhaps your alignment contains gaps: their distribution can affect the distances very substantially. See the functions checkAlignment and image.DNAbin.
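
A minimal sketch of that check, using a made-up four-sequence alignment (hypothetical data, not the sequences from this issue). It assumes the input is already aligned, and it compares dist.dna under its default (sites with a gap in any sequence are dropped for all pairs) with pairwise.deletion = TRUE:

```r
library(ape)

# Toy alignment (hypothetical data); "-" marks an alignment gap.
seqs <- c(
  s1 = "acgtacgtacgtacgtacgt",
  s2 = "acgtacgaacgtacgtacgt",
  s3 = "acgt----acgtacgtacga",
  s4 = "--------acgaacgtacgt"
)

# Build a character matrix (one row per sequence) and convert it to DNAbin.
aln <- as.DNAbin(do.call(rbind, strsplit(seqs, "")))

# Text summary of gaps and segregating sites; plot = FALSE skips the graphics.
checkAlignment(aln, plot = FALSE)

# Visual overview of the alignment, gaps included (interactive sessions only).
if (interactive()) image(aln)

# Default: every column with a gap in any sequence is removed before counting.
d_all <- dist.dna(aln, model = "N", as.matrix = TRUE)

# Pairwise deletion: a column is skipped only for pairs where it has a gap.
d_pair <- dist.dna(aln, model = "N", pairwise.deletion = TRUE, as.matrix = TRUE)

print(d_all)
print(d_pair)
```

In this toy case the only difference between s1 and s2 sits in a column that is gapped in s3 and s4, so that site is discarded under the default and counted under pairwise deletion. As more sequences are added, more columns are likely to contain a gap somewhere, which could shrink the default distances in a similar way.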