McGill-CSB / PHYLO

a gaming framework to align genomic data
phylo.cs.mcgill.ca/edge

Some sequences not counted? #111

Closed: Strontium76 closed this issue 8 years ago

Strontium76 commented 9 years ago

[Screenshot: phylo bug 01]

Here are 3 images of the same game with a few differences. Each time, I selected the lower family to see how the last 2 lines were counted. I don't understand how it works!

First image: everything seems normal (total score = 80).

Second image: I shifted the last square to the right. Why are the last blocks of each sequence left out of the family? Furthermore, there is now a gap, but the total score is 81, higher than before! (Explanation: the 2 purple blocks aren't counted, so the gap in the topmost sequence isn't counted anymore.)

Third image: if I shift the top sequences (which should have no impact on the family I'm working on), the farther purple block now gets counted and adds a gap! (total score = 77). The block is counted because there are 2 blue blocks above it.

Is it a bug or some rule I didn't see an explanation of?

waldispuhl commented 9 years ago

This is indeed not very intuitive. It is not a bug but a consequence of how the ancestors stored in the tree nodes are calculated. Due to lack of space, we couldn't describe the method for calculating ancestors in detail in the tutorial, but you can find a better description at http://phylo.cs.mcgill.ca/archive/2009/eng/faq.html#scoring

Unfortunately, a method that calculates the best ancestors would be too computationally expensive. Instead, we rely on a method directly inspired by the Fitch algorithm (see http://www.cs.ubc.ca/labs/beta/Courses/CPSC536A-01/Class10/class10-notes.html for example). It works well in the vast majority of cases, but in rare cases it can indeed behave in ways you wouldn't expect.

In your case, look at the 3rd column from the right in the 3rd screenshot. The ancestor of human & chimp is a blue tile, while the ancestor of cow & horse is a violet tile or a gap. Because these two ancestors share no possible state, the ancestor of all 4 sequences (the root of the tree) takes the union of their possibilities: a blue tile, a violet tile, or a gap. But the gap has the lowest priority, so we choose a blue or violet tile (violet in this case, but blue would be equivalent).

Now look at the 4th column from the right. By the same logic, the ancestor of human & chimp is a green tile or a gap, and the ancestor of cow & horse is a violet tile or a gap. This time the two ancestors do share a possibility, so the ancestor of all species is the intersection of the two: a gap.

Please let us know if this answers your question or if you need more explanation.
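To make the two columns above easier to follow, here is a minimal Python sketch of a Fitch-style bottom-up pass, assuming the 4-species tree ((human, chimp), (cow, horse)). It is not Phylo's actual code. The per-species tiles are hypothetical (the reply only gives the ancestor sets, so which species holds the gap is a guess), and the helper names `fitch_set` and `pick_state` are made up for illustration. The alphabetical tie-break between tiles is also an assumption, so the sketch picks blue where the game picked violet; as noted above, either choice is equivalent.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A tree is either a leaf label (a species name) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

GAP = "gap"


def fitch_set(tree: Tree, column: Dict[str, str]) -> FrozenSet[str]:
    """Bottom-up Fitch pass: return the candidate ancestor states for `tree`.

    `column` maps each species to its tile colour (or GAP) in one column.
    """
    if isinstance(tree, str):
        return frozenset({column[tree]})
    left, right = (fitch_set(child, column) for child in tree)
    shared = left & right
    # Intersection if the two children share a state, union otherwise.
    return shared if shared else left | right


def pick_state(states: FrozenSet[str]) -> str:
    """Choose one state from a candidate set; a gap has the lowest priority.

    Ties between tiles are broken alphabetically (an assumption; the game
    may choose differently, e.g. violet instead of blue).
    """
    tiles = sorted(s for s in states if s != GAP)
    return tiles[0] if tiles else GAP


TREE: Tree = (("human", "chimp"), ("cow", "horse"))

# Hypothetical leaf tiles consistent with the ancestor sets described above.
third_col = {"human": "blue", "chimp": "blue", "cow": "violet", "horse": GAP}
fourth_col = {"human": "green", "chimp": GAP, "cow": "violet", "horse": GAP}

for name, col in (("3rd", third_col), ("4th", fourth_col)):
    root = fitch_set(TREE, col)
    print(name, sorted(root), "->", pick_state(root))
# 3rd ['blue', 'gap', 'violet'] -> blue
# 4th ['gap'] -> gap
```

The sketch reproduces the behaviour from the thread. In the 3rd column the two subtrees share nothing, so the root keeps the union and a tile wins over the gap. In the 4th column both subtrees can be a gap, so the intersection forces a gap at the root even though every other choice was a tile.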