OmonkeyGOD closed this issue 3 years ago
Hi,
Thanks for highlighting that quote:
> The consistency index for each site in the alignment is then calculated by dividing the minimum number of changes on the phylogeny by the number of different nucleotides observed at that site minus one.
I need to double check this, but I think I have made a mistake here: it is the wrong way round, or at least very badly worded. In the code for homoplasyFinder and in the definitions online, the consistency index is calculated as:
consistencyIndex <- minNumberOfPossibleChanges / numberChangesObservedOnPhylogeny
where minNumberOfPossibleChanges is the number of different nucleotides observed at a site minus 1.
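The same definition can be written as a formula. The symbols are shorthand introduced here for illustration, not notation from the paper or the code: for site i, k_i is the number of different nucleotides observed, m_i is the minimum number of possible changes, and s_i is the number of changes required on the phylogeny.

```latex
\mathrm{CI}_i = \frac{m_i}{s_i} = \frac{k_i - 1}{s_i}
```

At a variable site with no homoplasy, s_i = m_i and the index is 1; every extra change needed on the tree pushes the index below 1.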
For the results from the example data:
   Position ConsistencyIndex CountsACGT MinimumNumberChangesOnTree
1        57              0.5 0:0:139:16                          2
2       179              0.5  0:151:0:4                          2
3       207              0.5  5:0:150:0                          2
4       241              0.5  6:149:0:0                          2
5       339              0.5  0:0:4:151                          2
6       534              0.5  152:0:0:3                          2
7       559              0.5  0:0:2:153                          2
8       689              0.5 16:139:0:0                          2
9       696              0.5  5:150:0:0                          2
10      771              0.5  0:6:0:149                          2
The minimum number of changes on the phylogeny is equal to 2 for all 10 sites. Each of these sites has more than one allele present. For example, at position 57, 139 of the sequences have a G and 16 have a T, so the minNumberOfPossibleChanges is 2 - 1 = 1. The numberOfChangesObservedOnPhylogeny is equal to the MinimumNumberChangesOnTree (2). Therefore the consistency index = 1/2 = 0.5.
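As a quick check of that arithmetic, here is a small Python sketch (not the homoplasyFinder code itself) that recomputes the index for every row of the example output. The function name `consistency_index` is made up for this illustration, and it assumes CountsACGT is a colon-separated string of counts in A:C:G:T order.

```python
def consistency_index(counts_acgt: str, changes_on_tree: int) -> float:
    """CI = (number of different nucleotides - 1) / changes on the tree."""
    counts = [int(c) for c in counts_acgt.split(":")]
    # A nucleotide is "observed" at the site if at least one sequence carries it.
    n_alleles = sum(1 for c in counts if c > 0)
    min_possible_changes = n_alleles - 1
    return min_possible_changes / changes_on_tree


# (Position, CountsACGT, MinimumNumberChangesOnTree) copied from the example output.
rows = [
    (57, "0:0:139:16", 2),
    (179, "0:151:0:4", 2),
    (207, "5:0:150:0", 2),
    (241, "6:149:0:0", 2),
    (339, "0:0:4:151", 2),
    (534, "152:0:0:3", 2),
    (559, "0:0:2:153", 2),
    (689, "16:139:0:0", 2),
    (696, "5:150:0:0", 2),
    (771, "0:6:0:149", 2),
]

for position, counts, changes in rows:
    print(position, consistency_index(counts, changes))
```

Every site has exactly two nucleotides present and needs two changes on the tree, so each row gives 1/2 = 0.5. A CI of 0.5 therefore does not imply five different nucleotides at the site.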
In the article, the tree length for a given site can be considered the numberOfChangesObservedOnPhylogeny: it is the minimum number of changes needed to explain the nucleotides present at the site given the structure of the phylogeny.
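To make "minimum number of changes given the structure of the phylogeny" concrete, here is a sketch using Fitch's small-parsimony algorithm, a standard way to count this on a rooted bifurcating tree. It is only an illustration and is not claimed to be how homoplasyFinder computes the value. The nested-tuple tree encoding and the two toy trees are invented for the example.

```python
def fitch(node):
    """Return (possible states, minimum changes) for the subtree rooted at node.

    A leaf is a single-character string (its nucleotide);
    an internal node is a (left, right) tuple.
    """
    if isinstance(node, str):
        return {node}, 0
    left_states, left_changes = fitch(node[0])
    right_states, right_changes = fitch(node[1])
    shared = left_states & right_states
    if shared:
        # Children can agree on a state: no extra change needed at this node.
        return shared, left_changes + right_changes
    # Children cannot agree: one change is needed on a branch below this node.
    return left_states | right_states, left_changes + right_changes + 1


def leaves(node):
    """List the nucleotides at the tips of the tree."""
    if isinstance(node, str):
        return [node]
    return leaves(node[0]) + leaves(node[1])


# Hypothetical toy trees for a single site with nucleotides G and T.
clean = (("G", "G"), ("T", "T"))        # G and T each form one clade
homoplasious = (("G", "T"), ("G", "T"))  # T appears in two separate places

for tree in (clean, homoplasious):
    _, tree_length = fitch(tree)
    n_alleles = len(set(leaves(tree)))
    print(tree_length, (n_alleles - 1) / tree_length)
```

The first tree needs one change (CI = 1.0); the second needs two changes to explain the same two nucleotides (CI = 0.5), which matches the pattern in the example output.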
Thanks a lot.
Hi,
I have some questions about the CI calculation. In your paper, you mention that:
> The consistency index for each site in the alignment is then calculated by dividing the minimum number of changes on the phylogeny by the number of different nucleotides observed at that site minus one.
In the program's example, all ten homoplasy sites have a MinimumNumberChangesOnTree of 2. Could you explain a little more about this? How do you interpret the minimum number of changes on a tree? All the sites have a CI of 0.5. Does this mean that the number of different nucleotides observed at each of them is 5?
In the paper, you also give a diagram demonstrating how to calculate the tree length of one site in a nucleotide alignment. If you were to calculate the CI of the site shown in that tree, what is the minimum number of changes, and what is the number of different nucleotides observed? Is the minimum number of changes related to the tree length? Thanks in advance.