JosephCrispell / homoplasyFinder

A tool to identify and annotate homoplasies on a phylogeny and sequence alignment
GNU General Public License v3.0

How to use homoplasyFinder? #9

Closed BiodeB closed 4 years ago

BiodeB commented 4 years ago

Hello Sir,

This might be a nonsense question. I don't know R, so I'm running the jar file HomoplasyFinder.jar with the following command

java -jar HomoplasyFinder.jar --fasta Combined_wg.fasta --tree ML_116_wconst.tre

and got a file

consistencyIndexReport.txt

After that I added the --createAnnotatedTree option

java -jar HomoplasyFinder.jar --fasta Combined_wg.fasta --tree ML_116_wconst.tre --createAnnotatedTree

and got an additional file

annotatedNewickTree_20-02-20.tree

Now I can't work out how to produce, from my own data set, a homoplasy graph like the one in the extdata directory. Kindly guide me in using this software.

Thanks, Debajyoti

JosephCrispell commented 4 years ago

Hi Debajyoti,

Thank you for using HomoplasyFinder! It isn't a stupid question at all and it looks like you have got HomoplasyFinder working well.

The most important output is the consistencyIndexReport.txt file, which tells you the number of potential homoplasies identified, their positions, and the alleles present at those positions in the FASTA alignment. With this file you can decide whether there are so many homoplasies that they will drown out any true phylogenetic signal, and/or use their positions in the FASTA to remove them.
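As a rough illustration of the second option (removing the flagged positions), here is a minimal Python sketch. It assumes the report is whitespace-delimited with a header row and Position in the first column, and that Position is a 1-based column index into the alignment; both are assumptions worth checking against your own files. The function names and the toy data are made up for the example and are not part of HomoplasyFinder.

```python
# Sketch only: drop the alignment columns listed in a consistency index report.
# Assumptions: whitespace-delimited report with a header row, Position in the
# first column, and Position being a 1-based alignment column.


def read_positions(report_text):
    """Return the set of positions listed in the report text."""
    positions = set()
    for line in report_text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields:
            positions.add(int(fields[0]))
    return positions


def parse_fasta(fasta_text):
    """Parse FASTA text into a dict of sequence name -> sequence."""
    sequences = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line  # sequences may span several lines
    return sequences


def remove_positions(sequences, positions):
    """Return new sequences with the given 1-based columns removed."""
    return {
        name: "".join(
            base for i, base in enumerate(seq, start=1) if i not in positions
        )
        for name, seq in sequences.items()
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    report = (
        "Position ConsistencyIndex CountsACGT MinimumNumberChangesOnTree\n"
        "2 0.5 3:0:0:2 2\n"
        "4 0.5 3:0:0:2 2\n"
    )
    fasta = ">seqA\nACGTA\n>seqB\nATGAA\n"
    filtered = remove_positions(parse_fasta(fasta), read_positions(report))
    for name, seq in filtered.items():
        print(f">{name}\n{seq}")
```

For real files you would read the report and the FASTA from disk and write the filtered sequences back out; removing columns changes the coordinates of every later position, so keep the original alignment for reference.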

Unfortunately the plotting for HomoplasyFinder is only available by using R. However, if you have a large number of homoplasies identified in the consistencyIndexReport (>100 rows) then the visualisation won't work very well anyway.

If you want to create the visualisation in R, you'll first need to install R (from here) and RStudio (from here) and then follow these instructions to create the plot. You can learn a bit about how to use R here.

Thanks again for using HomoplasyFinder and let me know how you get on.

Joseph Crispell

BiodeB commented 4 years ago

Dear Sir,

Thank you very much for your prompt response. I have a few queries regarding consistencyIndexReport.txt; I have pasted the first few rows of the file below. The report has 7465 rows for a sequence length of 11835 bp.

a) Is there any cutoff value for ConsistencyIndex? If I remove all the homoplasy positions, almost 2/3 of the sequence will be removed.
b) How should I interpret CountsACGT, for example 3:0:0:2 at position 22?
c) How is MinimumNumberChangesOnTree defined?

Position ConsistencyIndex CountsACGT MinimumNumberChangesOnTree
22 0.5 3:0:0:2 2
24 0.5 3:0:0:2 2
25 0.222222222 7:3:0:97 9
26 0.25 1:89:0:17 8
28 0.333333333 6:106:0:0 3
29 0.105263158 39:0:71:2 19
31 0.25 5:109:0:0 4
32 0.5 112:0:2:0 2
33 0.285714286 107:0:6:1 7

Thanks, Debajyoti

JosephCrispell commented 4 years ago

Hi,

No problem at all!

To answer your questions:

a) Unfortunately the cutoff depends entirely on your data. However, if you have a lot of homoplasies in the data, any phylogeny you construct is likely to be incorrect.
b) CountsACGT gives the number of sequences with A, C, G and T, in that order. At position 22 you have three sequences with the nucleotide A and two sequences with the nucleotide T in your FASTA file. It would be great if you could check this: do the remainder of your sequences have an N at this position?
c) The MinimumNumberChangesOnTree is defined as below:

In 1971, Fitch [15] described calculating the minimum number of state changes required on a phylogenetic tree to explain the characters observed for that state at the tips of the tree. An example of a state is a nucleotide at a position in an alignment, that state can vary in different sequences and is usually one of four nucleotides (A=Adenine, C=Cytosine, G=Guanine, and T=Thymine). The consistency index uses the minimum number of state changes to determine how consistent the nucleotides observed at a site in an alignment are with a phylogeny [16]. When the consistency index is equal to one that site is entirely consistent, values less than one indicate a degree of inconsistency, with zero meaning completely inconsistent.
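To make (b) and (c) concrete, the short Python sketch below decodes CountsACGT and recomputes the consistency index for some of the rows pasted earlier in this thread. It assumes the standard definition, (number of alleles observed - 1) / MinimumNumberChangesOnTree. That formula is not spelled out in this thread, so treat it as an assumption, but it does reproduce the reported values for these rows. The function names are made up for the example.

```python
# Sketch only: decode CountsACGT and recompute the consistency index.
# Assumption: CI = (alleles observed - 1) / MinimumNumberChangesOnTree.


def decode_counts(counts_acgt):
    """Turn a string such as '3:0:0:2' into per-nucleotide counts."""
    a, c, g, t = (int(value) for value in counts_acgt.split(":"))
    return {"A": a, "C": c, "G": g, "T": t}


def consistency_index(counts_acgt, min_changes_on_tree):
    """Minimum changes possible divided by minimum changes on the tree."""
    counts = decode_counts(counts_acgt)
    alleles_observed = sum(1 for n in counts.values() if n > 0)
    return (alleles_observed - 1) / min_changes_on_tree


# Rows copied from the report excerpt in this thread:
# (Position, ConsistencyIndex, CountsACGT, MinimumNumberChangesOnTree)
rows = [
    (22, 0.5, "3:0:0:2", 2),
    (25, 0.222222222, "7:3:0:97", 9),
    (29, 0.105263158, "39:0:71:2", 19),
]

for position, reported, counts, changes in rows:
    recomputed = consistency_index(counts, changes)
    print(position, decode_counts(counts), round(recomputed, 9), reported)
```

For position 22, two alleles (A and T) could be explained by a single change, but the tree needs at least two, giving 1 / 2 = 0.5.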

Also see Figure 1 of the HomoplasyFinder manuscript here.
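For intuition about where MinimumNumberChangesOnTree comes from, the Fitch counting step can be sketched in Python as follows. This is a textbook version for a strictly bifurcating tree written as nested tuples, with made-up tip names and states; it is not HomoplasyFinder's own implementation and does not handle multifurcations or ambiguous bases such as N.

```python
# Sketch only: Fitch (1971) small-parsimony count on a binary tree.
# A tree is either a tip name (str) or a tuple of two subtrees.


def fitch_min_changes(tree, tip_states):
    """Return the minimum number of state changes needed on the tree."""
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            return {tip_states[node]}  # a tip carries its observed state
        left, right = node
        left_set, right_set = visit(left), visit(right)
        shared = left_set & right_set
        if shared:
            return shared  # children agree: no change needed here
        changes += 1  # children disagree: at least one change
        return left_set | right_set

    visit(tree)
    return changes


# Hypothetical five-tip tree and states, invented for illustration.
tree = ((("t1", "t2"), "t3"), ("t4", "t5"))

# A and T each form their own clade: one change explains the site.
consistent = {"t1": "A", "t2": "A", "t3": "A", "t4": "T", "t5": "T"}

# Three A and two T (like CountsACGT 3:0:0:2) scattered across the tree.
scattered = {"t1": "A", "t2": "T", "t3": "A", "t4": "T", "t5": "A"}

print(fitch_min_changes(tree, consistent))  # 1
print(fitch_min_changes(tree, scattered))   # 2
```

With two alleles the minimum possible is one change, so the first site would have a consistency index of 1 and the second 0.5 under the standard definition.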

I hope the above answers make sense and are useful.

Joseph Crispell