JosephCrispell / homoplasyFinder

A tool to identify and annotate homoplasies on a phylogeny and sequence alignment
GNU General Public License v3.0
19 stars 3 forks

Phylogenetic Tree #20

Closed nsamhada closed 3 years ago

nsamhada commented 3 years ago

Hello, it seems as though the phylogenetic trees I make are always faulty. I tried two different methods, and one produced this error: Error in rJava::.jcall(javaHomoplasyFinderClass, method = "runHomoplasyFinderFromR", : java.lang.NumberFormatException: For input string: "1e-08)"

and the other produced this: complete genome" isn't present as a tip label in the newick tree file. The sequence and tip IDs must match exactly.

What is a good way to build my phylogenetic tree using my aligned fasta sequences so that it can work as an input for the tool?

JosephCrispell commented 3 years ago

Hi,

Thanks for using homoplasyFinder and sorry it isn't working. HomoplasyFinder uses a standard newick tree format.

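Your first error is caused by your tree file having an extra bracket somewhere ("1e-08" is a valid number on its own, so it is the ")" attached to it that can't be read). The second error means that a sequence name from the FASTA file wasn't found among the tip labels in the tree file.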
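As a rough way to confirm the bracket diagnosis before passing a tree to homoplasyFinder, the brackets in the Newick string can be counted. This is a minimal sketch, not part of homoplasyFinder: check_newick_brackets is a hypothetical helper, it assumes single-quoted labels may contain brackets (so it skips them), and it does not handle Newick [comments].

```python
def check_newick_brackets(newick: str) -> list[str]:
    """Return a list of bracket problems found in a Newick string.

    Hypothetical helper: single-quoted labels are skipped because they may
    legally contain brackets; [comments] are NOT handled.
    """
    problems = []
    depth = 0
    in_quotes = False
    for position, char in enumerate(newick):
        if char == "'":
            # An escaped quote ('') toggles twice, so the state is unchanged.
            in_quotes = not in_quotes
        elif in_quotes:
            continue
        elif char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                # More closing than opening brackets so far.
                problems.append(f"unmatched ')' at position {position}")
                depth = 0
    if in_quotes:
        problems.append("unterminated quoted label")
    if depth > 0:
        problems.append(f"{depth} unclosed '(' bracket(s)")
    if not newick.rstrip().endswith(";"):
        problems.append("tree does not end with ';'")
    return problems
```

Run on the text of a tree file, it returns an empty list when every bracket is matched and the tree ends with ";"; any other result points at where the tree text needs fixing.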

Most tree-building software can output a phylogeny in Newick format; here are a few you could try: PHYLIP, RAxML, and ape. You could also load your tree into software such as FigTree, IcyTree, or ape and save it as a Newick-formatted tree file.

I hope the above is useful.

nsamhada commented 3 years ago

Thank you for your quick reply!

Once I do convert the tree into Newick format, I get the error: "ERROR!! The following sequence name: "NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome" isn't present as a tip label in the newick tree file. The sequence and tip IDs must match exactly. Error: no more error handlers available (recursive errors?); invoking 'abort' restart"

The sequence name that is not found is the reference genome's. Do you think that has something to do with it?

JosephCrispell commented 3 years ago

Hi,

That error means the sequence ID ("NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome") present in the FASTA file is not present in the tree file as a tip label. To check this, you can open the tree file and search for it.
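Instead of searching by hand, the tip labels can be extracted from the Newick string and compared with the FASTA headers. This is a minimal sketch under stated assumptions: the helper names are hypothetical (not homoplasyFinder functions), it assumes homoplasyFinder compares the full header line after ">" with the tip label (as the quoted error message suggests), and it neither converts underscores to spaces nor handles Newick [comments]. Many tree-building tools shorten or rewrite labels at spaces or commas, so a long header like the reference genome's may appear in the tree under a different name; that is a plausible, but unconfirmed, cause of this error.

```python
def fasta_headers(fasta_text: str) -> list[str]:
    """Return every FASTA header line with the leading '>' removed."""
    # Assumption: the whole header line is the sequence ID, as in the error message.
    return [line[1:].strip() for line in fasta_text.splitlines() if line.startswith(">")]


def newick_tip_labels(newick: str) -> list[str]:
    """Return the tip labels of a Newick tree, in the order they appear.

    Simplified parser: handles single-quoted labels (with '' as an escaped
    quote) and branch lengths, but not [comments] or underscore-to-space
    conversion.
    """
    labels = []
    previous = ""  # last structural character seen: "(", ")" or ","
    i, n = 0, len(newick)
    while i < n:
        char = newick[i]
        if char in "(),;":
            previous = char
            i += 1
        elif char == ":":
            # Skip the branch length up to the next structural character.
            i += 1
            while i < n and newick[i] not in "(),;":
                i += 1
        elif char.isspace():
            i += 1
        else:
            if char == "'":
                # Quoted label: read until the closing quote.
                i += 1
                parts = []
                while i < n:
                    if newick[i] == "'":
                        if i + 1 < n and newick[i + 1] == "'":
                            parts.append("'")  # '' is an escaped quote
                            i += 2
                            continue
                        i += 1
                        break
                    parts.append(newick[i])
                    i += 1
                label = "".join(parts)
            else:
                # Unquoted label: read until a structural character or ':'.
                start = i
                while i < n and newick[i] not in "(),:;":
                    i += 1
                label = newick[start:i].strip()
            # Labels after "(" or "," are tips; labels after ")" are internal nodes.
            if previous in ("(", ","):
                labels.append(label)
    return labels


def missing_from_tree(fasta_text: str, newick: str) -> list[str]:
    """Return the FASTA sequence names that are not tip labels in the tree."""
    tips = set(newick_tip_labels(newick))
    return [name for name in fasta_headers(fasta_text) if name not in tips]
```

Any name returned by missing_from_tree would trigger the same "isn't present as a tip label" error, so it shows which records need renaming in one file or removing from the FASTA.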

The quickest solution is to open the FASTA file, remove the header line and sequence for "NC_045512.2 Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome", and save the result as a new FASTA file that you can then pass to homoplasyFinder.
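The manual fix described above can also be scripted. A minimal sketch, assuming a plain-text FASTA file in which each record starts with a ">" header line; remove_fasta_records is a hypothetical helper, and it matches the full header text exactly, the same way the error message quotes it.

```python
def remove_fasta_records(fasta_text: str, names_to_remove: set[str]) -> str:
    """Return the FASTA text without the records whose header is in names_to_remove."""
    kept_lines = []
    keep = True  # lines before the first header (if any) are kept
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # A new record starts: decide whether to keep its header and sequence lines.
            keep = line[1:].strip() not in names_to_remove
        if keep:
            kept_lines.append(line)
    return "\n".join(kept_lines) + ("\n" if kept_lines else "")
```

To follow the advice above, read the original alignment, call the function with the reference genome's full header, and write the result to a file with a different name (for example a hypothetical alignment_noRef.fasta) before passing that file to homoplasyFinder.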

nsamhada commented 3 years ago

It worked, thank you!