davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

Error inferring species trees #116

Closed Aborneman closed 6 years ago

Aborneman commented 6 years ago

Hi,

I'm currently running OrthoFinder on 200 very closely related genomes (strains of the same yeast species). I repeatedly get an error at the tree-building step; the log and traceback are below. Any advice on solving this would be appreciated.

Cheers,

Anthony

Inferring gene and species trees
--------------------------------
2017-09-13 01:47:45 : Done 0 of 9768

 . Error: Invalid distance matrix : numerical value expected for taxon '0' instead of '9.29692e-05'.
2017-09-13 01:51:02 : Done 1000 of 9768
2017-09-13 01:52:36 : Done 2000 of 9768
2017-09-13 01:54:08 : Done 3000 of 9768
2017-09-13 01:55:36 : Done 4000 of 9768
2017-09-13 01:57:04 : Done 5000 of 9768
2017-09-13 01:58:20 : Done 6000 of 9768
2017-09-13 01:59:27 : Done 7000 of 9768
2017-09-13 02:00:35 : Done 8000 of 9768
2017-09-13 02:01:45 : Done 9000 of 9768

Best outgroup(s) for species tree
---------------------------------
Traceback (most recent call last):
  File "/home/biosciences/bin/OrthoFinder-master/orthofinder/orthofinder.py", line 1554, in <module>
    GetOrthologues(dirs, options, program_caller, clustersFilename_pairs, orthogroupsResultsFilesString)
  File "/home/biosciences/bin/OrthoFinder-master/orthofinder/orthofinder.py", line 1384, in GetOrthologues
    options.separatePickleDir)
  File "/home/biosciences/bin/OrthoFinder-master/orthofinder/scripts/get_orthologues.py", line 816, in OrthologuesWorkflow
    roots, clusters, rootedSpeciesTreeFN, nSupport = rfd.GetRoot(spTreeFN_ids, os.path.split(db.TreeFilename_IDs(0))[0] + "/", rfd.GeneToSpecies_dash, nHighParallel, treeFmt = 1)
  File "/home/biosciences/bin/OrthoFinder-master/orthofinder/scripts/root_from_duplications.py", line 406, in GetRoot
    speciesTree = tree.Tree(speciesTreeFN, format=treeFmt)
  File "/home/biosciences/bin/OrthoFinder-master/orthofinder/scripts/tree.py", line 180, in __init__
    read_newick(newick, root_node = self, format=format)
  File "/home/biosciences/bin/OrthoFinder-master/orthofinder/scripts/newick.py", line 231, in read_newick
    raise NewickError('Unexisting tree file or Malformed newick tree structure.')
scripts.newick.NewickError: Unexisting tree file or Malformed newick tree structure.

davidemms commented 6 years ago

Hi Anthony

This looks like the problem that was fixed by this commit: https://github.com/davidemms/OrthoFinder/commit/8a0ef5cb8a25a535cbffa7d3ef5f3f749709288e. If you're using a version from before 1.1.2 then that will be the cause, and updating to the latest release will resolve it.

Basically, FastME doesn't accept scientific notation in its input distance matrix, so a value such as '9.29692e-05' made the species tree inference fail. OrthoFinder now only passes it numbers in decimal notation, which resolves the issue.
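To illustrate the kind of change involved, here is a minimal sketch. It is not the actual OrthoFinder code: the function names, the PHYLIP-style layout and the six-decimal precision are all assumptions made for this example. It shows how Python's default float-to-string conversion switches to exponent form for small values, and how fixed-point formatting avoids that.

```python
import re

# Matches tokens written in scientific notation, e.g. 9.29692e-05 or 1E+10.
SCI_NOTATION = re.compile(r"^[+-]?\d+(\.\d+)?[eE][+-]?\d+$")


def format_distance(value, decimals=6):
    """Format a distance in fixed-point decimal (hypothetical helper).

    The number of decimals is an assumption; very small distances are
    rounded, so pick a precision that suits the data.
    """
    return "{:.{}f}".format(value, decimals)


def phylip_matrix_lines(names, matrix, decimals=6):
    """Build the lines of a PHYLIP-style square distance matrix."""
    lines = [str(len(names))]  # first line: number of taxa
    for name, row in zip(names, matrix):
        cells = " ".join(format_distance(d, decimals) for d in row)
        lines.append("{} {}".format(name, cells))
    return lines


names = ["0", "1"]
matrix = [[0.0, 9.29692e-05], [9.29692e-05, 0.0]]

naive = str(matrix[0][1])              # default conversion uses exponent form
fixed = format_distance(matrix[0][1])  # fixed-point string

print(naive, "->", fixed)
lines = phylip_matrix_lines(names, matrix)
print("\n".join(lines))
```

One thing to watch with very closely related genomes is that a fixed number of decimals can round tiny distances towards zero, so the precision may need to be raised for datasets like this one.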

Let me know how you get on.

All the best,

David

Aborneman commented 6 years ago

Hi David,

Thanks for the quick reply. That does sound like the issue; the only problem is that I'm already using OrthoFinder version 1.1.10. I am, however, using Diamond for the homology matching.

Cheers,

Anthony

davidemms commented 6 years ago

Ah, I see. The same problem also affected the species tree, but the previous fix didn't cover it. I'm just submitting a fix for that now. You'll be able to get the updated version using the 'Clone or download' button on the main page: https://github.com/davidemms/OrthoFinder.

Thanks,

David

davidemms commented 6 years ago

Fixed by commit b2a3f32a0482f8344c7b4bc63c59add22cedcaee