davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

OrthoFinder fails to complete #73

Closed WallyL closed 7 years ago

WallyL commented 7 years ago

After posting I realized you're at ver. 1.1.4 so I'm getting it updated on our cluster and I'll see if that solves the problem. If not, I'll update again.

Hi David,

I think this may be a problem similar to one that has already been addressed here: https://github.com/davidemms/OrthoFinder/issues/31

I am running ver. 1.0.8 on a cluster and giving it 30 cores.

I have 2 sets of data derived from assembled and annotated bacterial genomes. The first set is 30 strains and it ran fine. The second set is 65 strains. I have run the pipeline twice and, based on the out file, it fails at the same point both times: step 4. Best outgroup(s) for species tree.

3. Inferring gene and species trees

2017-06-01 13:01:09 : Done 2000 of 2888
2017-06-01 13:00:16 : Done 1000 of 2888
2017-06-01 12:59:23 : Done 0 of 2888
A duplicate accession was found using just first part: rast|0.CDS.1
Tried to use only the first part of the accession in order to list the sequences in each orthogroup more concisely but these were not unique. The full accession line will be used instead.

4. Best outgroup(s) for species tree (There is no info below this heading)

All the result files are present, except that the Results/Orthologues dir. has no Orthologues subdir. and no SpeciesTree_rooted.txt file.

The error out ends with the following information:

. Error: Invalid distance matrix : numerical value expected for taxon '63_1373' instead of '7.58869e-05'.
Traceback (most recent call last):
  File "/usr/local/orthofinder/1.0.8/orthofinder/orthofinder.py", line 1193, in <module>
    orthologuesResultsFilesString = get_orthologues.GetOrthologues(workingDir, resultsDir, speciesToUse, nSpAll, clustersFilename_pairs, nBlast)
  File "/panfs/pstor.storage/rcclocal/zcluster/orthofinder/1.0.8/orthofinder/scripts/get_orthologues.py", line 570, in GetOrthologues
    roots, clusters, rootedSpeciesTreeFN, nSupport = rfd.GetRoot(spTreeFN_ids, os.path.split(db.treesPatIDs)[0] + "/", rfd.GeneToSpecies_dash, nProcesses, treeFmt = 1)
  File "/panfs/pstor.storage/rcclocal/zcluster/orthofinder/1.0.8/orthofinder/scripts/root_from_duplications.py", line 405, in GetRoot
    list_of_lists = pool.map(SupportedHierachies_wrapper2, [(fn, GeneToSpeciesMap, species, dict_clades, clade_names) for fn in glob.glob(treesDir + "/*")])
  File "/usr/local/anaconda/2.3.0/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/local/anaconda/2.3.0/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
scripts.newick.NewickError: Unexisting tree file or Malformed newick tree structure.

Any help you might be able to offer would be much appreciated. I'd be glad to run any tests or share the data to help solve this.

Best, Walt

davidemms commented 7 years ago

Hi

The issue should be resolved in the latest version of OrthoFinder (1.1.4). It was caused by using scientific notation in the distance matrix passed to fastme and was resolved with this submission: https://github.com/davidemms/OrthoFinder/commit/8a0ef5cb8a25a535cbffa7d3ef5f3f749709288e
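For anyone curious how this kind of failure arises, here is a minimal Python 3 sketch. It is not the code from the commit above; the helper names (`format_distance`, `write_distance_matrix`), the second taxon label, the 10-decimal default and the PHYLIP-style layout are all assumptions made for illustration. It only shows that converting a small float with `str()` yields scientific notation like the value in the error message, while explicit fixed-point formatting does not.

```python
import io


def format_distance(d, decimals=10):
    """Format a distance in fixed-point notation (hypothetical helper)."""
    # "%.*f" never switches to exponent form, unlike str() on small floats.
    return "%.*f" % (decimals, d)


def write_distance_matrix(names, matrix, handle, decimals=10):
    """Write a square distance matrix in a PHYLIP-style layout (assumed format)."""
    handle.write("%d\n" % len(names))
    for name, row in zip(names, matrix):
        values = " ".join(format_distance(d, decimals) for d in row)
        handle.write("%s %s\n" % (name, values))


# '63_1373' and the distance come from the error message; '63_1374' is made up.
names = ["63_1373", "63_1374"]
matrix = [[0.0, 7.58869e-05],
          [7.58869e-05, 0.0]]

# Naive conversion: the small distance comes out in scientific notation.
naive_row = " ".join(str(d) for d in matrix[0])

# Fixed-point conversion: every value is a plain decimal number.
buf = io.StringIO()
write_distance_matrix(names, matrix, buf)
matrix_text = buf.getvalue()

print(naive_row)
print(matrix_text)
```

The number of decimals is a trade-off: too few would round very small distances to zero, so it should be chosen with the smallest expected distance in mind.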

I'll close this issue now but please reopen it if you still have a problem once you've updated to the latest version.

All the best David