davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

Starting STRIDE ERROR #521

Open shzadiqbal opened 3 years ago

shzadiqbal commented 3 years ago

...
2021-03-13 16:16:44 : Written final scores for species 117 to graph file
2021-03-13 16:16:45 : Written final scores for species 110 to graph file
2021-03-13 16:16:45 : Written final scores for species 118 to graph file
2021-03-13 16:16:46 : Written final scores for species 119 to graph file
2021-03-13 16:16:46 : Written final scores for species 111 to graph file

WARNING: program called by OrthoFinder produced output to stderr

Command: mcl /home/mubashir/shzad/azam/pepfiles/OrthoFinder/Results_Mar13/WorkingDirectory/OrthoFinder_graph.txt -I 1.5 -o /home/mubashir/shzad/azam/pepfiles/OrthoFinder/Results_Mar13/WorkingDirectory/clusters_OrthoFinder_I1.5.txt -te 4 -V all

stdout
b''
stderr
b'[mcl] cut <4> instances of overlap\n[mcl] added <6> garbage entries\n'

2021-03-13 16:16:54 : Ran MCL

Writing orthogroups to file

OrthoFinder assigned 40499 genes (91.6% of total) to 9111 orthogroups. Fifty percent of all genes were in orthogroups with 6 or more genes (G50 was 6) and were contained in the largest 1810 orthogroups (O50 was 1810). There were 0 orthogroups with all species present and 0 of these consisted entirely of single-copy genes.

2021-03-13 16:17:01 : Done orthogroups

Analysing Orthogroups

Calculating gene distances

2021-03-13 16:17:33 : Done
Using fallback species tree inference method
/home/mubashir/miniconda2/envs/denovo/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3419: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/home/mubashir/miniconda2/envs/denovo/lib/python3.8/site-packages/numpy/core/_methods.py:188: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

Inferring gene and species trees

2021-03-13 16:17:42 : Done 0 of 2972
2021-03-13 16:17:42 : Done 1000 of 2972
2021-03-13 16:17:43 : Done 2000 of 2972

Best outgroup(s) for species tree

2021-03-13 16:17:44 : Starting STRIDE
Traceback (most recent call last):
  File "/home/mubashir/shzad/azam/OrthoFinder_source/scripts_of/stride.py", line 506, in GetRoot
    speciesTree = tree.Tree(speciesTreeFN, format=2)
  File "/home/mubashir/shzad/azam/OrthoFinder_source/scripts_of/tree.py", line 221, in __init__
    read_newick(newick, root_node = self, format=format)
  File "/home/mubashir/shzad/azam/OrthoFinder_source/scripts_of/newick.py", line 216, in read_newick
    raise NewickError('Unexisting tree file or Malformed newick tree structure.')
scripts_of.newick.NewickError: Unexisting tree file or Malformed newick tree structure.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "OrthoFinder_source/orthofinder.py", line 7, in <module>
    main(args)
  File "/home/mubashir/shzad/azam/OrthoFinder_source/scripts_of/main.py", line 1765, in main
    GetOrthologues(speciesInfoObj, options, prog_caller)
  File "/home/mubashir/shzad/azam/OrthoFinder_source/scripts_of/main.py", line 1527, in GetOrthologues
    orthologues.OrthologuesWorkflow(speciesInfoObj.speciesToUse,
  File "/home/mubashir/shzad/azam/OrthoFinder_source/scripts_of/orthologues.py", line 1039, in OrthologuesWorkflow
    roots, clusters_counter, rootedSpeciesTreeFN, nSupport, _, _, stride_dups = stride.GetRoot(spTreeFN_ids, files.FileHandler.GetOGsTreeDir(), stride.GeneToSpecies_dash, nHighParallel, qWriteRootedTree=True)
  File "/home/mubashir/shzad/azam/OrthoFinder_source/scripts_of/stride.py", line 509, in GetRoot
    speciesTree = tree.Tree(speciesTreeFN, format=1)
  File "/home/mubashir/shzad/azam/OrthoFinder_source/scripts_of/tree.py", line 221, in __init__
    read_newick(newick, root_node = self, format=format)
  File "/home/mubashir/shzad/azam/OrthoFinder_source/scripts_of/newick.py", line 216, in read_newick
    raise NewickError('Unexisting tree file or Malformed newick tree structure.')
scripts_of.newick.NewickError: Unexisting tree file or Malformed newick tree structure.
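
For anyone hitting the same traceback: the NewickError message covers two different situations, a species tree file that does not exist and one that exists but cannot be parsed. A rough way to tell them apart is sketched below. `diagnose_newick` is a hypothetical helper written for this thread, not part of OrthoFinder, and the check is deliberately crude (it ignores quoted labels and only looks at the terminating semicolon and the parenthesis balance). The path to the species tree has to be taken from your own working directory.

```python
import os


def diagnose_newick(path):
    """Return 'ok' or a short description of the first problem found.

    Hypothetical helper: a crude sanity check, not a full Newick parser.
    """
    if not os.path.isfile(path):
        return "missing"

    with open(path) as handle:
        text = handle.read().strip()

    if not text:
        return "empty"
    if not text.endswith(";"):
        return "no terminating semicolon"

    # Walk the string and track nesting depth; it must never go negative
    # and must return to zero at the end.
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return "unbalanced parentheses"
    if depth != 0:
        return "unbalanced parentheses"

    return "ok"
```

Calling `diagnose_newick` on the species tree file that STRIDE was given returns "missing" or "empty" if the earlier species tree inference step produced nothing, which would point back at the input data rather than at STRIDE itself.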

davidemms commented 3 years ago

Hi

It looks like you probably didn't include all the genes from your species, which caused some of the data OrthoFinder relies on to be missing.

Best wishes,
David

adamfreedman commented 3 years ago

Hi David, I am running into a similar issue that I haven't encountered in previous OrthoFinder analyses, and it isn't really clear to me what you mean by "you probably didn't include all the genes from your species". OrthoFinder takes as input an arbitrary set of protein FASTA files from different species/genome annotations, without any prior information about their completeness or which genes they include (or don't include), correct? And if some proteins originate from lineage-specific de novo genes, then by definition some input files won't contain proteins translated from those genes. Does that mean OrthoFinder would throw exceptions every time such genes were encountered?

garmonan commented 3 years ago

Hi!

I got this exact error and I don't know what to do about it. I have version 2.5.2 installed, and I never got this error before when using OrthoFinder in the same way as I did today.

Any help will be extremely valuable :)

Best,

Andrea

davidemms commented 3 years ago

Hi @adamfreedman, no, that's not correct: OrthoFinder expects the proteomes to be complete. This is what the documentation states. In practice, it means that OrthoFinder needs to identify sufficiently many hits between each pair of species to carry out its analysis successfully. For the clustering stage, that means enough hits to model the amount of sequence divergence between each species pair across the range of gene lengths seen. For species tree inference, it means finding enough genes across all species to infer a species tree, which is required for subsequent analysis.
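
One way to see whether a run has "enough genes across all species" is to look at how many species each orthogroup covers. The sketch below is an illustration only, not OrthoFinder code. It assumes a tab-separated gene count table (such as the Orthogroups.GeneCount.tsv file in the results directory) whose first column is the orthogroup ID, whose last column is a row total, and whose remaining columns are per-species gene counts. Please check your own file against that layout before relying on it. `species_occupancy` is a hypothetical name.

```python
import csv
from collections import Counter


def species_occupancy(gene_count_tsv):
    """Tally orthogroups by the number of species with at least one gene.

    Assumed layout (verify against your own file): column 0 is the
    orthogroup ID, the last column is a total, everything in between is
    a per-species gene count.

    Returns (n_species, histogram), where histogram[k] is the number of
    orthogroups that contain genes from exactly k species.
    """
    histogram = Counter()
    with open(gene_count_tsv, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        n_species = len(header) - 2  # drop the ID and total columns
        for row in reader:
            if not row:
                continue  # skip blank lines
            counts = [int(value) for value in row[1:-1]]
            present = sum(1 for count in counts if count > 0)
            histogram[present] += 1
    return n_species, histogram
```

With the result in hand, `histogram[n_species]` is the number of orthogroups that contain every species. In the log at the top of this issue that number was reported as 0, and a histogram concentrated at low values would be consistent with too few shared genes for species tree inference.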

davidemms commented 3 years ago

Hi @garmonan, would you be able to post the complete output that OrthoFinder produced? Could you also describe the input you provided? As some of the questions above are about the number of genes provided, probably the most important information is the number of species and the approximate minimum and average number of genes per species.
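
The numbers asked for here (species count, minimum and average genes per species) can be collected with a short script. This is a minimal sketch, assuming one protein FASTA file per species in a single directory and that each sequence starts with a `>` header line. The function names and the list of file extensions are assumptions made for illustration, so adjust them to match your input.

```python
import os
import statistics


def count_sequences(fasta_path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarise_proteomes(directory, extensions=(".fa", ".faa", ".fasta", ".pep")):
    """Map each FASTA file name in `directory` to its sequence count.

    The extension tuple is an assumption; edit it to match your files.
    """
    counts = {}
    for name in sorted(os.listdir(directory)):
        if name.endswith(extensions):
            counts[name] = count_sequences(os.path.join(directory, name))
    return counts


def describe(counts):
    """Return the species count plus the minimum and mean sequences per file."""
    values = list(counts.values())
    if not values:
        raise ValueError("no FASTA files found")
    return {
        "species": len(values),
        "min": min(values),
        "mean": statistics.mean(values),
    }
```

Running `describe(summarise_proteomes("path/to/proteomes"))` on the input directory gives the three figures in one dictionary, and the per-file counts make it easy to spot a species with far fewer sequences than the rest.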