davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

Failed to execute script orthofinder #477

Closed GroovyLooper closed 3 years ago

GroovyLooper commented 3 years ago

Hello, I've been having some trouble with OrthoFinder 2.4.0 lately. When I run it, I get multiple errors in the "Analysing Orthogroups" section. I've looked through many of the other help threads but cannot find an answer to this problem. Any help would be greatly appreciated.

Here is my input and my output.

orthofinder -f species/

OrthoFinder version 2.4.0 Copyright (C) 2014 David Emms

2020-11-08 12:54:29 : Starting OrthoFinder
40 thread(s) for highly parallel tasks (BLAST searches etc.)
1 thread(s) for OrthoFinder algorithm

Checking required programs are installed

Test can run "mcl -h" - ok
Test can run "fastme -i /home/immunoviromics/antiviral_shared/Max/BLASTloopRIG/RIGOutput_e-20_c70/species/OrthoFinder/Results_Nov08/WorkingDirectory/SimpleTest.phy -o /home/immunoviromics/antiviral_shared/Max/BLASTloopRIG/RIGOutput_e-20_c70/species/OrthoFinder/Results_Nov08/WorkingDirectory/SimpleTest.tre" - ok

Dividing up work for BLAST for parallel processing

2020-11-08 12:54:30 : Creating diamond database 1 of 8
2020-11-08 12:54:30 : Creating diamond database 2 of 8
2020-11-08 12:54:30 : Creating diamond database 3 of 8
2020-11-08 12:54:30 : Creating diamond database 4 of 8
2020-11-08 12:54:30 : Creating diamond database 5 of 8
2020-11-08 12:54:30 : Creating diamond database 6 of 8
2020-11-08 12:54:30 : Creating diamond database 7 of 8
2020-11-08 12:54:30 : Creating diamond database 8 of 8

Running diamond all-versus-all

Using 40 thread(s)
2020-11-08 12:54:30 : This may take some time....
2020-11-08 12:54:30 : Done 0 of 64
2020-11-08 12:54:30 : Done 10 of 64
2020-11-08 12:54:30 : Done 20 of 64
2020-11-08 12:54:31 : Done all-versus-all sequence search

Running OrthoFinder algorithm

2020-11-08 12:54:31 : Initial processing of each species
2020-11-08 12:54:31 : Initial processing of species 0 complete
2020-11-08 12:54:31 : Initial processing of species 1 complete
2020-11-08 12:54:32 : Initial processing of species 2 complete
2020-11-08 12:54:32 : Initial processing of species 3 complete
2020-11-08 12:54:32 : Initial processing of species 4 complete
WARNING: Too few hits between species 5 and species 5 to normalise the scores, these hits will be ignored
WARNING: Too few hits between species 5 and species 6 to normalise the scores, these hits will be ignored
2020-11-08 12:54:32 : Initial processing of species 5 complete
WARNING: Too few hits between species 6 and species 5 to normalise the scores, these hits will be ignored
WARNING: Too few hits between species 6 and species 6 to normalise the scores, these hits will be ignored
2020-11-08 12:54:32 : Initial processing of species 6 complete
2020-11-08 12:54:32 : Initial processing of species 7 complete
2020-11-08 12:54:34 : Connected putative homologues
2020-11-08 12:54:34 : Written final scores for species 0 to graph file
2020-11-08 12:54:34 : Written final scores for species 1 to graph file
2020-11-08 12:54:34 : Written final scores for species 2 to graph file
2020-11-08 12:54:34 : Written final scores for species 3 to graph file
2020-11-08 12:54:34 : Written final scores for species 4 to graph file
2020-11-08 12:54:34 : Written final scores for species 5 to graph file
2020-11-08 12:54:34 : Written final scores for species 6 to graph file
2020-11-08 12:54:34 : Written final scores for species 7 to graph file
2020-11-08 12:54:34 : Ran MCL

Writing orthogroups to file

OrthoFinder assigned 46 genes (95.8% of total) to 6 orthogroups. Fifty percent of all genes were in orthogroups with 11 or more genes (G50 was 11) and were contained in the largest 2 orthogroups (O50 was 2). There were 0 orthogroups with all species present and 0 of these consisted entirely of single-copy genes.

2020-11-08 12:54:34 : Done orthogroups

Analysing Orthogroups

Calculating gene distances

Exception RuntimeError: RuntimeError('cannot join current thread',) in <Finalize object, dead> ignored
2020-11-08 12:54:36 : Done
Using fallback species tree inference method
/tmp/_MEI0m2Vfc/numpy/core/fromnumeric.py:2920: RuntimeWarning: Mean of empty slice.
/tmp/_MEI0m2Vfc/numpy/core/_methods.py:85: RuntimeWarning: invalid value encountered in double_scalars

Inferring gene and species trees

Best outgroup(s) for species tree

2020-11-08 12:54:38 : Starting STRIDE
Traceback (most recent call last):
  File "orthofinder.py", line 7, in <module>
  File "scripts_of/main.py", line 1733, in main
  File "scripts_of/main.py", line 1513, in GetOrthologues
  File "scripts_of/orthologues.py", line 1004, in OrthologuesWorkflow
  File "scripts_of/stride.py", line 509, in GetRoot
  File "scripts_of/tree.py", line 221, in __init__
  File "scripts_of/newick.py", line 216, in read_newick
scripts_of.newick.NewickError: Unexisting tree file or Malformed newick tree structure.
[1688272] Failed to execute script orthofinder

Much thanks, Max

davidemms commented 3 years ago

Hi Max

The problem is linked to the initial warning messages in your output. It looks like there were not enough homologous sequences in your input files for OrthoFinder to run the analysis. Did you include all of each species' protein sequences in your input files?
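As a minimal sketch (not part of OrthoFinder itself), the warnings referred to here can be pulled out of a saved copy of the log to see which species pairs were affected. The function name and the sample text are illustrative assumptions; the regular expression matches the warning wording exactly as it appears in the output posted in this thread.

```python
import re
from collections import defaultdict

# Matches the warning wording as it appears in the log posted in this thread.
WARNING_RE = re.compile(
    r"WARNING: Too few hits between species (\d+) and species (\d+)"
)


def species_with_too_few_hits(log_text):
    """Map each query species index to the sorted target indices flagged in warnings."""
    flagged = defaultdict(set)
    for query, target in WARNING_RE.findall(log_text):
        flagged[int(query)].add(int(target))
    return {query: sorted(targets) for query, targets in sorted(flagged.items())}


# The four warnings from the first run, joined on one line as in the pasted log.
sample_log = (
    "WARNING: Too few hits between species 5 and species 5 to normalise the scores, these hits will be ignored "
    "WARNING: Too few hits between species 5 and species 6 to normalise the scores, these hits will be ignored "
    "WARNING: Too few hits between species 6 and species 5 to normalise the scores, these hits will be ignored "
    "WARNING: Too few hits between species 6 and species 6 to normalise the scores, these hits will be ignored"
)

print(species_with_too_few_hits(sample_log))
```

For the first run this flags species 5 and 6 against each other and against themselves. The indices should correspond to the order in which OrthoFinder numbered the input files, but check that against your own run.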

All the best David

GroovyLooper commented 3 years ago

Hi David, First, thank you for the quick reply. I did not include the complete proteome for any of the species, as I am looking at one particular protein. I BLASTed that protein, separated the hits by species, and am now attempting to run them through OrthoFinder. There are 46 sequences across 8 species (only one species has just a single gene of interest), which I thought should be enough for OrthoFinder to work with. Additionally, OrthoFinder seems to work when I remove some of the species and genes of interest (leaving 7 species and 25 genes), although I still get the warning telling me that there are too few hits between species x and species y.
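A quick way to see how sparse an input set like this is would be to count the sequences in each species file before running OrthoFinder. The following is a rough sketch only: the function names are hypothetical, the directory name `species` is taken from the command in the first post, and the list of accepted file extensions is an assumption that should be checked against the OrthoFinder documentation.

```python
from pathlib import Path

# Assumed set of FASTA extensions; adjust to match your own input files.
FASTA_SUFFIXES = {".fa", ".faa", ".fasta", ".fas", ".pep"}


def count_fasta_records(text):
    """Count FASTA records by counting header lines (those starting with '>')."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def sequences_per_species(directory):
    """Return {file name: number of sequences} for each FASTA file in a directory."""
    counts = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in FASTA_SUFFIXES:
            counts[path.name] = count_fasta_records(path.read_text())
    return counts


# Example usage (hypothetical path, matching "orthofinder -f species/"):
#     for name, n in sequences_per_species("species").items():
#         print(f"{name}\t{n}")
```

Files with only one or a handful of sequences are the ones most likely to trigger the "Too few hits" warnings, since there is little for the score normalisation to work with.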

With 7 species and 25 genes (most of these sequences being the same as those run in the first post above), this is my output:

`OrthoFinder version 2.4.0 Copyright (C) 2014 David Emms

2020-11-09 12:32:29 : Starting OrthoFinder
40 thread(s) for highly parallel tasks (BLAST searches etc.)
1 thread(s) for OrthoFinder algorithm

Checking required programs are installed

Test can run "mcl -h" - ok
Test can run "fastme -i /home/immunoviromics/antiviral_shared/Max/BLASTloopMDA5/MDA5Output_e-20_c75/species/OrthoFinder/Results_Nov09_5/WorkingDirectory/SimpleTest.phy -o /home/immunoviromics/antiviral_shared/Max/BLASTloopMDA5/MDA5Output_e-20_c75/species/OrthoFinder/Results_Nov09_5/WorkingDirectory/SimpleTest.tre" - ok

Dividing up work for BLAST for parallel processing

2020-11-09 12:32:30 : Creating diamond database 1 of 7
2020-11-09 12:32:30 : Creating diamond database 2 of 7
2020-11-09 12:32:30 : Creating diamond database 3 of 7
2020-11-09 12:32:30 : Creating diamond database 4 of 7
2020-11-09 12:32:30 : Creating diamond database 5 of 7
2020-11-09 12:32:30 : Creating diamond database 6 of 7
2020-11-09 12:32:30 : Creating diamond database 7 of 7

Running diamond all-versus-all

Using 40 thread(s)
2020-11-09 12:32:30 : This may take some time....
2020-11-09 12:32:30 : Done 0 of 49
2020-11-09 12:32:30 : Done 10 of 49
2020-11-09 12:32:31 : Done all-versus-all sequence search

Running OrthoFinder algorithm

2020-11-09 12:32:31 : Initial processing of each species
2020-11-09 12:32:31 : Initial processing of species 0 complete
2020-11-09 12:32:31 : Initial processing of species 1 complete
WARNING: Too few hits between species 2 and species 2 to normalise the scores, these hits will be ignored
WARNING: Too few hits between species 2 and species 4 to normalise the scores, these hits will be ignored
WARNING: Too few hits between species 2 and species 5 to normalise the scores, these hits will be ignored
WARNING: Too few hits between species 2 and species 6 to normalise the scores, these hits will be ignored
2020-11-09 12:32:31 : Initial processing of species 2 complete
2020-11-09 12:32:31 : Initial processing of species 3 complete
WARNING: Too few hits between species 4 and species 2 to normalise the scores, these hits will be ignored
WARNING: Too few hits between species 4 and species 4 to normalise the scores, these hits will be ignored
WARNING: Too few hits between species 4 and species 5 to normalise the scores, these hits will be ignored
WARNING: Too few hits between species 4 and species 6 to normalise the scores, these hits will be ignored
2020-11-09 12:32:31 : Initial processing of species 4 complete
WARNING: Too few hits between species 5 and species 2 to normalise the scores, these hits will be ignored
WARNING: Too few hits between species 5 and species 4 to normalise the scores, these hits will be ignored
WARNING: Too few hits between species 5 and species 5 to normalise the scores, these hits will be ignored
WARNING: Too few hits between species 5 and species 6 to normalise the scores, these hits will be ignored
2020-11-09 12:32:31 : Initial processing of species 5 complete
WARNING: Too few hits between species 6 and species 2 to normalise the scores, these hits will be ignored
WARNING: Too few hits between species 6 and species 4 to normalise the scores, these hits will be ignored
WARNING: Too few hits between species 6 and species 5 to normalise the scores, these hits will be ignored
WARNING: Too few hits between species 6 and species 6 to normalise the scores, these hits will be ignored
2020-11-09 12:32:31 : Initial processing of species 6 complete
2020-11-09 12:32:34 : Connected putative homologues
2020-11-09 12:32:34 : Written final scores for species 0 to graph file
2020-11-09 12:32:34 : Written final scores for species 1 to graph file
2020-11-09 12:32:34 : Written final scores for species 2 to graph file
2020-11-09 12:32:34 : Written final scores for species 3 to graph file
2020-11-09 12:32:34 : Written final scores for species 4 to graph file
2020-11-09 12:32:34 : Written final scores for species 5 to graph file
2020-11-09 12:32:34 : Written final scores for species 6 to graph file
2020-11-09 12:32:34 : Ran MCL

Writing orthogroups to file

OrthoFinder assigned 25 genes (100.0% of total) to 4 orthogroups. Fifty percent of all genes were in orthogroups with 5 or more genes (G50 was 5) and were contained in the largest 2 orthogroups (O50 was 2). There were 1 orthogroups with all species present and 0 of these consisted entirely of single-copy genes.

2020-11-09 12:32:34 : Done orthogroups

Analysing Orthogroups

Calculating gene distances

Exception RuntimeError: RuntimeError('cannot join current thread',) in <Finalize object, dead> ignored
2020-11-09 12:32:36 : Done
Using fallback species tree inference method

Inferring gene and species trees

Best outgroup(s) for species tree

2020-11-09 12:32:37 : Starting STRIDE
2020-11-09 12:32:37 : Done STRIDE
Observed 0 well-supported, non-terminal duplications. 0 support the best roots and 0 contradict them.
Best outgroups for species tree:
Mnemiopsis_leidyi Amphimedon_queenslandica, Mnemiopsis_leidyi, Hofstenia_miamia Amphimedon_queenslandica Amphimedon_queenslandica, Hofstenia_miamia Exaiptasia_Pallida Trichoplax_adhaerens Aurelia_aurita, Exaiptasia_Pallida, Calvadosia_cruxmelitensis Hofstenia_miamia Calvadosia_cruxmelitensis Aurelia_aurita Aurelia_aurita, Calvadosia_cruxmelitensis

WARNING: Multiple potential species tree roots were identified, only one will be analyed.

Reconciling gene trees and species tree

Outgroup: Mnemiopsis_leidyi
2020-11-09 12:32:37 : Starting Recon and orthologues
2020-11-09 12:32:37 : Starting OF Orthologues
2020-11-09 12:32:37 : Done 0 of 4
2020-11-09 12:32:37 : Done OF Orthologues
2020-11-09 12:32:37 : Done Recon

Writing results files

2020-11-09 12:32:37 : Done orthologues

Results: /home/immunoviromics/antiviral_shared/Max/BLASTloopMDA5/MDA5Output_e-20_c75/species/OrthoFinder/Results_Nov09_5/

CITATION: When publishing work that uses OrthoFinder please cite: Emms D.M. & Kelly S. (2019), Genome Biology 20:238

If you use the species tree in your work then please also cite:
Emms D.M. & Kelly S. (2017), MBE 34(12): 3267-3278
Emms D.M. & Kelly S. (2018), bioRxiv https://doi.org/10.1101/267914
Exception RuntimeError: RuntimeError('cannot join current thread',) in <Finalize object, dead> ignored `

As you can see, fewer genes are being analyzed here and yet it seems to work, so I am confused as to why OrthoFinder fails with nearly double the number of input sequences. Does this mean that this output is somehow inaccurate?

Thank you, Max

davidemms commented 3 years ago

Hi Max

Yes, OrthoFinder will have problems inferring an accurate species tree and identifying the correct root with so little data. It will also have problems at the orthogroup inference stage when correcting for the divergence between species. E.g. the question "are two genes orthologs from distantly related species or more anciently diverging paralogs from closely related species?" is difficult to answer if there's not a pool of other genes to compare with. I'd definitely recommend providing all genes; it should run in less time than it takes to get an answer on GitHub ;)
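If the run is repeated on complete proteomes as recommended, the orthogroups containing the protein of interest can be picked out afterwards. Below is a minimal sketch, assuming the orthogroups table is a tab-separated file with an `Orthogroup` column followed by one column per species, and with the genes in each cell separated by commas. The function name, species names and gene IDs are hypothetical and only illustrate the idea.

```python
import csv
import io


def orthogroups_containing(tsv_text, genes_of_interest):
    """Return {orthogroup ID: sorted genes of interest found in that orthogroup}."""
    wanted = set(genes_of_interest)
    hits = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip the header row ("Orthogroup", then one column per species)
    for row in reader:
        if not row:
            continue
        orthogroup, cells = row[0], row[1:]
        # Each cell holds a comma-separated list of gene IDs for one species.
        members = {g.strip() for cell in cells for g in cell.split(",") if g.strip()}
        found = members & wanted
        if found:
            hits[orthogroup] = sorted(found)
    return hits


# Hypothetical table contents, for illustration only.
example_tsv = (
    "Orthogroup\tspeciesA\tspeciesB\n"
    "OG0000000\tA_g1, A_g2\tB_g7\n"
    "OG0000001\tA_g3\t\n"
)

print(orthogroups_containing(example_tsv, ["B_g7", "A_g3"]))
```

This way the full proteomes give OrthoFinder the pool of genes it needs for normalisation and tree inference, while the analysis can still focus on the one protein family of interest.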

All the best David
