davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

Species tree inference failed #624

Open shiyi-pan opened 2 years ago

shiyi-pan commented 2 years ago

Hi, I used OrthoFinder for a gene family analysis, but I ran into an error. Could you help me fix it? Thank you so much. Here is part of my log.

OrthoFinder version 2.4.1 Copyright (C) 2014 David Emms

2021-01-09 11:16:17 : Starting OrthoFinder 2.4.1
12 thread(s) for highly parallel tasks (BLAST searches etc.)
12 thread(s) for OrthoFinder algorithm

Checking required programs are installed

Test can run "makeblastdb -help" - ok
Test can run "blastp -help" - ok
Test can run "mcl -h" - ok
Test can run "mafft /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory//_dependencies_check/SimpleTest.fa" - ok
Test can run "iqtree" - ok

WARNING: Files have been ignored as they don't appear to be FASTA files:
run_orthofinder_source.sh
run_orthofinder_source.sh.o28994
run_orthofinder_source.sh.o29262
test.py
OrthoFinder expects FASTA files to have one of the following extensions: fasta, pep, fas, faa, fa

Dividing up work for BLAST for parallel processing

2021-01-09 11:17:05 : Creating Blast database 1 of 36
2021-01-09 11:17:06 : Creating Blast database 2 of 36
2021-01-09 11:17:07 : Creating Blast database 3 of 36
2021-01-09 11:17:10 : Creating Blast database 4 of 36
2021-01-09 11:17:13 : Creating Blast database 5 of 36
2021-01-09 11:17:17 : Creating Blast database 6 of 36
2021-01-09 11:17:19 : Creating Blast database 7 of 36
2021-01-09 11:17:22 : Creating Blast database 8 of 36
2021-01-09 11:17:25 : Creating Blast database 9 of 36
2021-01-09 11:17:30 : Creating Blast database 10 of 36
2021-01-09 11:17:33 : Creating Blast database 11 of 36
2021-01-09 11:17:35 : Creating Blast database 12 of 36
2021-01-09 11:17:40 : Creating Blast database 13 of 36
2021-01-09 11:17:45 : Creating Blast database 14 of 36
2021-01-09 11:17:50 : Creating Blast database 15 of 36
2021-01-09 11:17:55 : Creating Blast database 16 of 36
2021-01-09 11:17:58 : Creating Blast database 17 of 36
2021-01-09 11:18:01 : Creating Blast database 18 of 36
2021-01-09 11:18:06 : Creating Blast database 19 of 36
2021-01-09 11:18:11 : Creating Blast database 20 of 36
2021-01-09 11:18:16 : Creating Blast database 21 of 36
2021-01-09 11:18:21 : Creating Blast database 22 of 36
2021-01-09 11:18:24 : Creating Blast database 23 of 36
2021-01-09 11:18:27 : Creating Blast database 24 of 36
2021-01-09 11:18:32 : Creating Blast database 25 of 36
2021-01-09 11:18:35 : Creating Blast database 26 of 36
2021-01-09 11:18:38 : Creating Blast database 27 of 36
2021-01-09 11:18:43 : Creating Blast database 28 of 36
2021-01-09 11:18:48 : Creating Blast database 29 of 36
2021-01-09 11:18:54 : Creating Blast database 30 of 36
2021-01-09 11:18:59 : Creating Blast database 31 of 36
2021-01-09 11:19:04 : Creating Blast database 32 of 36
2021-01-09 11:19:09 : Creating Blast database 33 of 36
2021-01-09 11:19:14 : Creating Blast database 34 of 36
2021-01-09 11:19:17 : Creating Blast database 35 of 36
2021-01-09 11:19:22 : Creating Blast database 36 of 36

Running BLAST all-versus-all

Using 12 thread(s)
2021-01-09 11:19:25 : This may take some time....
2021-01-25 14:22:23 : Done 300 of 1296

WARNING: program called by OrthoFinder produced output to stderr

Command: blastp -outfmt 6 -evalue 0.001 -query /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory/Species35.fa -db /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory/BlastDBSpecies28 -out /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory/Blast35_28.txt

stdout
b''
stderr
b'Warning: (1431.1) CFastaReader: Ignoring invalid residue . at line 2, position 111\nWarning: (1431.1) CFastaReader: Ignoring invalid residue . at line 4, position 499\n

WARNING: program called by OrthoFinder produced output to stderr

Command: blastp -outfmt 6 -evalue 0.001 -query /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory/Species35.fa -db /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory/BlastDBSpecies3 -out /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory/Blast35_3.txt ......... ......... ........

2021-03-04 19:25:27 : Done all-versus-all sequence search

Running OrthoFinder algorithm

2021-03-04 19:25:42 : Initial processing of each species
2021-03-04 20:04:11 : Initial processing of species 0 complete
2021-03-04 20:41:03 : Initial processing of species 1 complete
2021-03-04 21:55:16 : Initial processing of species 2 complete
2021-03-04 22:49:56 : Initial processing of species 3 complete
2021-03-04 23:51:48 : Initial processing of species 4 complete
2021-03-05 00:29:04 : Initial processing of species 5 complete
2021-03-05 01:41:34 : Initial processing of species 6 complete
2021-03-05 02:54:12 : Initial processing of species 7 complete
2021-03-05 04:06:58 : Initial processing of species 8 complete
2021-03-05 05:21:07 : Initial processing of species 9 complete
2021-03-05 06:33:48 : Initial processing of species 10 complete
2021-03-05 07:42:26 : Initial processing of species 11 complete
2021-03-05 08:49:46 : Initial processing of species 12 complete
2021-03-05 09:58:26 : Initial processing of species 13 complete
2021-03-05 11:17:30 : Initial processing of species 14 complete
2021-03-05 12:42:02 : Initial processing of species 15 complete
2021-03-05 14:14:39 : Initial processing of species 16 complete
2021-03-05 15:23:14 : Initial processing of species 17 complete
2021-03-05 16:49:39 : Initial processing of species 18 complete
2021-03-05 17:59:31 : Initial processing of species 19 complete
2021-03-05 19:06:48 : Initial processing of species 20 complete
2021-03-05 20:15:43 : Initial processing of species 21 complete
2021-03-05 21:23:02 : Initial processing of species 22 complete
2021-03-05 22:36:21 : Initial processing of species 23 complete
2021-03-05 23:45:53 : Initial processing of species 24 complete
2021-03-06 00:57:09 : Initial processing of species 25 complete
2021-03-06 02:10:45 : Initial processing of species 26 complete
2021-03-06 03:27:59 : Initial processing of species 27 complete
2021-03-06 04:43:05 : Initial processing of species 28 complete
2021-03-06 05:48:40 : Initial processing of species 29 complete
2021-03-06 07:01:40 : Initial processing of species 30 complete
2021-03-06 08:25:25 : Initial processing of species 31 complete
2021-03-06 09:35:57 : Initial processing of species 32 complete
2021-03-06 10:13:33 : Initial processing of species 33 complete
2021-03-06 11:32:13 : Initial processing of species 34 complete
2021-03-06 12:12:19 : Initial processing of species 35 complete
2021-03-06 12:27:43 : Connected putative homologues
2021-03-06 12:30:57 : Written final scores for species 0 to graph file
2021-03-06 12:38:45 : Written final scores for species 12 to graph file
2021-03-06 12:44:24 : Written final scores for species 24 to graph file
2021-03-06 12:31:20 : Written final scores for species 1 to graph file
2021-03-06 12:38:54 : Written final scores for species 13 to graph file
2021-03-06 12:44:39 : Written final scores for species 25 to graph file
2021-03-06 12:31:23 : Written final scores for species 5 to graph file
2021-03-06 12:39:21 : Written final scores for species 14 to graph file
2021-03-06 12:45:38 : Written final scores for species 26 to graph file
2021-03-06 12:35:12 : Written final scores for species 3 to graph file
2021-03-06 12:40:37 : Written final scores for species 15 to graph file
2021-03-06 12:46:18 : Written final scores for species 27 to graph file
2021-03-06 12:35:37 : Written final scores for species 2 to graph file
2021-03-06 12:40:53 : Written final scores for species 21 to graph file
2021-03-06 12:46:35 : Written final scores for species 28 to graph file
2021-03-06 12:35:25 : Written final scores for species 11 to graph file
2021-03-06 12:43:38 : Written final scores for species 19 to graph file
2021-03-06 12:47:06 : Written final scores for species 33 to graph file
2021-03-06 12:35:45 : Written final scores for species 9 to graph file
2021-03-06 12:42:36 : Written final scores for species 22 to graph file
2021-03-06 12:47:19 : Written final scores for species 30 to graph file
2021-03-06 12:35:34 : Written final scores for species 4 to graph file
2021-03-06 12:42:44 : Written final scores for species 20 to graph file
2021-03-06 12:47:39 : Written final scores for species 31 to graph file
2021-03-06 12:35:22 : Written final scores for species 10 to graph file
2021-03-06 12:43:48 : Written final scores for species 18 to graph file
2021-03-06 12:47:40 : Written final scores for species 35 to graph file
2021-03-06 12:35:46 : Written final scores for species 6 to graph file
2021-03-06 12:43:20 : Written final scores for species 23 to graph file
2021-03-06 12:47:53 : Written final scores for species 32 to graph file
2021-03-06 12:35:19 : Written final scores for species 8 to graph file
2021-03-06 12:42:04 : Written final scores for species 17 to graph file
2021-03-06 12:48:48 : Written final scores for species 29 to graph file
2021-03-06 12:35:15 : Written final scores for species 7 to graph file
2021-03-06 12:43:38 : Written final scores for species 16 to graph file
2021-03-06 12:49:59 : Written final scores for species 34 to graph file
2021-03-06 13:19:57 : Ran MCL

Writing orthogroups to file

OrthoFinder assigned 1855627 genes (98.6% of total) to 48444 orthogroups. Fifty percent of all genes were in orthogroups with 66 or more genes (G50 was 66) and were contained in the largest 8957 orthogroups (O50 was 8957). There were 8917 orthogroups with all species present and 256 of these consisted entirely of single-copy genes.

2021-03-06 13:24:11 : Done orthogroups

Analysing Orthogroups

2021-03-06 13:24:19 : Starting MSA/Trees
Species tree: Using 1242 orthogroups with minimum of 91.7% of species having single-copy genes in any orthogroup

Inferring multiple sequence alignments for species tree

2021-03-06 16:38:04 : Done 200 of 1242
2021-03-07 06:53:13 : Done 1000 of 1242
2021-03-07 08:53:00 : Done 1200 of 1242
2021-03-06 21:46:25 : Done 400 of 1242
2021-03-07 04:25:22 : Done 800 of 1242
2021-03-07 05:43:31 : Done 900 of 1242
2021-03-07 08:01:06 : Done 1100 of 1242
2021-03-07 03:07:20 : Done 700 of 1242
2021-03-06 19:34:28 : Done 300 of 1242
2021-03-07 01:05:36 : Done 600 of 1242
2021-03-06 14:58:29 : Done 100 of 1242
2021-03-06 23:30:31 : Done 500 of 1242
2021-03-06 13:27:10 : Done 0 of 1242

Inferring remaining multiple sequence alignments and gene trees

2021-05-31 07:43:00 : Done 2000 of 47203
2021-06-18 09:42:04 : Done 16000 of 47203
2021-06-19 14:16:50 : Done 19000 of 47203
2021-06-20 05:47:00 : Done 28000 of 47203
2021-06-20 12:15:58 : Done 43000 of 47203
2021-03-07 09:48:28 : Done 0 of 47203
2021-06-19 00:19:04 : Done 17000 of 47203
2021-06-20 12:11:49 : Done 41000 of 47203
2021-06-20 12:18:01 : Done 44000 of 47203
2021-06-19 09:21:09 : Done 18000 of 47203
2021-06-20 01:52:52 : Done 25000 of 47203
2021-06-20 02:59:30 : Done 26000 of 47203
2021-06-20 09:37:04 : Done 31000 of 47203
2021-06-20 12:08:54 : Done 40000 of 47203
2021-06-06 11:57:51 : Done 4000 of 47203
2021-06-08 16:52:29 : Done 5000 of 47203
2021-06-14 19:45:47 : Done 10000 of 47203
2021-06-20 01:03:48 : Done 24000 of 47203
2021-06-19 21:44:33 : Done 22000 of 47203
2021-06-20 07:15:59 : Done 29000 of 47203
2021-06-20 10:22:10 : Done 32000 of 47203
2021-06-20 11:43:08 : Done 36000 of 47203
2021-06-20 12:20:07 : Done 45000 of 47203
2021-06-19 23:26:05 : Done 23000 of 47203
2021-06-20 11:15:09 : Done 34000 of 47203
2021-06-15 12:29:44 : Done 11000 of 47203
2021-06-17 19:54:34 : Done 15000 of 47203
2021-06-20 11:59:31 : Done 38000 of 47203
2021-06-20 12:04:41 : Done 39000 of 47203
2021-06-10 11:05:42 : Done 6000 of 47203
2021-06-16 06:54:40 : Done 12000 of 47203
2021-06-20 08:40:21 : Done 30000 of 47203
2021-06-20 12:13:56 : Done 42000 of 47203
2021-06-03 17:04:23 : Done 3000 of 47203
2021-06-11 21:15:54 : Done 7000 of 47203
2021-06-13 00:21:56 : Done 8000 of 47203
2021-06-20 10:52:05 : Done 33000 of 47203
2021-06-20 11:52:40 : Done 37000 of 47203
2021-06-20 12:22:11 : Done 46000 of 47203
2021-05-26 23:52:54 : Done 1000 of 47203
2021-06-17 07:24:26 : Done 14000 of 47203
2021-06-20 04:15:44 : Done 27000 of 47203
2021-06-20 11:31:28 : Done 35000 of 47203
2021-06-20 12:24:10 : Done 47000 of 47203
2021-06-14 00:21:29 : Done 9000 of 47203
2021-06-16 19:35:51 : Done 13000 of 47203
2021-06-19 17:35:54 : Done 20000 of 47203
2021-06-19 19:50:55 : Done 21000 of 47203
ERROR: Species tree inference failed
ERROR: An error occurred, please review the error messages they may contain useful information about the problem.

davidemms commented 2 years ago

Hi

It looks like you're using IQTREE for tree inference. You can have a look at the log file it produced to see what the problem was; it should be called "WorkingDirectory/Alignments_ids/SpeciesTree.log".
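If the log is long, a small script can pull out the error lines for you. This is only a rough sketch: the function name `find_error_lines` is made up for illustration, and it assumes the errors you care about are on lines that begin with "ERROR", as in the OrthoFinder output above.

```python
def find_error_lines(log_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for log lines that start with 'ERROR'.

    Assumption: errors are reported on lines beginning with the word
    ERROR; other tools may format their messages differently.
    """
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("ERROR"):
            hits.append((number, stripped))
    return hits
```

To use it, read the log with `pathlib.Path("WorkingDirectory/Alignments_ids/SpeciesTree.log").read_text()` and pass the text to `find_error_lines`.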

Best wishes David

shiyi-pan commented 2 years ago

David, thank you for your advice. I checked SpeciesTree.log and there was indeed an error. Here it is:

IQ-TREE multicore version 1.6.12 for Linux 64-bit built Aug 15 2019
Developed by Bui Quang Minh, Nguyen Lam Tung, Olga Chernomor, Heiko Schmidt, Dominik Schrempf, Michael Woodhams.

Host: compute-0-1.local (AVX, 220 GB RAM)
Command: /ds3512/home/panyp/ruanjian/iqtree-1.6.12-Linux/bin/iqtree -s /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory/Alignments_ids/SpeciesTreeAlignment.fa -bb 1000 -pre /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory/Alignments_ids/SpeciesTree
Seed: 922007 (Using SPRNG - Scalable Parallel Random Number Generator)
Time: Sun Mar 7 09:48:27 2021
Kernel: AVX - 1 threads (16 CPU cores detected)

HINT: Use -nt option to specify number of threads because your CPU has 16 cores!
HINT: -nt AUTO will automatically determine the best number of threads to use.

Reading alignment file /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09/WorkingDirectory/Alignments_ids/SpeciesTreeAlignment.fa ... Fasta format detected
Alignment most likely contains protein sequences
Alignment has 36 sequences with 587974 columns, 323304 distinct patterns
129600 parsimony-informative, 204744 singleton sites, 253630 constant sites
          Gap/Ambiguity  Composition  p-value
   1  0       30.53%     failed        0.00%
   2  1       18.80%     failed        0.00%
   3  2        3.79%     passed       99.24%
...... ...... ......
****  TOTAL    7.41%  6 sequences failed composition chi2 test (p-value<5%; df=19)
NOTE: minimal branch length is reduced to 0.000000170076 for long alignment

Create initial parsimony tree by phylogenetic likelihood library (PLL)... 240.959 seconds
NOTE: ModelFinder requires 19216 MB RAM!
ModelFinder will test 546 protein models (sample size: 587974) ...
 No. Model        -LnL          df  AIC           AICc          BIC
  1  Dayhoff      5415541.378   69  10831220.757  10831220.773  10831999.383
  2  Dayhoff+I    5354297.881   70  10708735.762  10708735.779  10709525.673
  3  Dayhoff+G4   5328853.866   70  10657847.732  10657847.749  10658637.643
...... ...... ......
 34  mtMAM+R5     5610761.575   77  11221677.150  11221677.170  11222546.051
 35  mtMAM+R6     5610637.309   79  11221432.618  11221432.640  11222324.089
ERROR: Numerical underflow (lh-branch). Run again with the safe likelihood kernel via -safe option

I searched this repository and found that you don't recommend IQTREE for tree inference. Could you tell me which program you recommend instead? Thank you again.

davidemms commented 2 years ago

Hi

It is an issue that occurs unpredictably in IQTREE. In the past I haven't recommended IQTREE because it can be hard to get it to run successfully at the scale required by OrthoFinder, as you have seen. However, now I think I know how these issues can be resolved.

I think I should be able to help you complete your analysis.

  1. Run iqtree in safe mode on the species tree:

    iqtree -s WorkingDirectory/Alignments_ids/SpeciesTreeAlignment.fa -bb 1000 -pre WorkingDirectory/Alignments_ids/SpeciesTree -safe -nt AUTO
  2. Convert the tree from IDs to species names:

    python OrthoFinder/tools/convert_orthofinder_tree_ids.py WorkingDirectory/Alignments_ids/SpeciesTree.treefile WorkingDirectory/SpeciesIDs.txt
  3. Reroot the tree manually on the correct outgroup and save it as newick format e.g. to file SpeciesTreeRooted.txt

  4. Run OrthoFinder 'from trees' using your SpeciesTreeRooted.txt file (you'll need to provide the path to the file in the command below) :

    python orthofinder.py -ft /ds3512/home/panyp/NN1138-2/03.orthofinder_data/02.SoybeanData/OrthoFinder/Results_Jan09 -s SpeciesTreeRooted.txt
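To make step 2 less of a black box, here is a rough Python illustration of the idea of mapping IDs back to species names. It is not the code of convert_orthofinder_tree_ids.py. It assumes that SpeciesIDs.txt holds one `ID: filename` pair per line (with `#` marking a commented-out species) and that the leaves of the tree are those IDs; the function names are made up for this sketch.

```python
import re


def parse_species_ids(text: str) -> dict[str, str]:
    """Parse lines such as '0: e_sativa.fasta' into {'0': 'e_sativa.fasta'}.

    Assumption: one 'ID: name' pair per line; lines starting with '#'
    are treated as excluded species and skipped.
    """
    names = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        species_id, _, name = line.partition(":")
        names[species_id.strip()] = name.strip()
    return names


def rename_leaves(newick: str, names: dict[str, str]) -> str:
    """Replace leaf labels in a Newick string using the given mapping.

    A leaf label is taken to be a token that directly follows '(' or ','.
    Labels after ')' (internal node or support labels) and branch lengths
    after ':' are left untouched. Unknown labels are kept as they are.
    """
    leaf = re.compile(r"(?<=[(,])\s*([^(),:;\s]+)(?=\s*[:,)])")
    return leaf.sub(lambda m: names.get(m.group(1), m.group(1)), newick)
```

Note that the names come out exactly as they are written in the mapping, file extension included, so it is worth checking the leaf names of the converted tree before moving on to step 4.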

Best wishes David

shiyi-pan commented 2 years ago

Thank you for your help, David. I will follow your suggestion. I'm not sure how to reroot the tree manually right now, so I may trouble you again in the future.

shiyi-pan commented 2 years ago

Hi David. I don't know how to reroot the tree manually on the correct outgroup. I have uploaded all the SpeciesTree* files from Results_Jan09/WorkingDirectory/Alignments_ids; could you do me a favor? Thank you very much. SpeciesTree.zip By the way, my Gene_Trees and Species_Tree directories are still empty. Is that normal?

davidemms commented 2 years ago

Can you send me an email at david.emms@plants.ox.ac.uk and I will see if I can help.

The directories are empty because OrthoFinder had to terminate because the IQTREE species tree inference failed. I think I have a solution that will allow a complete set of results files to be generated. If you are able to run it successfully then I will use that information to post the solution here for other users.

Best wishes David

shiyi-pan commented 2 years ago

Thank you, David. I have sent you an email; please check whether I missed anything. By the way, there are four files that contain a species tree: SpeciesTree.contree, SpeciesTree.iqtree, SpeciesTree.treefile and SpeciesTree_accessions.treefile. Which one should I use to generate the rooted tree?

ViriatoII commented 2 years ago

Dear @davidemms,

I have done as you suggested and got this error:

ERROR: 'e_sativa' is missing from species tree
ERROR: 'g_gynandra' is missing from species tree
ERROR: Additional species ''b_tournefortii.fasta'' in species tree
ERROR: Additional species ''b_repanda.fasta'' in species tree

I had to erase the ' symbols and the .fasta suffixes from the RootedTree.txt
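In case it helps anyone else, the cleanup can be scripted. This is a minimal sketch, not part of OrthoFinder: `clean_leaf_labels` is a name I made up, and it assumes that no species name legitimately contains a `'` and that any of the FASTA extensions OrthoFinder lists (fasta, pep, fas, faa, fa) at the end of a label should go.

```python
import re

# Extensions taken from the OrthoFinder warning quoted earlier in this thread.
FASTA_EXTENSIONS = ("fasta", "pep", "fas", "faa", "fa")


def clean_leaf_labels(newick: str) -> str:
    """Strip single quotes and trailing FASTA extensions from Newick labels.

    Assumption: quotes only ever wrap labels, and an extension counts as
    'trailing' when it is directly followed by ':', ',' or ')'.
    """
    unquoted = newick.replace("'", "")
    extensions = "|".join(re.escape(ext) for ext in FASTA_EXTENSIONS)
    return re.sub(rf"\.(?:{extensions})(?=[:,)])", "", unquoted)
```

Branch lengths such as 0.05 are not affected, because the pattern only matches a dot followed by one of the listed extensions.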

Cheers,

xingjianfeng100 commented 2 years ago

Hi David. When I ran the command

    orthofinder -S diamond -M msa -T raxml -ft /work/user/....../OrthoFinder/Results_Mar03_2/ -s /work/user/....../OrthoFinder/Results_Mar03_2/Species_Tree/SpeciesTree_rooted2.txt

the error was:

Test can run "raxml" - failed
Warning, you specified a working directory via "-w"
Keep in mind that RAxML only accepts absolute path names, not relative ones!
RAxML can't, parse the alignment file as phylip file
it will now try to parse it as FASTA file
RAxML output files with the run ID already exist in directory /tmp/ ...... exiting
ERROR: Cannot run user-configured tree method 'raxml'
Please check program is installed and that it is correctly configured in the orthofinder/config.json file

Does the "-ft" mode lose the $PATH and thus cause the error?
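One quick way to check the PATH part of the question is to ask Python, from the same shell or job script that launches OrthoFinder, which programs it can find. This is only a sketch: `missing_programs` is a made-up helper, and the program names passed to it are guesses based on the command above, since the executable actually called is whatever orthofinder/config.json configures.

```python
import shutil


def missing_programs(programs: list[str]) -> list[str]:
    """Return the names that cannot be found on the current PATH.

    shutil.which looks a name up the same way the shell would, so an
    empty result means every program is visible from this environment.
    """
    return [name for name in programs if shutil.which(name) is None]
```

For example, `missing_programs(["diamond", "raxml"])` returns the names that are not visible. An empty list only rules out the PATH; it says nothing about the other RAxML messages in the output.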