davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

Species tree without bootstrap support #732

Open manuelsmendoza opened 2 years ago

manuelsmendoza commented 2 years ago

Hi!

I'm using OrthoFinder to infer the evolutionary relationships among 9 species of sea slugs. I ran OrthoFinder v2.5.4 as shown below. The species tree obtained (Species_Tree/SpeciesTree_rooted.txt) does not contain bootstrap (branch support) values. What did I do wrong, and how can I fix it?

((OVI:0.245924,ACA:0.282977):0.0836835,(POC:0.116764,((ECR:0.053792,(ECH:0.06658,EVI:0.055475):0.014468):0.03297,(EOR:0.065667,(ETI:0.009551,ECO:0.009399):0.089493):0.029081):0.057458):0.0836835);

# Run OrthoFinder
orthofinder \
  -f slugs_cds \
  -a $SLURM_NTASKS \
  -M msa \
  -T raxml-ng \
  -o slugs_ortphy
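
One quick way to confirm that a Newick file carries no support labels is to look for a number sitting between a closing parenthesis and the colon of the branch length. The following is only a sketch: `has_support_labels` is a hypothetical helper, not part of OrthoFinder, and the two test trees are a fragment of the tree above plus a copy with a made-up support value of 100.

```shell
# Hypothetical helper: succeeds if a Newick file has numeric labels on internal nodes.
has_support_labels() {
  # A support value appears as a number between ")" and the ":" of the branch length.
  grep -Eq '\)[0-9]+(\.[0-9]+)?:' "$1"
}

tmpdir=$(mktemp -d)
# Fragment of the tree posted above: nothing between ")" and ":"
printf '%s\n' '(ECR:0.053792,(ECH:0.06658,EVI:0.055475):0.014468);' > "$tmpdir/no_support.nwk"
# Same fragment with a made-up support value of 100, for illustration only
printf '%s\n' '(ECR:0.053792,(ECH:0.06658,EVI:0.055475)100:0.014468);' > "$tmpdir/with_support.nwk"

for name in no_support with_support; do
  if has_support_labels "$tmpdir/$name.nwk"; then
    echo "$name: support labels found"
  else
    echo "$name: no support labels"
  fi
done
```

Running the same function on Species_Tree/SpeciesTree_rooted.txt should report no support labels for the tree shown here.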

The log reports the following information:

OrthoFinder version 2.5.4 Copyright (C) 2014 David Emms

2022-08-18 14:00:51 : Starting OrthoFinder 2.5.4
24 thread(s) for highly parallel tasks (BLAST searches etc.)
24 thread(s) for OrthoFinder algorithm

Checking required programs are installed
----------------------------------------
Test can run "mcl -h" - ok
Test can run "mafft /mnt/netapp1/Store_uvibgmsm/analysis_klp/phylo/all_cds/phy_cds/Results_Aug18/WorkingDirectory//_dependencies_check/SimpleTest.fa" - ok
Test can run "raxml-ng" - ok

WARNING: Files have been ignored as they don't appear to be FASTA files:
phy_ort.sh
slurm-8764186.out
slurm-8764188.out
OrthoFinder expects FASTA files to have one of the following extensions: fasta, faa, pep, fa, fas

Dividing up work for BLAST for parallel processing
--------------------------------------------------
2022-08-18 14:00:55 : Creating diamond database 1 of 9
2022-08-18 14:00:55 : Creating diamond database 2 of 9
2022-08-18 14:00:55 : Creating diamond database 3 of 9
2022-08-18 14:00:56 : Creating diamond database 4 of 9
2022-08-18 14:00:56 : Creating diamond database 5 of 9
2022-08-18 14:00:56 : Creating diamond database 6 of 9
2022-08-18 14:00:56 : Creating diamond database 7 of 9
2022-08-18 14:00:56 : Creating diamond database 8 of 9
2022-08-18 14:00:56 : Creating diamond database 9 of 9

Running diamond all-versus-all
------------------------------
Using 24 thread(s)
2022-08-18 14:00:56 : This may take some time....
2022-08-18 14:00:56 : Done 0 of 81
2022-08-18 14:02:18 : Done 10 of 81
2022-08-18 14:02:37 : Done 20 of 81
2022-08-18 14:03:11 : Done 30 of 81
2022-08-18 14:03:27 : Done 40 of 81
2022-08-18 14:03:52 : Done 50 of 81
2022-08-18 14:04:56 : Done all-versus-all sequence search

Running OrthoFinder algorithm
-----------------------------
2022-08-18 14:04:57 : Initial processing of each species
2022-08-18 14:05:04 : Initial processing of species 7 complete
2022-08-18 14:05:07 : Initial processing of species 0 complete
2022-08-18 14:05:08 : Initial processing of species 6 complete
2022-08-18 14:05:12 : Initial processing of species 3 complete
2022-08-18 14:05:12 : Initial processing of species 8 complete
2022-08-18 14:05:14 : Initial processing of species 5 complete
2022-08-18 14:05:16 : Initial processing of species 2 complete
2022-08-18 14:05:16 : Initial processing of species 4 complete
2022-08-18 14:05:17 : Initial processing of species 1 complete
2022-08-18 14:05:20 : Connected putative homologues
2022-08-18 14:05:22 : Written final scores for species 7 to graph file
2022-08-18 14:05:23 : Written final scores for species 0 to graph file
2022-08-18 14:05:24 : Written final scores for species 6 to graph file
2022-08-18 14:05:25 : Written final scores for species 8 to graph file
2022-08-18 14:05:25 : Written final scores for species 3 to graph file
2022-08-18 14:05:25 : Written final scores for species 5 to graph file
2022-08-18 14:05:25 : Written final scores for species 4 to graph file
2022-08-18 14:05:26 : Written final scores for species 2 to graph file
2022-08-18 14:05:26 : Written final scores for species 1 to graph file
2022-08-18 14:05:47 : Ran MCL

Writing orthogroups to file
---------------------------
OrthoFinder assigned 135589 genes (96.3% of total) to 17049 orthogroups. Fifty percent of all genes were in orthogroups with 9 or more genes (G50 was 9) and were contained in the largest 4891 orthogroups (O50 was 4891). There were 1950 orthogroups with all species present and 412 of these consisted entirely of single-copy genes.

2022-08-18 14:06:20 : Done orthogroups

Analysing Orthogroups
=====================
2022-08-18 14:06:21 : Starting MSA/Trees
Species tree: Using 1373 orthogroups with minimum of 88.9% of species having single-copy genes in any orthogroup

Inferring multiple sequence alignments for species tree
-------------------------------------------------------
2022-08-18 14:06:51 : Done 0 of 1373
2022-08-18 14:09:50 : Done 100 of 1373
2022-08-18 14:12:00 : Done 200 of 1373
2022-08-18 14:13:54 : Done 300 of 1373
2022-08-18 14:15:03 : Done 400 of 1373
2022-08-18 14:16:19 : Done 500 of 1373
2022-08-18 14:17:37 : Done 600 of 1373
2022-08-18 14:18:49 : Done 700 of 1373
2022-08-18 14:19:53 : Done 800 of 1373
2022-08-18 14:20:55 : Done 900 of 1373
2022-08-18 14:22:02 : Done 1000 of 1373
2022-08-18 14:23:08 : Done 1100 of 1373
2022-08-18 14:24:08 : Done 1200 of 1373
2022-08-18 14:25:04 : Done 1300 of 1373

Inferring remaining multiple sequence alignments and gene trees
---------------------------------------------------------------
2022-08-18 14:25:57 : Done 0 of 15677
2022-08-18 22:40:58 : Done 1000 of 15677
2022-08-19 00:39:23 : Done 2000 of 15677
2022-08-19 01:47:48 : Done 3000 of 15677
2022-08-19 02:32:47 : Done 4000 of 15677
2022-08-19 03:03:02 : Done 5000 of 15677
2022-08-19 03:22:42 : Done 6000 of 15677
2022-08-19 03:36:07 : Done 7000 of 15677
2022-08-19 03:46:17 : Done 8000 of 15677
2022-08-19 03:55:53 : Done 9000 of 15677
2022-08-19 04:06:04 : Done 10000 of 15677
2022-08-19 04:15:40 : Done 11000 of 15677
2022-08-19 04:24:51 : Done 12000 of 15677
2022-08-19 04:25:22 : Done 13000 of 15677
2022-08-19 04:25:41 : Done 14000 of 15677
2022-08-19 04:25:58 : Done 15000 of 15677
2022-08-19 07:02:29 : Done MSA/Trees

Best outgroup(s) for species tree
---------------------------------
2022-08-19 07:02:29 : Starting STRIDE
2022-08-19 07:02:37 : Done STRIDE
Observed 188 well-supported, non-terminal duplications. 168 support the best root and 20 contradict it.
Best outgroup for species tree:
  ACA-GOOD-PROT-TECTIPLEURA-RMDUP, OVI-GOOD-PROT-TECTIPLEURA-RMDUP

Reconciling gene trees and species tree
---------------------------------------
Outgroup: ACA-GOOD-PROT-TECTIPLEURA-RMDUP, OVI-GOOD-PROT-TECTIPLEURA-RMDUP
2022-08-19 07:02:37 : Starting Recon and orthologues
2022-08-19 07:02:37 : Starting OF Orthologues
2022-08-19 07:02:37 : Done 0 of 13406
2022-08-19 07:02:38 : Done 1000 of 13406
2022-08-19 07:02:39 : Done 2000 of 13406
2022-08-19 07:02:41 : Done 3000 of 13406
2022-08-19 07:02:42 : Done 4000 of 13406
2022-08-19 07:02:43 : Done 5000 of 13406
2022-08-19 07:02:44 : Done 6000 of 13406
2022-08-19 07:02:46 : Done 7000 of 13406
2022-08-19 07:02:47 : Done 8000 of 13406
2022-08-19 07:02:48 : Done 9000 of 13406
2022-08-19 07:02:50 : Done 10000 of 13406
2022-08-19 07:02:51 : Done 11000 of 13406
2022-08-19 07:02:52 : Done 12000 of 13406
2022-08-19 07:02:53 : Done 13000 of 13406
2022-08-19 07:02:55 : Done OF Orthologues

Writing results files
=====================
2022-08-19 07:02:56 : Done orthologues

Results:
    /mnt/netapp1/Store_uvibgmsm/analysis_klp/phylo/all_cds/phy_cds/Results_Aug18/

CITATION:
 When publishing work that uses OrthoFinder please cite:
 Emms D.M. & Kelly S. (2019), Genome Biology 20:238

 If you use the species tree in your work then please also cite:
 Emms D.M. & Kelly S. (2017), MBE 34(12): 3267-3278
 Emms D.M. & Kelly S. (2018), bioRxiv https://doi.org/10.1101/267914

Mirror1211 commented 2 years ago

Hi, I am in the same situation.

manuelsmendoza commented 2 years ago

My solution was to rerun the tree inference myself, using some of the files produced by OrthoFinder:

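# Estimate the substitution model
modeltest-ng \
  --force \
  --processes $SLURM_NTASKS \
  --datatype aa \
  --input SpeciesTreeAlignment.fa \
  --output SpeciesTreeAlignment.model \
  --topology ml \
  --frequencies e \
  --model-het f \
  --template raxml

# Reconstruct the tree with bootstrap support.
# MODEL is a placeholder for the model selected by modeltest-ng.
# Note: the raxml-ng help lists --all and --check as mutually exclusive commands,
# so --check may need to be run as a separate step.
raxml-ng \
  --redo \
  --threads $SLURM_NTASKS \
  --all \
  --check \
  --msa SpeciesTreeAlignment.fa \
  --model MODEL \
  --blopt nr_safe \
  --bs-trees 5000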
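
Once a tree with bootstrap values exists, the values can be listed to spot the weakest branch. This is a rough sketch, assuming the support is written as a number directly after each closing parenthesis. `list_support_values` is a hypothetical helper, and the demo tree and its values (98 and 71) are made up for illustration rather than taken from a real run.

```shell
# Hypothetical helper: print one support value per line from a Newick file.
list_support_values() {
  # Match ")<number>:" and strip the surrounding ")" and ":" characters.
  grep -oE '\)[0-9]+(\.[0-9]+)?:' "$1" | tr -d '):'
}

demo_tree=$(mktemp)
# Made-up tree with made-up support values, for illustration only
printf '%s\n' '((A:0.1,B:0.2)98:0.05,(C:0.1,D:0.3)71:0.02,E:0.4);' > "$demo_tree"

# All support values, then the lowest one
list_support_values "$demo_tree"
min_support=$(list_support_values "$demo_tree" | sort -n | head -n 1)
echo "lowest support: $min_support"
```

To use it on real output, replace `$demo_tree` with the support tree written by raxml-ng. I would expect that file to be named SpeciesTreeAlignment.fa.raxml.support when no prefix is given, but that name is an assumption worth checking against the run's log.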

Mirror1211 commented 2 years ago

Thanks a lot. I also carried out the phylogenetic analysis using the same dataset (SpeciesTreeAlignment.fa) and methods similar to those you described. However, I am unsure whether the species tree in the Species_Tree folder was generated from the SpeciesTreeAlignment.fa file.