davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

How to compute support for nodes in the species tree ? #460

Open BiodivGenomic opened 4 years ago

BiodivGenomic commented 4 years ago

Hello, I'm testing OrthoFinder with a set of peptide sequences from assembled genomes/transcriptomes, and the species tree ended up lacking support values. I used the "normal" command line to run OrthoFinder ("orthofinder -f my_data_folder"), but is there an option to ensure that support values are computed? As I have already run the entire pipeline, I would also like to know whether there is a way to get the support values from what was already computed (it took a long time to get the orthologs and gene trees, so if I can save this time...). Thanks in advance!

davidemms commented 4 years ago

Hi

What version of OrthoFinder were you using?

All the best David

BiodivGenomic commented 4 years ago

Hi, I'm using version 2.4.0. Thanks! Damien

davidemms commented 4 years ago

Hi Damien

By default the species tree will be generated using STAG and will have support values. There is a fallback method which is employed if the data is too limited to use this method. In this case a message is printed, "Using fallback species tree inference method". This occurs if there are fewer than 100 orthogroups with all species present. I wonder if that might be what occurred here?

All the best David

BiodivGenomic commented 4 years ago

Hi David, indeed, that could be the explanation, as I have only 34 of these groups... Could you please describe this fallback species tree inference method a little? Is there a way to get support values with it? Thanks in advance!

davidemms commented 4 years ago

Details on the method are given here: https://github.com/davidemms/OrthoFinder#species-tree-inference

Typically this situation arises when OrthoFinder has been provided with incomplete data e.g. only a subset of the genes in a species rather than all genes. If you have the extra information then you should provide it and OrthoFinder will usually be able to find enough data to calculate support values. That said, I see you're using transcriptomes too so I realise that might be the cause of the incomplete data instead? How many input species do you have and how many genes are there in each?

If you are limited in this respect then one thing you could try is the "-M msa" option, which uses tree inference via multiple sequence alignments. By default this will use FastTree and will give you Shimodaira-Hasegawa support values. Additionally, if you wanted, you could take the species tree alignment produced here and run any other tree inference program on it (e.g. IQTREE with the option "-bb 1000") to get bootstrap support values.

All the best David

BiodivGenomic commented 4 years ago

Hello, thanks, I will try that. However, I would also like to use STAG for the tree reconstruction, and so get above the threshold for the number of orthogroups with all species included... Is there a way to identify the species that reduce this value the most (and therefore the ones I would preferably remove to increase the number of orthogroups with all species included)? The file with the count of orthogroups per species could be a start, but I don't think the number of orthogroups in a species is directly linked to the number of orthogroups containing all other species but not this particular one... maybe by crossing it with another file? Thanks in advance for your help!

davidemms commented 4 years ago

Yes, you're right about using the orthogroups per species file. I think you can use a few Excel formulae to get the answer:

  1. A grid of cells with the test CELL > 0 for each species-orthogroup cell in the original table
  2. Sum these rows up and find the orthogroups where the sum = (n-1); these are the ones present in all but one species
  3. Another grid of cells testing where the sum is (n-1) and a particular species = 0. These are the ones where removing that species would make the difference
  4. You can then sum these final columns up to see which species' removal would have the most effect.
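
The four steps above can also be scripted rather than done in a spreadsheet. The following is a minimal Python sketch, not part of OrthoFinder itself. It assumes a tab-separated gene count table with the orthogroup ID in the first column, one column per species, and an optional trailing "Total" column (the layout I would expect for Orthogroups.GeneCount.tsv, but check your own file). The function name `species_removal_gain` and the toy table are made up for illustration.

```python
import csv
import io

# Toy table in the assumed layout: orthogroup ID, one column per species, "Total".
EXAMPLE_TABLE = (
    "Orthogroup\tspA\tspB\tspC\tTotal\n"
    "OG0\t1\t2\t1\t4\n"
    "OG1\t1\t0\t3\t4\n"
    "OG2\t2\t0\t1\t3\n"
    "OG3\t0\t1\t1\t2\n"
    "OG4\t0\t0\t5\t5\n"
)


def species_removal_gain(table_text):
    """Return (complete, gain) for a tab-separated gene count table.

    complete: number of orthogroups in which every species is present.
    gain[s]:  number of orthogroups present in all species except s, i.e.
              how many extra all-species orthogroups removing s would give.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    # Assumption: column 0 is the orthogroup ID and "Total" is not a species.
    species_cols = [i for i, name in enumerate(header) if i > 0 and name != "Total"]
    species = [header[i] for i in species_cols]
    n = len(species)

    complete = 0
    gain = {s: 0 for s in species}
    for row in reader:
        if not row:
            continue
        present = [int(row[i]) > 0 for i in species_cols]  # step 1: CELL > 0
        n_present = sum(present)                           # step 2: row sum
        if n_present == n:
            complete += 1
        elif n_present == n - 1:
            missing = species[present.index(False)]        # step 3: the absent species
            gain[missing] += 1                             # step 4: per-species total
    return complete, gain


if __name__ == "__main__":
    # For real data, replace EXAMPLE_TABLE with open(path).read().
    complete, gain = species_removal_gain(EXAMPLE_TABLE)
    print("Orthogroups with all species present:", complete)
    for name, extra in sorted(gain.items(), key=lambda kv: -kv[1]):
        print(f"removing {name} would give {complete + extra} such orthogroups")
```

Note that this only looks at removing one species at a time; after dropping a species, the counts would need to be recomputed on the reduced table before choosing a second one.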

All the best David

xieyichun50 commented 3 years ago

Hi David, I would also like to use the alignment generated by OrthoFinder to build a species tree with bootstrap values. Can this be achieved using OrthoFinder, or do I need to do it separately with IQTREE? I am a little confused about which file should be used as the alignment input for IQTREE. By the way, I am using OrthoFinder version 2.5.2. Thanks in advance!

davidemms commented 3 years ago

Hi

I would recommend using IQTREE directly. There is a concatenated multiple sequence alignment file called MultipleSequenceAlignments/SpeciesTreeAlignment.fa that you can use if you selected the -M msa option with OrthoFinder.
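
As a rough illustration of what running IQTREE on that file could look like, here is a small Python sketch that only assembles the command line. The "-s" flag is IQ-TREE's option for the input alignment, and "-bb 1000" is the option mentioned earlier in this thread. The executable name (`iqtree`, which may be `iqtree2` on some installs), the results directory name, and the helper `iqtree_bootstrap_command` are assumptions made for the example.

```python
import shlex
from pathlib import Path


def iqtree_bootstrap_command(results_dir, executable="iqtree", replicates=1000):
    """Build an IQ-TREE command for the OrthoFinder species tree alignment.

    results_dir is the OrthoFinder results directory of a run made with
    "-M msa" (hypothetical path; adjust to your own run).
    """
    alignment = Path(results_dir) / "MultipleSequenceAlignments" / "SpeciesTreeAlignment.fa"
    # -s: input alignment, -bb: number of ultrafast bootstrap replicates
    return [executable, "-s", str(alignment), "-bb", str(replicates)]


if __name__ == "__main__":
    cmd = iqtree_bootstrap_command("my_results_dir")
    print(shlex.join(cmd))
    # To actually launch it instead of printing:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Printing the command first makes it easy to check the path before committing to a long run.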

Best wishes David

xieyichun50 commented 3 years ago

Hi David, Thank you for your suggestion. I have tried and it works for me.

Best, Yichun