davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

Locating Single-Copy Genes Used to Infer Species Tree #606

Open CRL-CHAR opened 3 years ago

CRL-CHAR commented 3 years ago

Hi David,

I am using OrthoFinder 2.5.4 on a dataset of 15 organisms. My output states, "There were 2989 orthogroups with all species present and 30 of these consisted entirely of single-copy genes". I am afraid that when I add more organisms, I will no longer have any single-copy orthogroups in which all species are present. Is there another output file in which single-copy genes are used to infer the species tree, but species with multi-copy genes are replaced with gaps?

This past post seems to get at what I am describing, but I cannot locate the file "Species_Tree/Orthogroups_for_concatenated_alignment.txt" that it mentions: https://github.com/davidemms/OrthoFinder/issues/385

matrs commented 3 years ago

It should be inside the OrthoFinder results directory if the option -M msa was used: WorkingDirectory/Species_Tree/Orthogroups_for_concatenated_alignment.txt
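
The kind of selection asked about above (keep an orthogroup when most species are single-copy, and treat the remaining species as gaps) can be illustrated with a small sketch. This is not OrthoFinder's code: the function name `relaxed_single_copy`, the `min_fraction` threshold, and the toy counts are all hypothetical, chosen only to show the idea.

```python
def relaxed_single_copy(gene_counts, min_fraction):
    """Pick orthogroups in which at least `min_fraction` of species are single-copy.

    gene_counts maps orthogroup ID -> {species: gene count}.
    Returns orthogroup ID -> sorted list of species that would contribute a
    sequence; every other species would be filled with gaps in an alignment.
    Hypothetical helper, not part of OrthoFinder.
    """
    selected = {}
    for og, counts in gene_counts.items():
        single = sorted(sp for sp, n in counts.items() if n == 1)
        if len(single) / len(counts) >= min_fraction:
            selected[og] = single
    return selected


# Toy data (invented for illustration only).
counts = {
    "OG1": {"spA": 1, "spB": 1, "spC": 1, "spD": 2},  # spD is multi-copy
    "OG2": {"spA": 1, "spB": 0, "spC": 3, "spD": 1},  # only half are single-copy
    "OG3": {"spA": 1, "spB": 1, "spC": 1, "spD": 1},  # strictly single-copy
}
chosen = relaxed_single_copy(counts, min_fraction=0.75)
print(chosen)
```

With a threshold of 0.75, OG1 is kept with spD treated as missing, OG2 is dropped, and OG3 is kept in full.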

CRL-CHAR commented 3 years ago

Thank you, matrs! I did not use the "-M msa" option. I will go back and re-run my job using it.

davidemms commented 3 years ago

Just to follow up on this, @matrs is correct about this for the "-M msa" option. If you don't use that option, then OrthoFinder uses the STAG algorithm for species tree inference (https://doi.org/10.1101/267914) which is able to use any orthogroup with all species present. That should deal with the case you were asking about here, because although you only have 30 single-copy orthogroups, you have 2989 orthogroups with all species present -- plenty for species tree analysis, even if you add many more species.
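
To see how the two numbers in the log message relate, the counts can be recomputed from a per-species gene-count table. The sketch below assumes a tab-separated layout with an orthogroup ID in the first column, one column per species, and a final "Total" column (as in OrthoFinder's Orthogroups.GeneCount.tsv, though the layout should be checked against your own results); the function name and the toy table are made up for illustration.

```python
import csv
import io


def count_orthogroups(tsv_text):
    """Return (n_all_species_present, n_single_copy_in_all_species)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    # Assumed layout: first column = orthogroup ID, last column = "Total".
    n_species = len(header) - 2
    all_present = 0
    single_copy = 0
    for row in reader:
        counts = [int(c) for c in row[1:1 + n_species]]
        if all(c >= 1 for c in counts):
            all_present += 1  # usable whenever every species has at least one gene
            if all(c == 1 for c in counts):
                single_copy += 1  # the stricter single-copy subset
    return all_present, single_copy


# Toy table (invented values, three species).
example = (
    "Orthogroup\tspA\tspB\tspC\tTotal\n"
    "OG0000000\t3\t1\t2\t6\n"
    "OG0000001\t1\t1\t1\t3\n"
    "OG0000002\t0\t2\t1\t3\n"
)
print(count_orthogroups(example))  # (2, 1)
```

The single-copy orthogroups are always a subset of the orthogroups with all species present, which is why the second count can stay large even when the first shrinks as species are added.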

Prabhu89-code commented 1 year ago

Dear Dr. David, which data are correct for constructing the species tree: 1. the tree built from single-copy orthologous groups, or 2. the STAG tree, which uses all orthologous groups among the species? I am getting two different trees for 5 species. The STAG tree seems more meaningful and agrees with the taxonomic nomenclature, whereas in the single-copy ortholog tree the outgroup appears within a clade. I have not read any papers that used STAG; most publications rely on concatenated single-copy orthologs.

Could anyone suggest the right data to use for constructing the species tree?

Thank you!