davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

Error with species tree distances #617

Closed JC-therea closed 2 years ago

JC-therea commented 3 years ago

Dear David,

First of all, I want to thank you for creating this program, which is very useful and saves a lot of time in phylogenetic analysis. I am writing to you here because I have a problem with OrthoFinder in one specific case. I ran OrthoFinder on a set of 16 different insects with the following command:

python2.7 $OrthoFinder_PATH -M msa -T iqtree -f $ORTHOFINDER_DIR -a 72 -t 72 -n "MSA_IQtree"

The output was fine, with the correct species tree and the distances between species adjusted in the Newick file. However, I wanted to run an additional analysis with more genes, adding some that are not annotated but were extracted from the species' transcriptomes. As a result, some proteomes that had ~20,000 genes grew to nearly 50,000 proteins. The species tree was the same as before, but with distances of 1 between the different species. Part of the Newick file looks like this:

((Drosophila_sechelia:0.00995709,Drosophila_simulans:0.00338209):1):1):1,(Drosophila_persimilis:0.00962517,Drosophila_pseudoobscura:0.00233249):1):1):1):1):1):1);

Do you know why this happens? I ran the same protocol on a different set of species, and the output was very similar to the original run.

Thank you in advance
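
For readers who want to test a Newick file for the symptom described above, here is a minimal Python sketch. It is not part of OrthoFinder; the function names are hypothetical, and it assumes that branch lengths are plain numbers following a `:`. It collects the lengths of internal branches (those that follow a closing parenthesis) and reports whether all of them are exactly 1.

```python
import re

# Hypothetical helpers, not part of OrthoFinder.
# Assumption: branch lengths are plain numbers written after ':'.

def internal_branch_lengths(newick):
    """Return the lengths of branches that follow a ')' (internal branches).

    An optional node label or support value between ')' and ':' is skipped.
    """
    pattern = r"\)[^:,()]*:([0-9.eE+-]+)"
    return [float(x) for x in re.findall(pattern, newick)]

def looks_like_placeholder(newick):
    """True if the tree has internal branches and every one has length 1."""
    lengths = internal_branch_lengths(newick)
    return bool(lengths) and all(length == 1.0 for length in lengths)

if __name__ == "__main__":
    tree = "((A:0.1,B:0.2):1,(C:0.3,D:0.4):1);"
    print(looks_like_placeholder(tree))
```

Because the check is a regular expression rather than a full parser, it also works on a fragment of a tree such as the one quoted above.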

davidemms commented 3 years ago

Hi

Was this second analysis run with the command you gave in your question? I can't think of any reason for this to happen when using the MSA method. What was the output that orthofinder printed to the terminal about the species tree inference?

If you used the MSA method then there will be a file in your results directory called "MultipleSequenceAlignments/SpeciesTreeAlignment.fa" which you could use to infer your species tree directly with IQTREE or any other method to get your branch lengths.

Alternatively, there will be another version of the species tree that uses OrthoFinder's internal species IDs; this file is called "WorkingDirectory/SpeciesTree_rooted_ids.txt". You could check whether this file contains the branch lengths. If it does have the correct branch lengths, you can convert the file to use your species names with the command:

python OrthoFinder_source/tools/convert_orthofinder_tree_ids.py WorkingDirectory/SpeciesTree_rooted_ids.txt WorkingDirectory/SpeciesIDs.txt

Best wishes David
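
To illustrate what such an ID-to-name conversion involves, here is a rough Python sketch. It is not the code of `convert_orthofinder_tree_ids.py`. It assumes that each line of SpeciesIDs.txt has the form `0: Species_name.fa` and that the leaves of the ID tree are labelled with those bare IDs; both are assumptions that should be checked against the actual files. The function names are hypothetical.

```python
import os
import re

# Illustrative sketch only; not the implementation of OrthoFinder's script.
# Assumption: SpeciesIDs.txt lines look like "0: Species_name.fa".
# Assumption: leaves in the ID tree are labelled with those bare IDs.

def read_species_ids(lines):
    """Parse 'ID: filename' lines into a dict mapping ID to species name."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and lines starting with '#'
        species_id, filename = line.split(": ", 1)
        mapping[species_id] = os.path.splitext(filename)[0]
    return mapping

def rename_leaves(newick, mapping):
    """Replace leaf labels (tokens directly after '(' or ',') via mapping.

    Labels that are not in the mapping are left unchanged, and branch
    lengths are never touched because they follow ':'.
    """
    def swap(match):
        label = match.group(2)
        return match.group(1) + mapping.get(label, label)
    return re.sub(r"([(,])([^(),:;]+)", swap, newick)

if __name__ == "__main__":
    ids = read_species_ids(["0: Drosophila_sechelia.fa",
                            "1: Drosophila_simulans.fa"])
    print(rename_leaves("(0:0.00995709,1:0.00338209):1;", ids))
```

Only labels that directly follow `(` or `,` are rewritten, so internal node labels and branch lengths pass through unchanged.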

JC-therea commented 3 years ago

Hi David,

Thank you for your answer. Yes, I ran: python2.7 $OrthoFinder_PATH -M msa -T iqtree -f $ORTHOFINDER_DIR -a 72 -t 72 -n "MSA_IQtree"

As it takes 10 days on my system to finish the job, I checked the file that you mentioned, and the species tree estimates were there! With the command that you wrote, the Newick tree came out as expected.

I don't know why this happens but thank you very much for helping me.

Best wishes Carlos

davidemms commented 3 years ago

Hi Carlos

Would you mind attaching the SpeciesTree_rooted_ids.txt file here or emailing it to me at david.emms@plants.ox.ac.uk? I'd like to locate what caused the issue.

Thanks David

JC-therea commented 3 years ago

Of course, here it is.

Kind regards Carlos