Open raufs opened 1 year ago
Hi Rauf
Thanks, and I appreciate the feedback. I think this is probably due to the tree inference and/or the MSA inference and therefore best resolved by providing a fixed seed for the random number generation, where possible. There are two overall options for tree inference with OrthoFinder:
default: tree inference using a distance matrix and fastme. The fastme command line says
-z seed, --seed=seed
Use this option to initialize randomization with seed value.
Only helpful when bootstrapping.
so I'm not 100% sure if it's deterministic or not, but unfortunately this parameter won't have an effect for OrthoFinder as it doesn't rely on bootstrapping
-M msa: By default uses MAFFT and FastTree. I can't find any reference to a random number generator seed option for these (FastTree has one for the support values, but again these don't affect OrthoFinder).
Which options did you see the non-determinacy with?
I think if you wanted deterministic behaviour you'd need a tree inference program and an MSA inference program that allow you to specify the seed. I know RAxML and IQ-TREE both do; if you found an MSA program that was also deterministic then you could use that. You can edit the options of any programs used in the OrthoFinder config.json file: https://github.com/davidemms/OrthoFinder#configjson--adding-addtional-programs-for-tree-inference-local-alignment-or-msa
All the best David
Hi David,
Thank you for your reply!
I observed the behavior with just default settings, so DendroBLAST distance matrices + FastME.
Using a small set of bacterial proteomes, it seems that two corresponding gene trees in the Gene_Trees/ folder differ, as expected given that Phylogenetic_Hierarchical_Orthogroups/N0.tsv differs too. However, the distance matrices at /WorkingDirectory/Distances_SpeciesTree/ for the orthogroup in question were identical.
To test whether FastME was producing differently formatted gene trees, I ran it two separate times on one of the distance matrices located at /WorkingDirectory/Distances_SpeciesTree/ (for an orthogroup that seems to be split up differently into HOGs between two identical runs). Oddly, the output was reproducible. I ran FastME as you appear to run it in the orthologues.py program, with options -N -w O -s.
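For anyone who wants to repeat this kind of check, here is a minimal sketch of running the same command several times and comparing hashes of the output files. The helper names (`run_and_hash`, `is_deterministic`) and the `make_cmd` callback are hypothetical, not part of OrthoFinder or FastME; `make_cmd` is assumed to return the full command line for a given output path, so the exact FastME invocation is left to the caller.

```python
import hashlib
import subprocess
import sys
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def run_and_hash(make_cmd, n_runs=2):
    """Run the command n_runs times, each with a fresh output file.

    make_cmd(out_path) must return the argument list to execute
    (hypothetical callback; the real tool invocation is up to the caller).
    Returns one digest per run.
    """
    digests = []
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(n_runs):
            out = Path(tmp) / f"run_{i}.out"
            subprocess.run(make_cmd(out), check=True)
            digests.append(sha256_of(out))
    return digests


def is_deterministic(make_cmd, n_runs=2):
    """True if every run produced byte-identical output."""
    return len(set(run_and_hash(make_cmd, n_runs))) == 1


if __name__ == "__main__":
    # Stand-in command that always writes the same text; replace the
    # lambda with one that builds the tree-inference command to test.
    stable = lambda out: [
        sys.executable, "-c",
        "import sys; open(sys.argv[1], 'w').write('(A,B);')", str(out),
    ]
    print(is_deterministic(stable))
```

Byte-identical output is a stricter test than tree equality: two runs could write the same tree with branches in a different order, so a `False` here would still need a closer look at the files.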
Differences in Phylogenetic_Hierarchical_Orthogroups/N0.tsv between two replicate runs also appear when -M msa is used.
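To tell real differences in HOG membership apart from differences in row order, gene order, or HOG numbering, the two N0.tsv files can be compared as sets. This is a rough sketch under an assumed layout: a tab-separated file with a header row, three leading metadata columns (HOG, OG, Gene Tree Parent Clade), and one column per species holding comma-separated gene names. The function names are hypothetical.

```python
import csv


def load_hogs(path, n_meta_cols=3):
    """Read an N0.tsv-style file into a set of HOGs.

    Each HOG is a frozenset of (species, gene) pairs, so HOG IDs, row
    order, and gene order within a cell are all ignored.
    Assumes n_meta_cols metadata columns precede the species columns.
    """
    hogs = set()
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = next(reader, None)
        if header is None:
            return hogs
        species = header[n_meta_cols:]
        for row in reader:
            members = frozenset(
                (sp, gene.strip())
                for sp, cell in zip(species, row[n_meta_cols:])
                for gene in cell.split(",")
                if gene.strip()
            )
            if members:
                hogs.add(members)
    return hogs


def compare_hogs(path_a, path_b):
    """Return (HOGs only in A, HOGs only in B)."""
    a, b = load_hogs(path_a), load_hogs(path_b)
    return a - b, b - a
```

If both returned sets are empty, the two runs differ only in ordering or labelling; otherwise the returned HOGs show which groups were split or merged differently.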
Hope this is helpful, and that getting deterministic behavior is just a matter of sorting some list, or of not changing the gene tree format/order when reading the trees and rewriting them with proper names! Rauf
Hi David,
Thank you a ton for developing and maintaining OrthoFinder and various related software! They have been instrumental in my research.
Perhaps this is something you are already aware of, but I noticed that the second part of OrthoFinder does not seem to be reproducible in the way that the first part, up to OrthoGroups.tsv, is.
Between two replicate runs, I get identical results for OrthoGroups.tsv but Phylogenetic_Hierarchical_Orthogroups/N0.tsv appears to differ.
It appears differences are small between two replicate N0.tsv files so this is not a major issue but perhaps, if possible, something to resolve in later versions.
Rauf