Open ens-sb opened 1 year ago
Hi Botond
I had to check the code here to remind myself how it works. Basically, when a duplication is observed, OrthoFinder requires evidence that both child nodes correspond to the N0 species tree node in order to split it into two HOGs at the N0 level. Since one of the children here is just a single gene, it doesn't get split off.
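The rule described above can be sketched roughly as follows. This is only an illustration, not OrthoFinder's actual code: the function names are hypothetical, and the test for whether a child clade "corresponds to" a species tree node (here, having genes from both sides of that node's split) is an assumed simplification.

```python
from typing import Iterable, Set


def maps_to_node(clade_species: Iterable[str], left: Set[str], right: Set[str]) -> bool:
    """Hypothetical criterion: a gene tree clade counts as evidence for a
    species tree node if it has genes from both sides of that node's split."""
    species = set(clade_species)
    return bool(species & left) and bool(species & right)


def split_at_duplication(child_a: Iterable[str], child_b: Iterable[str],
                         left: Set[str], right: Set[str]) -> bool:
    """Split a duplication into two HOGs only if BOTH children map to the node."""
    return maps_to_node(child_a, left, right) and maps_to_node(child_b, left, right)
```

Under this assumed criterion a child consisting of a single gene can never span both sides of the split, so the duplication is not split, which matches the behaviour described above.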
My thinking when I developed this was, I think, that it was a convenience to users: they probably didn't really want single genes split off from orthogroups, since those genes could have ended up there purely because of tree inference inaccuracies. Technically, though, if we believe the tree then we should be splitting. How do you feel about that, and is the behaviour a problem for your use case?
The "-y" option corresponds to a slightly different situation, but I think it might be appropriate here. I will be creating a new major-version release soon and could potentially include this case under the "-y" option.
Best wishes David
Hi David,
Thank you very much for the quick response!
Our use case is that we would like to parse the pairwise orthology relationships out of the HOG TSV files. For this it would indeed be less confusing if the orthogroup were split by the -y option in cases like the one discussed above, so that we do not get pairs of orthologous genes whose common ancestor in the gene tree is a duplication node.
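The parsing step could look something like the sketch below. It assumes that N0.tsv is tab-separated, that its first three columns are named "HOG", "OG" and "Gene Tree Parent Clade", and that every remaining column is a species holding a comma-separated gene list; the function name is hypothetical. It only enumerates cross-species gene pairs within each HOG. Whether each such pair is a true ortholog is exactly the question raised in this issue.

```python
import csv
from itertools import combinations

# Assumed names of the non-species columns in N0.tsv.
META_COLUMNS = ("HOG", "OG", "Gene Tree Parent Clade")


def cross_species_pairs(handle):
    """Yield (hog_id, gene_a, gene_b) for genes from different species in one HOG."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Collect the non-empty gene lists, keyed by species column name.
        genes_by_species = {
            sp: [g.strip() for g in cell.split(",") if g.strip()]
            for sp, cell in row.items()
            if sp is not None and sp not in META_COLUMNS and cell
        }
        # Pair every gene of one species with every gene of another species.
        for sp_a, sp_b in combinations(sorted(genes_by_species), 2):
            for gene_a in genes_by_species[sp_a]:
                for gene_b in genes_by_species[sp_b]:
                    yield row["HOG"], gene_a, gene_b
```

It can be called on an open file, for example `with open("Phylogenetic_Hierarchical_Orthogroups/N0.tsv") as fh: pairs = list(cross_species_pairs(fh))`.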
It would be nice if you could include this in the next release of OrthoFinder. I also have a couple of other requests for that release; I will submit them as separate issues for your consideration.
Many thanks, Botond
Hi David,
I have run OrthoFinder 2.5.4 on the example data with the -y option and otherwise default parameters. The HOGs in the file Phylogenetic_Hierarchical_Orthogroups/N0.tsv for the orthogroup OG0000002 look like this:
In this output I have noticed that the gene tree parent clade of HOG N0.HOG0000007 is a duplication node (n20). Can you shed light on the reason why this HOG was not split by the -y option?
Many thanks, Botond