davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0
673 stars, 186 forks

Nomenclature of HOG #452

Open MrTomRod opened 4 years ago

MrTomRod commented 4 years ago

Hey David

It just came to me that there might be a better nomenclature for the HOGs, since they are hierarchical.

Would it not be possible to have an EC-number-like nomenclature?

Suppose gene X is in N0.HOG00, N1.HOG02, N2.HOG01 and N3.HOG04.

We could then give it the orthogroup annotations HOG.0, HOG.0.2, HOG.0.2.1 and HOG.0.2.1.4.

|     | N0.HOG00 | N1.HOG02 | N2.HOG01 | N3.HOG04 |
| --- | -------- | -------- | -------- | -------- |
| HOG | .0       | .2       | .1       | .4       |
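
A minimal sketch of the renaming described above. It assumes the per-level IDs have the form `N<level>.HOG<index>` and are listed from the root level downwards along a single lineage. The function name `hierarchical_names` is hypothetical and not part of OrthoFinder.

```python
import re

# Assumed ID shape: "N<level>.HOG<index>", e.g. "N1.HOG02".
_HOG_ID = re.compile(r"N(\d+)\.HOG(\d+)")


def hierarchical_names(level_ids):
    """Turn root-to-tip per-level HOG IDs into EC-number-like names."""
    names = []
    parts = []
    for hog_id in level_ids:
        match = _HOG_ID.fullmatch(hog_id)
        if match is None:
            raise ValueError(f"unexpected HOG ID: {hog_id!r}")
        # Drop zero padding so "HOG02" contributes ".2".
        parts.append(str(int(match.group(2))))
        names.append("HOG." + ".".join(parts))
    return names


example = hierarchical_names(["N0.HOG00", "N1.HOG02", "N2.HOG01", "N3.HOG04"])
print(example)  # ['HOG.0', 'HOG.0.2', 'HOG.0.2.1', 'HOG.0.2.1.4']
```

Each name is built by appending the index at the current level to the indices of all levels above it, which is why the names grow with depth.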

I'm not sure yet whether this has any advantage beyond one: from the orthogroup annotation HOG.0.2.1.4 alone, we know that all its genes are also in HOG.0, HOG.0.2 and HOG.0.2.1.
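
The nesting can be read off the name with plain string handling. This is a sketch under the assumption that names follow the dotted `HOG.a.b.c` pattern proposed here. The helpers `ancestors` and `is_nested_in` are hypothetical names used for illustration.

```python
def ancestors(name):
    """List the enclosing groups of a dotted HOG name, outermost first."""
    parts = name.split(".")
    # parts[0] is the literal "HOG"; the shortest valid name has two parts.
    return [".".join(parts[:i]) for i in range(2, len(parts))]


def is_nested_in(name, other):
    """True if `name` equals `other` or lies below it in the hierarchy."""
    a = name.split(".")
    b = other.split(".")
    # Compare whole components, so "HOG.0.21" is not treated as inside "HOG.0.2".
    return len(a) >= len(b) and a[: len(b)] == b


print(ancestors("HOG.0.2.1.4"))               # ['HOG.0', 'HOG.0.2', 'HOG.0.2.1']
print(is_nested_in("HOG.0.2.1.4", "HOG.0"))   # True
print(is_nested_in("HOG.0.21", "HOG.0.2"))    # False
```

Comparing split components instead of using `str.startswith` avoids false matches between indices that share leading digits.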

A disadvantage would be that the orthogroup-annotations become long and ugly.

Best, MrTomRod

davidemms commented 3 years ago

Hi MrTomRod

Yes, I think this is a nice idea. I've thought about it a bit and I think you've identified the main pros and cons. I didn't do this originally because I wanted the new files to be a relatively clean drop-in replacement for the original orthogroups file, but the method you suggest does have advantages for tracing the hierarchical nature of the groups. I wonder if a translation file, or an extra column in the HOG file so that both names are given, might be the best way forward. I'll have to think it through.

All the best David
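
One way the translation-file idea could look, as a rough sketch only. It assumes each gene's HOG membership is available as a root-to-tip list of `N<level>.HOG<index>` IDs. The column names `original_id` and `hierarchical_name` and both function names are made up for illustration and do not reflect any OrthoFinder output format.

```python
import csv
import io
import re

# Assumed ID shape: "N<level>.HOG<index>".
_HOG_ID = re.compile(r"N\d+\.HOG(\d+)")


def translation_rows(chains):
    """Yield (original_id, hierarchical_name) pairs, each original ID once."""
    seen = set()
    for chain in chains:
        parts = []
        for hog_id in chain:
            match = _HOG_ID.fullmatch(hog_id)
            if match is None:
                raise ValueError(f"unexpected HOG ID: {hog_id!r}")
            parts.append(str(int(match.group(1))))
            if hog_id not in seen:
                seen.add(hog_id)
                yield hog_id, "HOG." + ".".join(parts)


def write_translation(chains, handle):
    """Write a two-column, tab-separated translation table to `handle`."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["original_id", "hierarchical_name"])
    writer.writerows(translation_rows(chains))


buffer = io.StringIO()
write_translation([["N0.HOG00", "N1.HOG02"], ["N0.HOG00", "N1.HOG03"]], buffer)
print(buffer.getvalue())
```

Keeping the original IDs in the first column would leave existing files usable as a drop-in replacement, while the second column carries the hierarchy.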