MEGA-GO / MegaGO

Calculate semantic distance for sets of Gene Ontology terms
MIT License

Interpretation problem when comparing identical high-level terms #32

Open tivdnbos opened 4 years ago

tivdnbos commented 4 years ago

When identical high-level terms are compared, a low score is returned, e.g.:

GO:0030170 (pyridoxal phosphate binding) vs GO:0030170 gives 99% similarity
GO:0043167 (ion binding) vs GO:0043167 gives 55% similarity
GO:0003674 (molecular function) vs GO:0003674 gives 0% similarity

I also tested what happens if a term appears multiple times in the list (e.g. 10x GO:0043167 vs 1x GO:0043167), but this gives the same result: 55% in this case.

rababerladuseladim commented 4 years ago

According to the authors, the simRel method is aimed at comparing gene products rather than functional profiles. Thus, generic terms are penalized: “Generic terms do not have a high relevance for the comparison of the exact function of different gene products.” In my opinion, this penalty does not make sense when comparing profiles. Without the penalty, simRel reduces to the simLin method.
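To illustrate the penalty, here is a minimal sketch assuming the commonly cited definitions: IC(c) = -log p(c), simLin = 2·IC(MICA) / (IC(c1) + IC(c2)), and simRel = simLin · (1 - p(MICA)), where MICA is the most informative common ancestor. For a term compared with itself, the term is its own MICA, so simRel collapses to 1 - p(c). The probabilities below are not taken from MegaGO's frequency data; they are hypothetical values back-calculated from the scores reported above, and the function names are illustrative only.

```python
import math


def information_content(p):
    """IC(c) = -log p(c), where p(c) is the annotation probability of a term."""
    return -math.log(p)


def sim_lin(p_c1, p_c2, p_mica):
    """Lin similarity from the probabilities of both terms and their MICA."""
    denom = information_content(p_c1) + information_content(p_c2)
    if denom == 0:
        # Root vs root: 0/0 is undefined; returning 0.0 is an assumption made here.
        return 0.0
    return 2 * information_content(p_mica) / denom


def sim_rel(p_c1, p_c2, p_mica):
    """simRel = simLin weighted by (1 - p(MICA)), which penalizes generic terms."""
    return sim_lin(p_c1, p_c2, p_mica) * (1 - p_mica)


# Hypothetical probabilities, inferred from the reported scores (assumption).
examples = [
    ("GO:0030170", 0.01),
    ("GO:0043167", 0.45),
    ("GO:0003674", 1.00),
]

for term, p in examples:
    # Identical terms: c1 == c2 == MICA, so all three probabilities are equal.
    print(f"{term}: simRel={sim_rel(p, p, p):.2f} simLin={sim_lin(p, p, p):.2f}")
```

Under these assumptions, any non-root term compared with itself scores 1 with simLin, while simRel scales that by how rare the term is. The root term is a special case for both metrics, because its information content is zero.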

tivdnbos commented 4 years ago

I suggest creating a separate branch where we test it with simLin. What do you think, @rababerladuseladim @pverscha?

rababerladuseladim commented 4 years ago

I redid the analysis with the simLin metric; it can be found here: https://github.com/MEGA-GO/manuscript-data-analysis/tree/use_lin_metric Sample clustering is not affected; the similarity ranges shift slightly towards higher values.