T-Wisse / MEP_Thomas

This repository serves as the documentation platform for my MEP at TU Delft.

Taking Hierarchy of GO terms into account #12

Open T-Wisse opened 3 years ago

T-Wisse commented 3 years ago

As pointed out by Enzo, GO terms have a hierarchical structure and are therefore not independent. Rather than simply finding the GO terms that the interacting genes have in common, we should determine what fraction of the children of the GO terms of the gene in the cluster (the query gene) are also children of the GO terms of its interactors. For now we will do this for the 1st layer (direct) children only, but this may be extended to deeper layers later if needed.
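A minimal Python sketch of the 1st layer lookup, assuming the GO hierarchy is available locally as (child, parent) `is_a` pairs (for example, extracted from a GO OBO file rather than fetched from YeastMine). The function names and the toy term labels `A` to `E` are hypothetical. The `depth` parameter is there because the comparison may later be extended beyond the 1st layer; `depth=1` returns exactly the direct children.

```python
from collections import defaultdict, deque


def build_children_map(is_a_edges):
    """Map each GO term to the set of its direct (1st layer) children.

    is_a_edges: iterable of (child, parent) pairs (assumed input format).
    """
    children = defaultdict(set)
    for child, parent in is_a_edges:
        children[parent].add(child)
    return children


def children_up_to_depth(term, children, depth=1):
    """Collect all descendants of `term` at most `depth` layers below it."""
    found = set()
    frontier = deque([(term, 0)])  # breadth-first, so shallower layers come first
    while frontier:
        current, level = frontier.popleft()
        if level == depth:
            continue  # do not descend past the requested layer
        for child in children.get(current, ()):
            if child not in found:
                found.add(child)
                frontier.append((child, level + 1))
    return found


def gene_children(go_terms, children, depth=1):
    """Union of the children of all GO terms annotated to one gene."""
    result = set()
    for term in go_terms:
        result |= children_up_to_depth(term, children, depth)
    return result


# Toy hierarchy (hypothetical terms): A -> {B, C}, B -> {D}, D -> {E}
hierarchy = build_children_map([("B", "A"), ("C", "A"), ("D", "B"), ("E", "D")])
print(sorted(children_up_to_depth("A", hierarchy, depth=1)))  # ['B', 'C']
print(sorted(gene_children(["A", "D"], hierarchy)))  # ['B', 'C', 'E']
```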

As depicted in the figure below (made by Leila), this requires us to first find all 1st layer children of the query gene's GO terms (for example, establishment or maintenance of cell polarity). These can be found with YeastMine (https://yeastmine.yeastgenome.org/yeastmine/template.do?name=GOTerm_GeneOrganism&scope=all). Then, for every interactor, we likewise find all 1st layer children of its associated GO terms. The overlap between the two genes is the fraction of the query gene's 1st layer children that are also 1st layer children of the interactor gene. For the example in the figure this is 3/4.

Since doing this manually for all genes, or even a subset of them, would take an unreasonable amount of time, we should implement it in Python.

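[Figure (by Leila): 1st layer children of the query gene's and an interactor's GO terms; overlap 3/4 in this example]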
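The overlap measure described above can be sketched as follows. This is an illustration, not the code in the repository: `overlap_fraction`, `rank_interactors` and the child labels are hypothetical, and returning 0.0 when the query has no children is an assumption. The toy data reproduces the 3/4 from the figure example.

```python
def overlap_fraction(query_children, interactor_children):
    """Fraction of the query gene's 1st layer children that are also
    1st layer children of the interactor gene."""
    query_children = set(query_children)
    if not query_children:
        return 0.0  # assumption: no children means no measurable overlap
    shared = query_children & set(interactor_children)
    return len(shared) / len(query_children)


def rank_interactors(query_children, interactor_children_by_gene):
    """Score every interactor and sort from most to least overlap."""
    scores = {
        gene: overlap_fraction(query_children, kids)
        for gene, kids in interactor_children_by_gene.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical child terms: 3 of the query's 4 children are shared with gene_X.
query = {"c1", "c2", "c3", "c4"}
interactors = {"gene_X": {"c1", "c2", "c3", "c9"}, "gene_Y": {"c4"}}
print(overlap_fraction(query, interactors["gene_X"]))  # 0.75
print(rank_interactors(query, interactors))  # [('gene_X', 0.75), ('gene_Y', 0.25)]
```

Note that the measure is asymmetric: it is normalized by the number of the query gene's children, so an interactor with many extra children is not penalized.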

leilaicruz commented 3 years ago

Great, please also share your results here!

T-Wisse commented 3 years ago

Update: I have a first version of the code (commonGoForInteractors.py) that, as far as I can tell, works. It finds the 1st layer children for the interactor genes and determines which of them are in common with the query gene's 1st layer children (not all of them yet; this requires a small change). As expected, it currently takes ages to run, probably because it fetches the children of every GO term of every interactor gene from YeastMine. This is for a single query gene only, so running it for a set of query genes is entirely unrealistic. The way around this is probably to store the 1st layer children of all relevant GO terms locally, so that subsequent runs no longer need to query YeastMine, which should reduce the runtime considerably. Before doing so, I will confirm that the InterMine queries indeed account for most of the script's runtime.
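One way to store the 1st layer children locally is a small JSON-backed cache in front of the slow lookup. This is a sketch under assumptions: `ChildrenCache`, `slow_fetch` and the file name are hypothetical, and `slow_fetch` only stands in for the actual YeastMine query, which is not shown here. The cache also accumulates the time spent in remote lookups, which is one simple way to check whether those queries dominate the runtime; profiling the whole script with Python's built-in `cProfile` module would give a more complete picture.

```python
import json
import tempfile
import time
from pathlib import Path


class ChildrenCache:
    """Hypothetical on-disk cache: GO term -> list of 1st layer children."""

    def __init__(self, path, fetch):
        self.path = Path(path)
        self.fetch = fetch  # slow lookup, e.g. a YeastMine query (assumed)
        # Reuse results saved by a previous run, if the file exists.
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}
        self.remote_calls = 0
        self.remote_seconds = 0.0  # total time spent inside fetch()

    def children(self, term):
        if term not in self.data:
            start = time.perf_counter()
            # Store as a sorted list so the result can be written to JSON.
            self.data[term] = sorted(self.fetch(term))
            self.remote_seconds += time.perf_counter() - start
            self.remote_calls += 1
        return set(self.data[term])

    def save(self):
        self.path.write_text(json.dumps(self.data))


def slow_fetch(term):
    """Stand-in for a remote query; sleeps to mimic network latency."""
    time.sleep(0.01)
    return {term + ".child"}


cache_file = Path(tempfile.mkdtemp()) / "go_children.json"
cache = ChildrenCache(cache_file, slow_fetch)
cache.children("GO:X")
cache.children("GO:X")  # served from memory, no second fetch
cache.save()

reloaded = ChildrenCache(cache_file, slow_fetch)  # simulates a subsequent run
reloaded.children("GO:X")  # served from the file on disk
print(cache.remote_calls, reloaded.remote_calls)  # 1 0
```

Because the cache is keyed by GO term rather than by gene, a term shared by several interactors (or query genes) is fetched only once.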

leilaicruz commented 3 years ago