MaayanLab / enrichr_issues


Why cut GOBP library? #52

janinubinu closed this issue 1 year ago

janinubinu commented 1 year ago

Hi,

I am new to bioinformatics and have a question that I have been unable to find an answer to. Regarding the GOBP gene library that Enrichr provides, it is stated that the GOBP tree was cut at the 3rd/4th level and that all terms/genes downstream of the cut were then included in Enrichr's GOBP library. My question is: why cut the tree? Is this to increase the granularity of terms? Why is this superior to using the entire GOBP database?

Thanks

AviMaayan commented 1 year ago

Hi @janinubinu,

Enrichr needs gene set libraries to perform the search, so we must convert the tree into a library. If we convert each leaf into a set, the enrichment results become too fragmented, and because many terms have only a few genes, those terms can't be considered: you need enough genes to reach statistical significance. Using the entire tree would result in too much redundancy, and when you have a lot of terms, important overlapping sets may be missed. So cutting the tree at the right level is critical for optimal results. Also, note that this decision was made 10 years ago; the GO tree has grown since then, so other strategies might be more appropriate today.
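To make the idea concrete, here is a minimal sketch of cutting a term hierarchy at a fixed depth and pooling every downstream gene into the term at the cut. It is not Enrichr's actual code. The function name `cut_tree_to_library`, the `children` and `annotations` dictionaries, the toy terms and genes, and the `min_genes` threshold are all hypothetical, chosen only for illustration. It also assumes the hierarchy can be walked from a single root; the real GO is a graph in which a term can have several parents, so a term may be reached at more than one depth.

```python
def cut_tree_to_library(children, annotations, root, cut_depth, min_genes=5):
    """Collapse a term hierarchy into gene sets at a given depth.

    children:    dict mapping a term to a list of its child terms
    annotations: dict mapping a term to the set of genes annotated directly to it
    cut_depth:   depth (root = 0) at which the hierarchy is cut
    min_genes:   hypothetical size threshold; smaller sets are dropped
    """
    def collect(term, seen):
        # Union of the genes of `term` and of everything downstream of it.
        if term in seen:
            return set()
        seen.add(term)
        genes = set(annotations.get(term, ()))
        for child in children.get(term, ()):
            genes |= collect(child, seen)
        return genes

    library = {}
    frontier = [(root, 0)]
    while frontier:
        term, depth = frontier.pop()
        kids = children.get(term, ())
        if depth == cut_depth or not kids:
            # Term sits at the cut (or is a leaf above it): make it one gene set.
            genes = collect(term, set())
            if len(genes) >= min_genes:
                library[term] = genes
        else:
            frontier.extend((child, depth + 1) for child in kids)
    return library


# Toy hierarchy (invented for this example).
children = {"BP": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
annotations = {
    "A": {"g1"},
    "A1": {"g2", "g3"},
    "A2": {"g3", "g4"},
    "B1": {"g5"},
}

library = cut_tree_to_library(children, annotations, "BP", cut_depth=1, min_genes=2)
print(library)
```

In this toy case, cutting at depth 1 merges `A`, `A1`, and `A2` into one set of four genes, while `B` ends up with a single gene and is dropped by the size filter. Cutting deeper gives more but smaller sets, which is the fragmentation trade-off described above.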

Cheers,

Avi

janinubinu commented 1 year ago

Thank you for the explanation!