YuLab-SMU / ProjectYulab

Small coding tasks that enable you to participate in our development

GO enrichment analysis on proteomics data #12

Open GuangchuangYu opened 1 year ago

GuangchuangYu commented 1 year ago

Most enrichment analysis tools are designed at the gene level and only support gene IDs.

The usual first step in GO analysis of proteomics data is to convert protein IDs to gene IDs and then run the analysis at the gene level. This conversion can bias the results, for example when some protein IDs fail to map or when several proteins collapse onto a single gene.

The Gene Ontology Annotation (GOA) database (https://www.ebi.ac.uk/GOA/index), maintained by EBI, provides GO annotations for proteins in UniProt. If we parse a GOA file (the GAF format is defined at http://geneontology.org/docs/go-annotation-file-gaf-format-2.2/) into a suitable format, GO enrichment analysis at the protein level becomes straightforward with the universal interface provided by clusterProfiler.

Fortunately, the clusterProfiler package provides the read.gaf() function to parse GOA files, and its output can be used directly in enricher() and GSEA().
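To make the "suitable format" concrete, here is a minimal sketch of what parsing a GAF file involves. It is written in Python purely for illustration and is not how read.gaf() is implemented. The column positions follow the GAF 2.2 specification linked above (column 2 is the DB Object ID, column 4 the qualifier, column 5 the GO ID, column 9 the aspect). The names `parse_gaf`, `toy_row` and `skip_not`, the placeholder protein IDs, and the choice to drop `NOT` annotations are all assumptions of this sketch.

```python
import io


def parse_gaf(handle, skip_not=True):
    """Yield (go_id, protein_id, aspect) tuples from GAF 2.x lines."""
    for line in handle:
        if line.startswith("!"):  # header and comment lines start with "!"
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:  # malformed or truncated row
            continue
        # Column 4 is the qualifier; "NOT" marks a negated annotation.
        if skip_not and "NOT" in cols[3].split("|"):
            continue
        # Column 2: DB Object ID, column 5: GO ID, column 9: aspect (P, F or C).
        yield cols[4], cols[1], cols[8]


def toy_row(protein, qualifier, go_id, aspect):
    """Build one 17-column GAF row; every value here is made up for the demo."""
    fields = ["UniProtKB", protein, "SYM", qualifier, go_id, "REF:0", "IEA", "",
              aspect, "", "", "protein", "taxon:9606", "20200101", "UniProt", "", ""]
    return "\t".join(fields) + "\n"


toy_gaf = io.StringIO(
    "!gaf-version: 2.2\n"
    + toy_row("PROT_A", "enables", "GO:0005515", "F")
    + toy_row("PROT_B", "involved_in", "GO:0006915", "P")
    + toy_row("PROT_C", "NOT|involved_in", "GO:0006915", "P")
)

# Two-column (term, protein) pairs, the shape a term-to-ID annotation table needs.
term2protein = [(go_id, protein) for go_id, protein, _ in parse_gaf(toy_gaf)]
print(term2protein)
```

The resulting (term, protein) pairs have the same two-column shape as a term-to-ID annotation table, keyed by protein ID rather than gene ID.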

Here, I would like you to compare the GO enrichment results obtained at the gene level (indirectly, by converting protein IDs to gene IDs) with those obtained at the protein level (directly, by using the GOA file as the GO annotation).
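As a rough illustration of why the two levels can disagree, the sketch below runs a plain hypergeometric over-representation test on a toy data set, once with protein IDs and once after mapping to gene IDs. It is in Python and only mimics the general idea of an over-representation test; it does not reproduce the exact behaviour of enricher(). The IDs, the mapping, the term name `TERM_X`, and the function names are all hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N of M items, n of which carry the annotation."""
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return hits / total


def enrich(term2ids, query, universe):
    """Return {term: p-value} for a query set against a background universe."""
    universe = set(universe)
    query = set(query) & universe
    result = {}
    for term, ids in term2ids.items():
        annotated = set(ids) & universe
        k = len(annotated & query)  # annotated IDs that are also in the query
        result[term] = hypergeom_sf(k, len(universe), len(annotated), len(query))
    return result


# Toy mapping (made up): proteins P1 and P2 both map to gene G1.
protein2gene = {"P1": "G1", "P2": "G1", "P3": "G2", "P4": "G3",
                "P5": "G4", "P6": "G5", "P7": "G6", "P8": "G7"}

protein_term = {"TERM_X": {"P1", "P2", "P3"}}
protein_query = {"P1", "P2", "P5"}

# Gene-level inputs obtained by converting every protein ID to its gene ID.
gene_term = {t: {protein2gene[p] for p in ps} for t, ps in protein_term.items()}
gene_query = {protein2gene[p] for p in protein_query}

protein_level = enrich(protein_term, protein_query, set(protein2gene))
gene_level = enrich(gene_term, gene_query, set(protein2gene.values()))
print(protein_level, gene_level)
```

In this toy case the two protein hits collapse into a single gene hit, so the same term receives a different p-value at each level.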

You can use the merge_result() function to merge the results and use dotplot() to compare the results visually.
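The comparison step can be pictured as stacking both result tables into one long table with a label column, then checking which terms are significant at each level. The Python sketch below shows only that idea; it is not the merge_result() implementation. The column name `Cluster`, the 0.05 cutoff, the term names, and the p-values are hypothetical placeholders.

```python
def merge_results(named_results):
    """Stack {label: {term: pvalue}} into one long list of labelled rows."""
    rows = []
    for label, result in named_results.items():
        for term, pvalue in sorted(result.items()):
            rows.append({"Cluster": label, "ID": term, "pvalue": pvalue})
    return rows


def significant_terms(rows, cutoff=0.05):
    """Group the IDs of rows passing the cutoff by their label."""
    sig = {}
    for row in rows:
        sig.setdefault(row["Cluster"], set())
        if row["pvalue"] < cutoff:
            sig[row["Cluster"]].add(row["ID"])
    return sig


# Made-up p-values standing in for the two enrichment results.
results = {
    "gene": {"TERM_A": 0.01, "TERM_B": 0.20},
    "protein": {"TERM_A": 0.02, "TERM_B": 0.03, "TERM_C": 0.04},
}

merged = merge_results(results)
sig = significant_terms(merged)

shared = sig["gene"] & sig["protein"]
protein_only = sig["protein"] - sig["gene"]
gene_only = sig["gene"] - sig["protein"]
print(shared, protein_only, gene_only)
```

Terms that appear in only one of the two sets are the ones worth inspecting first when judging how much the ID conversion changes the outcome.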

wangzy20 commented 1 year ago

Interesting task, I'll give it a try.