YuLab-SMU / clusterProfiler

A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

Count duplicate KEGG K numbers in enrichKEGG? #681

Open diefuechsin opened 3 months ago

diefuechsin commented 3 months ago

Hi!

I am running an analysis with clusterProfiler. Would it make sense for enrichKEGG to count duplicate K numbers (KEGG Orthology identifiers such as "K02760") in pathway enrichment analysis?

In my dataset, I have annotated coding sequences with K numbers, and some K numbers occur more than once. Would an over-representation analysis benefit from also counting these repeated K numbers when computing GeneRatio and the enrichment statistics?

For example, K02760 (assigned to pathway ko02060) was annotated to two different coding sequences. However, enrichKEGG appears to count K02760 only once, both in the statistics and in the GeneRatio for this pathway.
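To make the trade-off concrete, here is a minimal sketch of a one-sided hypergeometric test, the test commonly used for over-representation analysis, computed once with repeated K numbers collapsed and once with each coding sequence counted separately. The helper names, the placeholder "KX..." IDs, the pathway membership and the universe size of 20 are all hypothetical, and the sketch deliberately ignores how enrichKEGG actually defines its background. Under these assumptions, counting K02760 twice raises GeneRatio from 1/3 to 2/4 and lowers the p-value from about 0.40 to about 0.09, because the second copy is treated as an independent hit.

```python
from math import comb


def hypergeom_upper_tail(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: n distinct items drawn without
    replacement from a universe of N items, M of which are in the pathway."""
    if not (0 <= M <= N and 0 <= n <= N):
        raise ValueError("model requires M <= N and n <= N")
    top = min(M, n)  # largest possible number of pathway hits
    favourable = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, top + 1))
    return favourable / comb(N, n)


def ora_one_pathway(query, pathway, universe_size, dedupe=True):
    """Hypothetical helper: GeneRatio string and p-value for one pathway.

    query:   K numbers from annotated coding sequences (may repeat)
    pathway: set of K numbers assigned to the pathway
    dedupe:  count each K number once (True) or once per coding sequence (False)
    """
    ids = list(dict.fromkeys(query)) if dedupe else list(query)  # order-preserving dedupe
    k = sum(1 for ko in ids if ko in pathway)  # hits in the pathway
    n = len(ids)                               # size of the query list
    p = hypergeom_upper_tail(k, len(pathway), n, universe_size)
    return f"{k}/{n}", p


# Hypothetical data: "KX..." are placeholders, not real KEGG entries.
pathway = {"K02760", "KX0002", "KX0003"}
query = ["K02760", "K02760", "KX0010", "KX0011"]  # K02760 on two coding sequences

for dedupe in (True, False):
    ratio, p = ora_one_pathway(query, pathway, universe_size=20, dedupe=dedupe)
    print(f"dedupe={dedupe}: GeneRatio={ratio}, p={p:.4f}")
```

Counting duplicates also conflicts with the model's assumption that the drawn items are distinct. If one K number sat on more coding sequences than the pathway has members, k would exceed M and the tail probability above would collapse to 0.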

Thanks in advance for your comments and help! Best regards