BaderLab / EnrichmentMapApp

The EnrichmentMap Cytoscape App allows you to visualize the results of gene-set enrichment as a network.
http://apps.cytoscape.org/apps/enrichmentmap
GNU Lesser General Public License v2.1

FGSEA-service: plumber rounds q-values, affects filtering #515

Closed: mikekucera closed this issue 1 year ago

mikekucera commented 1 year ago

Not sure if this is an issue or not, but it's worth documenting and discussing.

I was comparing the results of running the EM-web data pipeline vs. doing the same analysis in EM-desktop. I thought this would be a good test to make sure everything is working as expected. That's when I ran into a discrepancy between the networks produced by each app.

Here's what I did:

Here is what one of the extra pathways looks like when output to a TSV file from R:

RETINOBLASTOMA GENE IN CANCER%WIKIPATHWAYS_20220510%WP2446%HOMO SAPIENS 66 0.00413467655736618 0.0500321867696385 -1.58091906592934

Here's what it looks like in JSON format when output by plumber (padj is the q-value):

    {
      "pathway": "RETINOBLASTOMA GENE IN CANCER%WIKIPATHWAYS_20220510%WP2446%HOMO SAPIENS",
      "size": 66,
      "pval": 0.0041,
      "padj": 0.05,
      "ES": -0.4452,
      "NES": -1.5809
    },

What's happening is that plumber automatically rounds the values to 4 decimal places. The q-value (padj) was 0.0500321867696385, which is greater than 0.05, so the pathway gets removed in EM-desktop. Plumber rounds it down to 0.05, which causes it to pass the filter in the EM-service.
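To make the mechanism concrete, here is a minimal Python sketch of the effect. It is not plumber's code: `round_like_serializer` is a hypothetical stand-in for the serializer's rounding, and the inclusive cutoff (`padj <= 0.05`) is an assumption inferred from the behaviour described above.

```python
# padj value copied from the TSV row quoted above.
PADJ_FULL = 0.0500321867696385

# Assumption: the filter keeps pathways with padj <= 0.05 (inclusive).
Q_CUTOFF = 0.05


def round_like_serializer(value: float, digits: int = 4) -> float:
    """Hypothetical stand-in for the JSON serializer rounding to 4 decimals."""
    return round(value, digits)


def passes_filter(padj: float, cutoff: float = Q_CUTOFF) -> bool:
    """Return True when the pathway would be kept by the q-value filter."""
    return padj <= cutoff


padj_rounded = round_like_serializer(PADJ_FULL)

print(passes_filter(PADJ_FULL))     # full precision: filtered out
print(passes_filter(padj_rounded))  # after rounding: kept
```

The same pathway gives two different answers depending on whether the comparison happens before or after rounding.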

So... Is this a bug? Out of ~6000 pathways this happens 3 times.

If we disable the rounding feature in plumber's JSON serializer, it will significantly increase the size of the data passed between the services. Is it worth it?
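For a rough feel of the size cost, the sketch below serializes the single record quoted above with and without rounding and compares byte counts. It uses Python's `json` module as a stand-in for plumber's serializer, so the exact numbers will differ from the real service. ES is left out because its full-precision value does not appear in this thread.

```python
import json

# Full-precision values copied from the TSV row quoted above.
record_full = {
    "pathway": "RETINOBLASTOMA GENE IN CANCER%WIKIPATHWAYS_20220510%WP2446%HOMO SAPIENS",
    "size": 66,
    "pval": 0.00413467655736618,
    "padj": 0.0500321867696385,
    "NES": -1.58091906592934,
}


def round_floats(record: dict, digits: int = 4) -> dict:
    """Round only the float fields, leaving strings and integers untouched."""
    return {
        key: round(value, digits) if isinstance(value, float) else value
        for key, value in record.items()
    }


record_rounded = round_floats(record_full)

bytes_full = len(json.dumps(record_full).encode("utf-8"))
bytes_rounded = len(json.dumps(record_rounded).encode("utf-8"))

print("full precision:", bytes_full, "bytes")
print("rounded:", bytes_rounded, "bytes")
print("extra per record:", bytes_full - bytes_rounded, "bytes")
```

Multiplying the per-record difference by the number of pathways (about 6000 here) gives a ballpark for the uncompressed overhead.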

Should we do the q-value filtering on the R side? Why even send the pathways to the EM-service if they're just going to get filtered out anyway?
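A minimal sketch of that idea, filtering at full precision first and rounding only what is sent on. It is written in Python for illustration rather than as the R service code; `filter_then_round` is a hypothetical helper, the inclusive cutoff is an assumption, and the second row is made-up example data.

```python
# Assumption: pathways are kept when padj <= 0.05 (inclusive).
Q_CUTOFF = 0.05


def filter_then_round(rows, cutoff=Q_CUTOFF, digits=4):
    """Filter on the full-precision padj, then round floats for serialization."""
    kept = [row for row in rows if row["padj"] <= cutoff]
    return [
        {
            key: round(value, digits) if isinstance(value, float) else value
            for key, value in row.items()
        }
        for row in kept
    ]


rows = [
    # Real row from this issue: padj is just above the cutoff.
    {
        "pathway": "RETINOBLASTOMA GENE IN CANCER%WIKIPATHWAYS_20220510%WP2446%HOMO SAPIENS",
        "padj": 0.0500321867696385,
    },
    # Made-up row for illustration only.
    {"pathway": "HYPOTHETICAL_PATHWAY", "padj": 0.0123456789},
]

for row in filter_then_round(rows):
    print(row["pathway"], row["padj"])
```

With this ordering the borderline pathway is dropped before rounding can move it onto the cutoff, and the payload shrinks because filtered pathways are never sent.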

Note that we can remove the ES field from the output, since it's not being used by anything. Arguably we could also remove pval, but I'm worried we might need it for something in the future.

risserlin commented 1 year ago

The behaviour is explained by what you went through above. I don't think it is a bug, and it only applies to fringe cases. There is more variability with permutations in GSEA.

Don't remove the p-value. If weaker or noisier datasets have no gene sets that pass the q-value threshold of 0.05, it is recommended to ignore the q-value and use the p-value instead.
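A minimal sketch of that fallback, in Python for illustration. `select_pathways` and the `p_cutoff` default are hypothetical: no p-value threshold is given in this thread, so 0.05 is only a placeholder.

```python
def select_pathways(rows, q_cutoff=0.05, p_cutoff=0.05):
    """Filter by q-value (padj); fall back to p-value if nothing passes.

    Returns the kept rows and the name of the field that was used.
    p_cutoff is a placeholder value, not a recommendation from this thread.
    """
    by_q = [row for row in rows if row["padj"] <= q_cutoff]
    if by_q:
        return by_q, "padj"
    by_p = [row for row in rows if row["pval"] <= p_cutoff]
    return by_p, "pval"


# The one real row from this issue: fails on padj, passes on pval.
rows = [
    {
        "pathway": "RETINOBLASTOMA GENE IN CANCER%WIKIPATHWAYS_20220510%WP2446%HOMO SAPIENS",
        "pval": 0.00413467655736618,
        "padj": 0.0500321867696385,
    },
]

selected, field_used = select_pathways(rows)
print(field_used, len(selected))
```

This only works if pval stays in the service output, which is the reason to keep the field.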

mikekucera commented 1 year ago

Hi Ruth, thanks for the answer. I moved this issue to the EM-web repository so that the other developers will see it: https://github.com/cytoscape/enrichment-map-webapp/issues/120