YuLab-SMU / DOSE

:mask: Disease Ontology Semantic and Enrichment analysis
https://yulab-smu.top/biomedical-knowledge-mining-book/
116 stars 36 forks source link

Sort decreasing only for DOSE YuLab-SMU/DOSE#58 #59

Open Nelson-Gon opened 2 years ago

Nelson-Gon commented 2 years ago

Hi,

This removes the need for a decreasing only sort if not using DOSE. In is.sorted decreasing was set to TRUE which also affected fgsea.

Thank you, NelsonGon

huerqiang commented 2 years ago

I don't think it's necessary to modify it. (1) When the input order of the genelist changes, the NES and pvalue/p.adjust/qvalues of the enrichment result keep still, except for rank and leading_edge. The rank and leading_edge are calculated by the order of the input genelist, so if you want enable GSEA to support input in any order, these code should also be modified to ensure that the results are completely consistent. (2) Maybe an easier way is to sort the user's input directly, and then do the following analysis.

Nelson-Gon commented 2 years ago

I don't think it's necessary to modify it. (1) When the input order of the genelist changes, the NES and pvalue/p.adjust/qvalues of the enrichment result keep still, except for rank and leading_edge. The rank and leading_edge are calculated by the order of the input genelist, so if you want enable GSEA to support input in any order, these code should also be modified to ensure that the results are completely consistent. (2) Maybe an easier way is to sort the user's input directly, and then do the following analysis.

I am using this with user sorted data. The data is sorted in ascending order but GSEA using fgsea won't run because of the "should be sorted in decreasing order only" error. Changing this part of the code has no effect on rank and leading_edge because we are only allowing a user to provide a geneList sorted in any order. What happens downstream of the input data has remained intact so I think this shouldn't result in any inconsistencies.

The idea behind is.Sorted seems to check if what we have provided as a geneList is sorted in descending order regardless of gsea method. The PR only avoids this and does not affect anything else.

DOSE:::is.sorted
function (x, decreasing = TRUE) 
{
    all(sort(x, decreasing = decreasing) == x)
}
huerqiang commented 2 years ago

If the order of input is randomly sorted, the result(rank and leading_edge) will be affected: test_gsea.pdf

Nelson-Gon commented 2 years ago

If the order of input is randomly sorted, the result(rank and leading_edge) will be affected: test_gsea.pdf

Thanks, I've only been looking at it visually and it seemed the graph would be a mirror image of that of the descending sort. We would expect genes at the left in a descending sort to show up on the right instead. From the output (e.g. gsedo2), it seems this is the case (genes move one step further in the ranking). Only the ranking seems affected and not the leading edge as might be expected (gsedo vs gsedo2, not sorting randomly. Only ascending vs descending)

huerqiang commented 2 years ago

Your submission will bring hidden dangers, as I mentioned earlier.

Nelson-Gon commented 2 years ago

Hi @huerqiang

I have taken a look at what fgsea does and it seems the order does in fact not matter in which case the sorting may be irrelevant.

From https://github.com/ctlab/fgsea/issues/96 and other related issues though, it seems that changing sort to anything else would have resulted in similar results here so not sure what exactly causes the differences in output as you noted earlier.

huerqiang commented 2 years ago

@Nelson-Gon “The rank and leading_edge are calculated by the order of the input genelist, so if you want enable GSEA to support input in any order, these code should also be modified to ensure that the results are completely consistent.” https://github.com/YuLab-SMU/DOSE/blob/master/R/gsea.R#L356-L427