Understanding ORA analysis of differentially expressed CCIs

alvarezprado commented 5 months ago

Dear authors,

Thank you for developing scDiffCom, it's a tremendously useful tool. I have carefully read your Nat Aging paper, but I don't fully understand some of the parameters reported in the ORA analyses. I thought that posting the question here would also help future users find the information, but please feel free to remove the issue and move the discussion to e-mail or other means if you consider it more appropriate.

If I understand correctly, an ORA score is calculated for each pathway in each category of CCIs (UP, DOWN, FLAT), and an adjusted p-value is then reported for the enrichment (BH_P_VALUE_UP, BH_P_VALUE_DOWN, BH_P_VALUE_FLAT). But there is also an equivalent for "DIFF" (BH_P_VALUE_DIFF, ORA_SCORE_DIFF). My questions are:

1. How should we interpret pathways for which both the UP and DOWN adjusted p-values are very low? This could make sense for generic pathways, where we could think of positive and negative regulators, but interpretation gets harder for pathways like "Positive regulation of (X process)". Could we assume that the lower adjusted p-value prevails, i.e., use it to decide whether the pathway is up- or down-regulated based on the differential CCIs?

2. What are ORA_SCORE_DIFF and BH_P_VALUE_DIFF measuring? How should we interpret these values? I assume they tell us something about the direction of the change, but I'm unsure about their meaning.

Thanks a lot for your help and my apologies if I missed key information when reading your paper!

CyrilLagger commented 5 months ago

Hi,

Thank you for the feedback and the questions! Sure, let's keep the discussion here as it might serve others.

1. How should we interpret pathways for which both UP and DOWN adjusted p-values are very low?

ORA results (for pathways, GO terms, or any other relevant terms) can have significant p-values for both UP and DOWN if the given term/pathway is associated with both up- and down-regulated CCIs in a dataset. This is biologically possible and can happen for various relevant reasons. Knowing exactly what is happening, though, requires looking at those CCIs in more detail: over-representation only provides coarse information across the entire dataset and loses fine-scale information.
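Here "significant" refers to the BH-adjusted p-values (BH_P_VALUE_UP, BH_P_VALUE_DOWN). As a reminder of what that correction does, below is a minimal Python sketch of the Benjamini-Hochberg procedure (the same procedure as R's `p.adjust(p, method = "BH")`). It assumes, without confirming it from the scDiffCom source, that the adjustment is applied across all tested terms within one category; the function name `bh_adjust` and the input p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, scaling by m / rank and
    # taking a running minimum so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw UP p-values for four terms.
raw_up = [0.01, 0.04, 0.03, 0.2]
print([round(p, 4) for p in bh_adjust(raw_up)])  # [0.04, 0.0533, 0.0533, 0.2]
```

A term counts as UP-significant only if its adjusted value, not its raw p-value, falls below the chosen threshold.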

Let's look at a concrete example to understand how this can happen. Consider the aging Brain Female dataset from our atlas scAgeCom: it contains 11 cell types, and scDiffCom detected 10'275 CCIs. The GO term "G protein-coupled receptor signaling pathway" is over-represented in both UP and DOWN CCIs. We can understand why by comparing the volcano plot of all 10'275 CCIs with the volcano plot of only the CCIs associated with this specific GO term.

[Figure: volcano plot of all CCIs]

[Figure: volcano plot of the CCIs associated with the GO term]

A substantial number of both up-regulated (red) and down-regulated (blue) CCIs are associated with the GO term. Looking in more detail at the (truncated) tables below, we can investigate the LRIs and cell types of those CCIs:

[Tables (truncated): up-regulated and down-regulated CCIs associated with the GO term]

The up-regulated CCIs are mostly emitted by macrophages/microglia towards astrocytes/microglia through specific LRIs. In contrast, the down-regulated ones are mostly emitted by endothelial cells and involve different genes. Still, in both cases, the genes belong to the pathway under consideration.

In summary, it is possible for a term/pathway to be over-represented in both up- and down-regulated CCIs in the same dataset/tissue, because different cell types/genes can regulate this pathway differently.
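The same situation can be reproduced with toy numbers. The sketch below assumes that the per-category ORA p-value comes from a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) of the category against all CCIs in the dataset; scDiffCom's exact ORA score and test may differ. The helper `ora_pvalue` and all counts are hypothetical: 1'000 CCIs (50 UP, 50 DOWN, 900 FLAT) and a term carried by 30 CCIs, of which 8 are UP, 7 DOWN and 15 FLAT.

```python
from math import comb


def ora_pvalue(k, n_category, n_term, n_total):
    """P(X >= k) for X ~ Hypergeometric(n_total, n_term, n_category).

    k          -- CCIs in the category that carry the term
    n_category -- CCIs in the category (e.g. all UP CCIs)
    n_term     -- CCIs in the whole dataset that carry the term
    n_total    -- all CCIs in the dataset
    """
    upper = min(n_category, n_term)
    tail = sum(
        comb(n_term, x) * comb(n_total - n_term, n_category - x)
        for x in range(k, upper + 1)
    )
    return tail / comb(n_total, n_category)


# Hypothetical dataset: 1000 CCIs (50 UP, 50 DOWN, 900 FLAT);
# the term is carried by 30 CCIs: 8 UP, 7 DOWN, 15 FLAT.
p_up = ora_pvalue(8, 50, 30, 1000)
p_down = ora_pvalue(7, 50, 30, 1000)
p_flat = ora_pvalue(15, 900, 30, 1000)

# About 1.5 term CCIs are expected in UP and in DOWN, so both show a
# strong excess, while FLAT is depleted (15 observed vs 27 expected).
print(f"UP: {p_up:.1e}  DOWN: {p_down:.1e}  FLAT: {p_flat:.2f}")
```

Both directional tests come out small because each direction independently holds far more term CCIs than chance predicts; nothing in the test forces UP and DOWN to compete.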

2. What are ORA_SCORE_DIFF and BH_P_VALUE_DIFF measuring?

These are values we implemented early on in the package but rarely used. The naming ("DIFF") is actually not very clear.

The associated question is: "Is the pathway over-represented in the CCIs that change (either up or down, i.e., either blue or red above), compared to all other CCIs in the dataset?" For comparison, ORA_SCORE_UP corresponds to: "Is the pathway over-represented in the CCIs that are up-regulated (only red above), compared to all other CCIs in the dataset?"
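To make the two questions concrete, the sketch below builds the 2x2 contingency table that each category implies, from a hypothetical list of labeled CCIs. It assumes the comparison is "category versus all other CCIs", as described above; the data and the helper `contingency` are illustrative and not part of the scDiffCom API. DIFF differs from UP and DOWN only in pooling both directions into one category.

```python
from collections import Counter

# Hypothetical toy data: (regulation label, does the CCI carry the term?)
ccis = [("UP", True), ("UP", False), ("DOWN", True), ("DOWN", False),
        ("FLAT", False), ("FLAT", False), ("FLAT", True), ("FLAT", False)]


def contingency(ccis, in_category):
    """2x2 table for one category versus all other CCIs.

    Rows: in category / not in category.
    Columns: carries the term / does not carry the term.
    """
    counts = Counter((in_category(label), has_term) for label, has_term in ccis)
    return [[counts[(True, True)], counts[(True, False)]],
            [counts[(False, True)], counts[(False, False)]]]


categories = {
    "UP": lambda label: label == "UP",
    "DOWN": lambda label: label == "DOWN",
    # DIFF pools both directions: "CCIs that change, up or down".
    "DIFF": lambda label: label in ("UP", "DOWN"),
}

for name, member in categories.items():
    print(name, contingency(ccis, member))
```

Each table is then fed to the same over-representation test; only the row definition changes between UP, DOWN and DIFF.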

So these values don't tell us anything about the direction of the change. Quite the opposite: they only indicate that the pathway is changing, in either direction. Since this is generally less informative than ORA_SCORE_UP and ORA_SCORE_DOWN, it is probably of limited use. Still, we can imagine cases where BH_P_VALUE_UP and BH_P_VALUE_DOWN are both non-significant (large) but BH_P_VALUE_DIFF is significant. That would indicate a pathway that is neither strongly up- nor down-regulated, but simply dysregulated (going up or down depending on the cells/genes).
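A toy case of this last situation, again assuming a one-sided hypergeometric test of each category against all CCIs (an assumption, not scDiffCom's confirmed implementation): with 1'000 CCIs, 50 UP and 50 DOWN, a term carried by 20 CCIs, 3 of them UP and 3 DOWN, shows only a modest excess in each direction but a clearer one once both directions are pooled into DIFF. The helper `ora_pvalue` and the counts are hypothetical, and raw p-values are shown for a single term; in a real analysis they would then be BH-adjusted across terms.

```python
from math import comb


def ora_pvalue(k, n_category, n_term, n_total):
    """One-sided over-representation p-value P(X >= k) under a
    hypergeometric null (hypothetical helper, see lead-in)."""
    upper = min(n_category, n_term)
    tail = sum(
        comb(n_term, x) * comb(n_total - n_term, n_category - x)
        for x in range(k, upper + 1)
    )
    return tail / comb(n_total, n_category)


n_total, n_up, n_down, n_term = 1000, 50, 50, 20
term_up, term_down = 3, 3  # term CCIs that are UP / DOWN

p_up = ora_pvalue(term_up, n_up, n_term, n_total)
p_down = ora_pvalue(term_down, n_down, n_term, n_total)
# DIFF pools both directions: 6 of the 100 changing CCIs carry the term.
p_diff = ora_pvalue(term_up + term_down, n_up + n_down, n_term, n_total)

print(f"UP: {p_up:.3f}  DOWN: {p_down:.3f}  DIFF: {p_diff:.3f}")
```

Pooling doubles both the observed and the expected counts, and the hypergeometric tail shrinks faster than linearly as the same relative excess is observed over more CCIs, which is why DIFF can cross a threshold that neither UP nor DOWN reaches alone.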

Please let me know if that clarifies things or if you have more questions!

alvarezprado commented 5 months ago

Hi Cyril,

Many thanks for the detailed response, now everything is crystal clear!

I'll close the thread now; I'm sure it will be very helpful for future users as well :)

CyrilLagger/scDiffCom, issue #20