HelenaLC / muscat

Multi-sample multi-group scRNA-seq analysis tools
160 stars 32 forks source link

p_adj.loc greater than p_adj.glb #96

Closed malosreet closed 2 years ago

malosreet commented 2 years ago

Hello!

I have noticed that the p_adj.loc is sometimes greater than the p_adj.glb in my differential expression results (using pbDS on an snRNA-seq dataset, with the edgeR method),

To my understanding the p_adj.loc is produced by correcting for multiple testing across all genes tested within a cluster while the p_adj.glb is produced by correcting for multiple testing across all genes tested in all clusters. Thus, p_adj.loc should be less than p_adj.glb.

Here is a small section of the table produced by resDS as an example (rows 3, 7, 16, 18, and 19):

<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">

  | gene | cluster_id | logFC | logCPM | F | p_val | p_adj.loc | p_adj.glb | contrast -- | -- | -- | -- | -- | -- | -- | -- | -- | -- 1 | FO538757.2 | Inhib_5 | 0.297907 | 6.55732 | 0.851136 | 0.361205 | 0.786318 | 0.804342 | ei.group_idS-ei.group_idC 2 | RP5-857K21.4 | Inhib_5 | -0.45668 | 6.879999 | 1.461924 | 0.233008 | 0.685881 | 0.715474 | ei.group_idS-ei.group_idC 3 | HES4 | Inhib_5 | -0.14578 | 5.831962 | 0.146945 | 0.703297 | 0.946466 | 0.94204 | ei.group_idS-ei.group_idC 4 | SCNN1D | Inhib_5 | 0.393927 | 5.304127 | 0.538035 | 0.467095 | 0.849796 | 0.858966 | ei.group_idS-ei.group_idC 5 | ACAP3 | Inhib_5 | 1.019394 | 6.236945 | 9.745041 | 0.003155 | 0.143755 | 0.196263 | ei.group_idS-ei.group_idC 6 | ATAD3B | Inhib_5 | 0.690168 | 5.472104 | 2.170819 | 0.147689 | 0.589724 | 0.629304 | ei.group_idS-ei.group_idC 7 | MIB2 | Inhib_5 | -0.16557 | 5.665529 | 0.11623 | 0.734764 | 0.956805 | 0.950395 | ei.group_idS-ei.group_idC 8 | SLC35E2B | Inhib_5 | 0.572459 | 5.423179 | 2.145233 | 0.150042 | 0.590007 | 0.632773 | ei.group_idS-ei.group_idC 9 | GNB1 | Inhib_5 | -0.21795 | 7.569159 | 1.530005 | 0.222599 | 0.678797 | 0.706937 | ei.group_idS-ei.group_idC 10 | GABRD | Inhib_5 | -0.51315 | 6.940436 | 5.345291 | 0.025464 | 0.321897 | 0.369583 | ei.group_idS-ei.group_idC 11 | PRKCZ | Inhib_5 | 0.349367 | 6.707143 | 2.446721 | 0.124854 | 0.561587 | 0.600813 | ei.group_idS-ei.group_idC 12 | SKI | Inhib_5 | 0.226541 | 6.517301 | 0.675613 | 0.415483 | 0.822396 | 0.833289 | ei.group_idS-ei.group_idC 13 | PLCH2 | Inhib_5 | 0.47294 | 6.617706 | 4.095984 | 0.049014 | 0.408241 | 0.452765 | ei.group_idS-ei.group_idC 14 | CEP104 | Inhib_5 | 0.295637 | 6.111008 | 0.92458 | 0.34147 | 0.770297 | 0.792691 | ei.group_idS-ei.group_idC 15 | DFFB | Inhib_5 | 0.555827 | 5.683095 | 2.5341 | 0.118489 | 0.557515 | 0.590573 | ei.group_idS-ei.group_idC 16 | AJAP1 | Inhib_5 | 0.062656 | 7.534503 | 0.086387 | 0.770188 | 0.965444 | 0.959339 | ei.group_idS-ei.group_idC 17 | NPHP4 | Inhib_5 | 0.614426 | 6.15494 | 4.304073 | 0.043834 | 0.394343 | 0.437253 | ei.group_idS-ei.group_idC 18 | KCNAB2 | Inhib_5 | 0.037785 | 7.168045 | 0.032431 | 0.857903 | 0.986452 | 0.97939 | ei.group_idS-ei.group_idC 19 | CHD5 | Inhib_5 | 0.050941 | 7.044387 | 0.046852 | 0.829623 | 0.979776 | 0.972831 | ei.group_idS-ei.group_idC

I am using version 1.8.0 of muscat.

I would appreciate any information you can provide to help understand this discrepancy.

Thank you!

May be related to #69

plger commented 2 years ago

Hi, while you are correct about the meaning of the global/local distinction, and that this will tend to lead to lower local adjusted p-values, in fact this is not always the case. Consider the following example:

> set.seed(123)
> pvals <- c(0.01,10^-abs(rnorm(9))) # random p-values, with 0.01 at the beginning
> p.adjust(pvals,method="BH")
 [1] 0.09208109 0.45853565 0.73575382 0.09208109 0.85014227 0.82503003
 [7] 0.09208109 0.49429447 0.13579343 0.41131746
> p.adjust(c(0.01,pvals),method="BH")
 [1] 0.05500000 0.05500000 0.43233361 0.71940373 0.07596690 0.85014227
 [7] 0.81677973 0.07066534 0.47575843 0.11949822 0.37704100

Notice that in the second case, I add another 0.01 at the beginning, and that this leads to a decrease in the adjusted p-value of the existing 0.01 (from ~0.09 to 0.055; others are also changed a bit). The reason for this is that the correction method works under the assumption that a certain proportion of low p-values are expected under the null hypothesis, and looks for true signal in an excess of such p-values. Hence the presence of other such similar p-values decreases the probability that any of them is due to chance.

malosreet commented 2 years ago

Thank you @plger! This makes sense.