rank.logFC.detected Produces Dubious Rankings

MarioniLab / scran

Clone of the Bioconductor repository for the scran package.

https://bioconductor.org/packages/devel/bioc/html/scran.html

rank.logFC.detected Produces Dubious Rankings #116

Open DarioS opened 6 months ago

DarioS commented 6 months ago

There is already an lfc filtering parameter. What is missing is a similar min.detected parameter, which would apply to the detected proportion of self or other.

> orderDetected <- order(cluster1$rank.logFC.detected)
> cluster1[orderDetected[1:10], c(1:4, 9, 14, 19)]
DataFrame with 10 rows and 7 columns
        self.average other.average self.detected other.detected rank.logFC.cohen  rank.AUC rank.logFC.detected
           <numeric>     <numeric>     <numeric>      <numeric>        <integer> <integer>           <integer>
PAX7       0.0188465  0.0000877382     0.0115113   0.0000830314            13833     12014                   1
ECRG4      0.0542055  0.0015406061     0.0400046   0.0008963788            12084     10431                   1
NNMT       4.3730723  2.1111045671     0.9563483   0.6148204061                1         1                   1
MYF5       2.2181507  0.0330166809     0.5924322   0.0156052101                5         5                   1
RPS27L     4.2351264  2.9271708476     0.9731023   0.7666684260                1         1                   1
CALM2      4.5810915  3.3270106207     0.9658081   0.7676900297                2         2                   2
MLIP       0.0222212  0.0031465257     0.0178938   0.0033352642            14273     15185                   2
SOD2       3.9477973  2.6287379749     0.8990198   0.7366703714               19         5                   2
RARRES2    1.8958659  0.3565642800     0.4678596   0.1429341960                9        16                   2
MAG        0.0231304  0.0008944649     0.0153864   0.0008786545            13578     11628                   2

PAX7 is biologically a dubious marker gene if it only appears in 1.15% of cells of a cluster. MAG is another case.

LTLA commented 6 months ago

lfc doesn't actually do any filtering; it is a TREAT-like threshold for the calculation of p-values. Basically it's the mu in t.test. Nothing is explicitly filtered out when you set lfc, and the shape of the DataFrame remains unchanged.
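
To illustrate the "mu in t.test" analogy, here is a minimal sketch on simulated values. It is a one-sided simplification and not scran's actual TREAT-like calculation; the vectors `self` and `other` and the threshold of 1 are made up for illustration.

```r
# Simulated log-expression values for one gene (hypothetical data).
set.seed(1)
self  <- rnorm(50, mean = 2.0, sd = 0.5)  # cells in the cluster of interest
other <- rnorm(50, mean = 1.5, sd = 0.5)  # cells in another cluster

# Null hypothesis: the difference in means is at most 0.
p.zero <- t.test(self, other, mu = 0, alternative = "greater")$p.value

# Null hypothesis: the difference in means is at most 1 (the threshold).
# The same gene is still tested; only the p-value changes.
p.thresh <- t.test(self, other, mu = 1, alternative = "greater")$p.value

c(no.threshold = p.zero, with.threshold = p.thresh)
```

Raising `mu` can only make the one-sided p-value larger, so genes with a small difference drop down the significance ranking without any row being removed.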

If you want an equivalent experience for the minimum detected proportion, you'd have to figure out what the null hypothesis becomes. I suppose we could just require a minimum absolute increase in the detected proportion, equivalent to bumping up the p for a one-sided binom.test (which is the analogous test for the detected proportions).
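
A rough sketch of that idea, using base R's `binom.test`. The cluster size `n.cells`, the threshold `min.delta` and the helper `test_detected` are hypothetical, and treating the other-cluster proportion as a known constant is a simplification; the proportions are the PAX7 and MYF5 values from the table above.

```r
n.cells   <- 1000  # hypothetical number of cells in the cluster
min.delta <- 0.05  # hypothetical minimum absolute increase in detected proportion

# One-sided binomial test of whether the detected proportion in the cluster
# exceeds the other-cluster proportion by more than 'delta'.
test_detected <- function(self.detected, other.detected, n, delta) {
    x <- round(self.detected * n)            # cells in which the gene is detected
    p0 <- min(other.detected + delta, 1)     # "bumped up" null proportion
    binom.test(x, n, p = p0, alternative = "greater")$p.value
}

p.pax7 <- test_detected(0.0115113, 0.0000830314, n.cells, min.delta)
p.myf5 <- test_detected(0.5924322, 0.0156052101, n.cells, min.delta)

c(PAX7 = p.pax7, MYF5 = p.myf5)
```

Under these assumptions PAX7 no longer looks significant, because 1.15% detected does not exceed the shifted null of roughly 5%, while MYF5 remains strongly significant.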

If you really just want to filter, you can do that outside of the function. It's a pretty complicated function already and I don't want to add more arguments, and also, I like keeping the DataFrame shape consistent across all clusters.
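
For example, filtering after the fact could look like the sketch below. It uses a plain `data.frame` holding four rows copied from the output above so that it runs without Bioconductor; the same row subsetting should work on the DataFrame itself. The cutoff `min.detected` is a hypothetical value.

```r
# Four genes from the table above (values copied from the printed output).
cluster1 <- data.frame(
    self.detected       = c(0.0115113, 0.9563483, 0.5924322, 0.0153864),
    other.detected      = c(0.0000830314, 0.6148204061, 0.0156052101, 0.0008786545),
    rank.logFC.detected = c(1L, 1L, 1L, 2L),
    row.names           = c("PAX7", "NNMT", "MYF5", "MAG")
)

min.detected <- 0.1  # hypothetical cutoff on the within-cluster detected proportion

# Keep genes detected in at least 'min.detected' of the cluster's cells,
# then order the survivors by their rank.
keep <- cluster1$self.detected >= min.detected
filtered <- cluster1[keep, , drop = FALSE]
filtered <- filtered[order(filtered$rank.logFC.detected), , drop = FALSE]
filtered
```

The unfiltered `cluster1` keeps its original shape, so results stay comparable across clusters, and the filter is applied only where it is needed.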

DarioS commented 6 months ago

> a minimum absolute increase in the detected proportion

That sounds good.

LTLA commented 5 months ago

Update on this: while I think it's a good idea, I just don't have the time to work on it. I'd be happy to take a PR, though someone will have to delve into the C++ code to implement this change.

I'll also note that the proposed libscran-based replacement for scran will use the "delta detected" as one of its effect sizes for ranking, which is pretty much what is being proposed here, so you could just wait until that hits the shelves.
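
As a rough illustration of what ranking by a difference in detected proportions does to the ten genes above, the sketch below subtracts other.detected from self.detected. This is only an approximation built from the printed summary columns; how libscran actually computes and summarizes its "delta detected" is not shown in this thread.

```r
# self.detected and other.detected for the ten genes printed above.
detected <- data.frame(
    self  = c(0.0115113, 0.0400046, 0.9563483, 0.5924322, 0.9731023,
              0.9658081, 0.0178938, 0.8990198, 0.4678596, 0.0153864),
    other = c(0.0000830314, 0.0008963788, 0.6148204061, 0.0156052101, 0.7666684260,
              0.7676900297, 0.0033352642, 0.7366703714, 0.1429341960, 0.0008786545),
    row.names = c("PAX7", "ECRG4", "NNMT", "MYF5", "RPS27L",
                  "CALM2", "MLIP", "SOD2", "RARRES2", "MAG")
)

# Absolute difference in detected proportions (assumed definition).
detected$delta <- detected$self - detected$other

# Largest difference first.
ranked <- detected[order(detected$delta, decreasing = TRUE), ]
ranked
```

With this ordering MYF5 comes first, and the rarely detected genes (ECRG4, MLIP, MAG, PAX7) fall to the bottom, which is the behaviour requested in this issue.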

DarioS commented 5 months ago

libscran certainly looks worth waiting for! I have not written C++ code in over twelve years, so it is best that I not meddle with it.