DavisLaboratory / singscore

An R/Bioconductor package that implements a single-sample molecular phenotyping approach
https://davislaboratory.github.io/singscore/

Breaking ties during ranking and theoretical minimum rank #16

Closed: messersc closed this issue 4 years ago

messersc commented 4 years ago

Hi,

singscore recommends breaking ties during ranking with min instead of, e.g., averaging or breaking them randomly. I had some problematic samples in which most genes had an expression of 0. When computing scores for these samples, the mean rank of the genes in a set could fall below the computed theoretical minimum, which led to scores < 0 (without centering).

I get that this is an edge case, but it does not seem to be discussed in the two singscore papers. Is there a better option than removing (a) genes that are not expressed in most samples of a cohort and (b) samples in which most genes are not expressed?

Thanks, Clemens

bhuvad commented 4 years ago

Hi Clemens,

The theoretical bounds assume that all ranks are unique, which is clearly not the case for RNA-seq data with many zeros. As you mentioned, ranking genes with min can produce negative scores (without centring); this is because tied ranks imply a different set of theoretical bounds from the ones singscore computes. In a realistic application, the impact of this will be negligible and should not greatly affect your analysis.
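A toy sketch in Python (not singscore's R implementation) of why min ties can push a score below zero. It assumes the uncentred score is the gene set's mean rank rescaled by the theoretical minimum mean rank (n+1)/2 and maximum mean rank (2N-n+1)/2, both of which assume unique ranks; the helper names `rank_min`, `rank_average` and `uncentred_score` are hypothetical.

```python
def rank_min(values):
    """Ascending ranks; tied values share the smallest rank (like R's ties.method = "min")."""
    first_rank = {}
    for position, value in enumerate(sorted(values), start=1):
        first_rank.setdefault(value, position)
    return [first_rank[v] for v in values]


def rank_average(values):
    """Ascending ranks; tied values share the average of their positions."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def uncentred_score(ranks, gene_idx):
    """Mean rank of the gene set, rescaled by bounds that assume unique ranks (assumption)."""
    N = len(ranks)        # total number of genes
    n = len(gene_idx)     # gene-set size
    mean_rank = sum(ranks[i] for i in gene_idx) / n
    low = (n + 1) / 2             # mean of ranks 1..n
    high = (2 * N - n + 1) / 2    # mean of ranks N-n+1..N
    return (mean_rank - low) / (high - low)


# Toy sample: 8 of 10 genes have zero expression.
expression = [0, 0, 0, 0, 0, 0, 0, 0, 5, 3]
gene_set = [0, 1, 2, 3]  # four of the zero-expression genes

score_min = uncentred_score(rank_min(expression), gene_set)
score_avg = uncentred_score(rank_average(expression), gene_set)
print(score_min)  # -0.25
print(score_avg)
```

With min ties, the four zeros all receive rank 1, so their mean rank (1) falls below the assumed minimum of 2.5 and the score is negative. Averaging ties gives them rank 4.5 and keeps this particular toy score between 0 and 1.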

Regarding zero measurements across most samples, we do mention in the singscore paper that genes with low expression across most samples should be filtered out before applying our method. This filtering is justified in many expression-based analyses, including DE analysis.
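A minimal gene-filtering sketch, assuming expression is stored as a per-gene list of sample values. The thresholds `min_expr` and `min_fraction` are hypothetical illustration values, not ones prescribed by the singscore papers; in R, edgeR's `filterByExpr()` is one commonly used option for this step.

```python
def filter_low_expression(expr, min_expr=1.0, min_fraction=0.5):
    """Keep genes whose expression exceeds min_expr in at least min_fraction of samples.

    expr maps gene name -> list of per-sample expression values.
    Both thresholds are hypothetical defaults chosen for illustration.
    """
    kept = {}
    for gene, values in expr.items():
        expressed = sum(1 for v in values if v > min_expr)
        if expressed / len(values) >= min_fraction:
            kept[gene] = values
    return kept


expr = {
    "GENE_A": [0, 0, 0, 5],   # expressed in 1 of 4 samples -> dropped
    "GENE_B": [3, 4, 2, 6],   # expressed in 4 of 4 samples -> kept
    "GENE_C": [0, 2, 3, 0],   # expressed in 2 of 4 samples -> kept
}
print(sorted(filter_low_expression(expr)))  # ['GENE_B', 'GENE_C']
```

Filtering before ranking removes many of the tied zeros that would otherwise all share the lowest rank.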

As for your second question, I assume you are trying to apply singscore to single-cell RNA-seq data? That would be the main situation in which individual samples contain mostly zeros. If so, we can't provide much guidance yet. We are working on applying singscore to scRNA-seq data, though it will be a while before we can make recommendations.

I hope this information helps.

Cheers, Dharmesh