UCLouvain-CBIO / scp

Single cell proteomics data processing
https://uclouvain-cbio.github.io/scp/index.html

n/p ratio clarification #64

Open samgregoire opened 1 month ago

samgregoire commented 1 month ago

I ran scpModelWorkflow() on a small SingleCellExperiment object (I only have 20 cells). The scpModelFilterPlot() output looks like this:

[Image: Rplot]

I'm not surprised that I only have a few estimated features as I only have a few cells/observations. However, I'm puzzled by two things:

summary(sapply(metadata(sce)$model@scpModelFitList, "slot", "p"))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   0.000   5.000   3.905   7.000  14.000

On further investigation, I found that p equals 0 whenever a feature has only 1 or 2 observations:

p0 <- which(sapply(metadata(sce)$model@scpModelFitList, "slot", "p") == 0)

nobs_p0 <- rep(NA, length(p0))

for(i in seq_along(p0)) {
     nobs_p0[i] <- nrow(colData(sce)[!is.na(assay(sce)[p0[i], ]), ])
}

nobs <- rowSums(!is.na(assay(sce)))
obs_2 <- which(nobs <= 2)
table(obs_2 == p0)

TRUE 
3409 

I assume that the 3409 features with an infinite n/p ratio are shown as 0 in the plot. Why do features with 2 or fewer observations always have p equal to 0? I suppose it's not that important, since features with only 2 observations are not very informative in bigger datasets.
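To make the arithmetic behind that assumption concrete, here is a minimal base R sketch on a toy matrix. It does not use scp: the matrix `x`, the rule setting `p` to 0 when `n <= 2`, and the plain division `n / p` are all assumptions made for illustration, not the package's actual implementation.

```r
# Toy intensity matrix: 4 features (rows) x 5 cells (columns), NA = missing.
x <- matrix(
    c(1.2,  NA,  NA,  NA, 0.8,   # feat1: 2 observations
       NA,  NA, 2.1,  NA,  NA,   # feat2: 1 observation
      0.5, 0.7, 0.9, 1.1, 1.3,   # feat3: 5 observations
      1.0, 1.4,  NA, 0.6, 0.2),  # feat4: 4 observations
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("feat", 1:4), paste0("cell", 1:5))
)

# Number of observed values per feature, vectorised (no loop needed).
n <- rowSums(!is.na(x))

# Hypothetical number of estimated coefficients per feature: set to 0 when
# n <= 2 to mimic the pattern reported above (an assumption, not scp's logic).
p <- ifelse(n <= 2, 0, 2)

# Plain division: a positive n divided by p = 0 yields Inf in R.
np_ratio <- n / p

# Compare the two feature sets by content rather than element-wise.
same_features <- setequal(names(which(p == 0)), names(which(n <= 2)))
```

Under these assumptions, `np_ratio` is `Inf` for the two sparsely observed features, so how a plot displays them depends on how it handles non-finite values. Comparing with `setequal()` also avoids the recycling that `obs_2 == p0` would silently rely on if the two index vectors ever differed in length.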

cvanderaa commented 3 weeks ago

Hi Sam, thanks for pointing out these inconsistencies.