n/p ratio clarification

I ran scpModelWorkflow() on a small SingleCellExperiment object (I only have 20 cells). The output of scpModelFilterPlot() looks like this:

[Figure: scpModelFilterPlot() output]

I'm not surprised that I only have a few estimated features as I only have a few cells/observations. However, I'm puzzled by two things:

Why is the bar corresponding to features with an n/p ratio of 1 colored as "inestimable"? According to the legend (and what I checked), features with an n/p ratio >= 1 are considered to be estimated.

How can I have features with an n/p ratio of 0? I thought that n could never be equal to 0 and checked that this was the case.

summary(sapply(metadata(sce)$model@scpModelFitList, "slot", "n"))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   3.000   3.695   5.000  21.000

Indeed, the n/p ratio is never less than 0.5:

np <-
    sapply(metadata(sce)$model@scpModelFitList, "slot", "n") /
    sapply(metadata(sce)$model@scpModelFitList, "slot", "p")
summary(np)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.5000  0.6667  1.1250     Inf     Inf     Inf

However, I was surprised to see that a large number of the n/p ratios were infinite, which means that p is equal to 0 for those features.

summary(sapply(metadata(sce)$model@scpModelFitList, "slot", "p"))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   0.000   5.000   3.905   7.000  14.000

On further investigation, I found that this happens whenever a feature has only 1 or 2 observations:

p0 <- which(sapply(metadata(sce)$model@scpModelFitList, "slot", "p") == 0)

nobs_p0 <- rep(NA, length(p0))

for(i in seq_along(p0)) {
     nobs_p0[i] <- nrow(colData(sce)[!is.na(assay(sce)[p0[i], ]), ])
}

nobs <- rowSums(!is.na(assay(sce)))
obs_2 <- which(nobs <= 2)
table(obs_2 == p0)

TRUE 
3409

I assume that the 3409 features with an infinite n/p ratio are plotted as 0 in the plot. Why do features with 2 observations always have p equal to 0? I suppose it's not that important, since features with only 2 observations are not very informative in bigger datasets.

Hi Sam, Thanks for pointing out these inconsistencies.

Regarding your first point, I will fix this. The legend and docs are right, but the plot is misleading. It comes from a wrong assignment of the edge cases when I cut the histogram into estimable and non-estimable features.
Regarding your second point, great job investigating! Indeed, the issue you are raising lies within these lines: https://github.com/UCLouvain-CBIO/scp/blob/5e094c6d4d67b23b8ac5591257de45cefb38f3bd/R/ScpModel-Workflow.R#L213-L217 I did this intentionally since, IMHO, there is no use in modelling data with 2 or fewer data points. Hence I generate an empty model matrix, hence p = 0, hence the feature is ignored. I'm open to discussion on whether this needs smarter handling.
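
To make the chain "2 or fewer data points, empty model matrix, p = 0, infinite n/p" concrete, here is a minimal base R sketch on toy data. It is only an illustration of the behaviour described above, not the code in scp: `toy_design()`, its `min_obs` argument, the covariate `x` and the two toy features are all made up for the example.

```r
# Hypothetical helper (NOT part of scp): build a design matrix for one feature,
# returning a matrix with zero columns when too few values are observed.
toy_design <- function(y, x, min_obs = 3) {
    observed <- !is.na(y)
    n <- sum(observed)
    if (n < min_obs) {
        # 2 or fewer observations: n rows, 0 columns, so p = 0
        return(matrix(numeric(0), nrow = n, ncol = 0))
    }
    # Otherwise: intercept + one numeric covariate, so p = 2
    model.matrix(~ x, data = data.frame(x = x[observed]))
}

# Toy data: 5 cells, one numeric covariate, two features
x <- c(0.1, 0.4, 0.2, 0.9, 0.5)
features <- list(
    sparse = c(1.2, NA, NA, 0.8, NA),    # 2 observed values
    dense  = c(1.2, 0.7, 1.1, 0.8, 0.9)  # 5 observed values
)

n <- sapply(features, function(y) sum(!is.na(y)))
p <- sapply(features, function(y) ncol(toy_design(y, x)))
np <- n / p  # division by p = 0 yields Inf in R

print(data.frame(n = n, p = p, np = np))
```

In this toy setup the `sparse` feature ends up with p = 0 and an infinite n/p ratio, while `dense` gets a finite one. Note that `Inf >= 1` is `TRUE` in R, so any downstream check on the ratio would need to handle non-finite values explicitly (for example with `is.finite()`).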

UCLouvain-CBIO / scp

n/p ratio clarification #64