Open samgregoire opened 1 month ago
Hi Sam, thanks for pointing out these inconsistencies.
Regarding your first point, I will fix this. The legend and docs are right, but the plot is misleading. It has to do with a wrong assignment of the edge cases when I cut the histograms into estimable and non-estimable features.
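To illustrate the kind of edge-case problem I mean, here is a minimal base R sketch. It is not the actual scp plotting code: the ratios and the use of cut() are assumptions made purely for illustration. It shows how the choice of interval closure decides on which side a ratio of exactly 1 ends up.

```r
# Hypothetical n/p ratios; not taken from the real dataset.
ratio <- c(0.5, 1, 1, 2, 3)

# Right-closed intervals (the default of cut()): a ratio of exactly 1
# falls in (0, 1] and is labelled as non-estimable.
rightClosed <- cut(ratio, breaks = c(0, 1, Inf),
                   labels = c("non-estimable", "estimable"))

# Left-closed intervals: a ratio of exactly 1 falls in [1, Inf)
# and is labelled as estimable, matching the ">= 1" rule in the legend.
leftClosed <- cut(ratio, breaks = c(0, 1, Inf), right = FALSE,
                  labels = c("non-estimable", "estimable"))

table(rightClosed)
table(leftClosed)
```

With the default right-closed intervals, the two features with a ratio of 1 are counted as non-estimable; with right = FALSE they are counted as estimable.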
Regarding your second point, great investigation! Indeed, the issue you are raising comes from these lines: https://github.com/UCLouvain-CBIO/scp/blob/5e094c6d4d67b23b8ac5591257de45cefb38f3bd/R/ScpModel-Workflow.R#L213-L217 I did this intentionally because, IMHO, there is no use in modeling data with only 2 or fewer data points. Hence I generate an empty model matrix, hence p = 0, hence the feature is ignored. I'm open to discussion on whether this needs more clever handling.
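The chain "empty model matrix, hence p = 0, hence an infinite n/p ratio" can be sketched in base R as follows. This is only an illustration, not the code at the link above: the observation counts and the value of pFull (the number of coefficients when a feature is modeled) are hypothetical.

```r
# Hypothetical number of observations (n) for six features.
n <- c(1, 2, 3, 5, 10, 20)

# Assumption for illustration: 4 coefficients when the feature is modeled,
# and an empty model matrix (p = 0) when the feature has 2 or fewer
# observations.
pFull <- 4
p <- ifelse(n <= 2, 0, pFull)

# In R, dividing a positive number by zero gives Inf, so features with
# p = 0 get an infinite n/p ratio.
ratio <- n / p

data.frame(n = n, p = p, ratio = ratio, infinite = is.infinite(ratio))
```

Under these assumptions, only the features with 1 or 2 observations have an infinite ratio, which matches what was observed in the issue.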
I ran scpModelWorkflow() on a small SingleCellExperiment object (I only have 20 cells). The scpModelFilterPlot() looks like this:
I'm not surprised that I only have a few estimated features, as I only have a few cells/observations. However, I'm puzzled by two things:
Why is the bar corresponding to features with an n/p ratio of 1 colored as "inestimable"? According to the legend (and what I checked), features with an n/p ratio >= 1 are considered to be estimated.
How can I have features with an n/p ratio of 0? I thought that n could never be equal to 0 and checked that this was the case.
Indeed, the n/p ratio is never less than 0.5.
However, I was surprised to see that a large number of the n/p ratios were infinite, which means that p is equal to 0.
On further investigation, I found out that this happens whenever there are only 1 or 2 observations for a specific feature.
I assume that the 3409 features with an infinite n/p ratio are plotted as 0 in the plot. Why do the features with 2 observations always have a p equal to 0? I suppose it's not that important, since features with only 2 observations are not very informative in bigger datasets.