constantAmateur / SoupX

R package to quantify and remove cell free mRNAs from droplet based scRNA-seq data

autoEstCont -> plotMarkerDistribution? #102

Closed by bbimber 2 years ago

bbimber commented 2 years ago

Hello,

Broadly speaking, I am trying to determine the best way to visualize how gene expression changes before and after running 10x data through the autoEstCont() workflow from your vignette:

sc <- SoupX::load10X(rawCountDir)
sc <- SoupX::autoEstCont(sc, doPlot = TRUE)

82 genes passed tf-idf cut-off and 39 soup quantile filter.  Taking the top 39.
Using 35 independent estimates of rho.
Estimated global rho of 0.01

However, when running this dataset through:

print(SoupX::plotMarkerDistribution(sc))

We get the error:

No gene lists provided, attempting to find and plot cluster marker genes.
Found 82 marker genes
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'print': Contamination fraction greater than 1 detected.  This is impossible and likely represents a failure in the estimation procedure used.

When we call autoEstCont(), it runs code to identify non-expressed genes and then calls setContaminationFraction(). Along this code path a set of genes is selected, and setContaminationFraction() is passed a valid result. From the log, we see it initially finds 82 candidate genes, which the soup quantile filter reduces to 39, and 35 independent estimates of rho are then used.

When we don't supply nonExpressedGeneList, plotMarkerDistribution() also tries to guess the marker genes. Based on the message before the error, it also starts with 82 genes, but it doesn't appear to apply the secondary filter.

My questions are:

constantAmateur commented 2 years ago

The plotting functions do need a bit of updating and really work best when you manually specify the genes. I don't have the time to rewrite them at the moment, but it's on the todo list.

However, you should be able to extract the marker genes autoEstCont uses. They are stored in the fit object. So sc$fit$markersUsed has the list of genes used after all filtering.
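
For anyone landing here later, a minimal sketch of that suggestion follows. It rests on assumptions not confirmed in this thread: that sc$fit$markersUsed is a data.frame with a `gene` column, and that plotMarkerDistribution() accepts a named list of gene sets as nonExpressedGeneList. markersToGeneList() and toyMarkers are hypothetical names made up for illustration and are not part of SoupX.

```r
# Hypothetical helper (not part of SoupX): turn the marker table stored by
# autoEstCont() into a named list with one gene per entry.
markersToGeneList <- function(markers) {
  # Assumption: `markers` is a data.frame with a `gene` column; otherwise
  # treat it as a plain vector of gene names.
  genes <- if (is.data.frame(markers)) {
    as.character(markers$gene)
  } else {
    as.character(markers)
  }
  genes <- unique(genes)
  setNames(as.list(genes), genes)
}

# Toy stand-in for sc$fit$markersUsed so the helper can be tried without data.
toyMarkers <- data.frame(
  gene  = c("HBB", "IGKC", "HBB"),
  tfidf = c(2.1, 1.7, 1.2)
)
geneList <- markersToGeneList(toyMarkers)

# With a real SoupChannel `sc` that has already been through autoEstCont():
if (requireNamespace("SoupX", quietly = TRUE) &&
    exists("sc") && !is.null(sc$fit$markersUsed)) {
  geneList <- markersToGeneList(sc$fit$markersUsed)
  print(SoupX::plotMarkerDistribution(sc, nonExpressedGeneList = geneList))
}
```

This only restricts the plot to the genes autoEstCont() kept after filtering. Whether it also avoids the "Contamination fraction greater than 1" error will depend on the dataset.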