constantAmateur / SoupX

R package to quantify and remove cell free mRNAs from droplet based scRNA-seq data

too few genes that passed tf-idf cutoff #137

Open wbvguo opened 1 year ago

wbvguo commented 1 year ago

Hi, thanks for maintaining this tool. I recently ran it on some of my samples and found one sample in which too few genes pass the tf-idf cutoff. The sample information and the warning messages are below.

sample type: T cell subpopulations, sample_info:

Estimated Number of Cells 8,449
Fraction Reads in Cells 88.3%
Mean Reads per Cell 15,520
Median UMI Counts per Cell 5,041
Median Genes per Cell 2,386
Total Genes Detected 25,653

warning message

> sc = autoEstCont(sc, doPlot=FALSE)
17 genes passed tf-idf cut-off and 1 soup quantile filter.  Taking the top 1.
Using 0 independent estimates of rho.

Warning messages:
1: In autoEstCont(sc, doPlot = FALSE) :
  Fewer than 10 marker genes found.  Is this channel low complexity (see help)?  If not, consider reducing tfidfMin or soupQuantile
2: In estimateNonExpressingCells(sc, tmp, maximumContamination = max(contaminationRange),  :
  No non-expressing cells identified.  Consider setting clusters=FALSE, increasing maximumContamination and/or FDR
3: In autoEstCont(sc, doPlot = FALSE) :
  Fewer than 10 independent estimates, rho estimation is likely to be unstable.  Consider reducing tfidfMin or increasing SoupMin.
4: In min(x) : no non-missing arguments to min; returning Inf
5: In max(x) : no non-missing arguments to max; returning -Inf

I tried lowering tfidfMin, but the estimated contamination fraction (rho) increases rapidly:

| tfidfMin | genes passing tf-idf | independent estimates | rho |
|---|---|---|---|
| 1.00 | 17 | 0 | - |
| 0.95 | 22 | 0 | - |
| 0.90 | 27 | 1 | 0.29 |
| 0.85 | 38 | 5 | 0.55 |
| 0.80 | 43 | 5 | 0.55 |
| 0.75 | 58 | 14 | 0.80 |
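
For anyone who wants to reproduce a sweep like the one in the table above, here is a minimal sketch of how it could be scripted. It assumes `sc` is a SoupChannel with clusters already set, and that the estimate ends up in `sc$metaData$rho` after `autoEstCont` returns. `sweep_tfidf` and its `estimator` argument are hypothetical names introduced here, not part of SoupX. `forceAccept = TRUE` is included on the assumption that high estimates would otherwise be rejected with an error.

```r
# Sketch: rerun the contamination estimate over several tfidfMin values.
# `sweep_tfidf` and `estimator` are hypothetical helpers for this example;
# `estimator` defaults to SoupX::autoEstCont and is only looked up when used.
sweep_tfidf <- function(sc,
                        cutoffs = c(1.00, 0.95, 0.90, 0.85, 0.80, 0.75),
                        estimator = SoupX::autoEstCont) {
  rho <- vapply(cutoffs, function(cutoff) {
    tryCatch({
      fit <- estimator(sc, tfidfMin = cutoff, doPlot = FALSE,
                       forceAccept = TRUE, verbose = FALSE)
      # Assumption: the global estimate is stored per cell in metaData$rho.
      est <- fit$metaData$rho
      if (is.null(est)) NA_real_ else as.numeric(est[1])
    }, error = function(e) NA_real_)  # a failed run is recorded as NA
  }, numeric(1))
  data.frame(tfidfMin = cutoffs, rho = rho)
}
```

Calling `sweep_tfidf(sc)` returns one row per cutoff, with `NA` where the estimate failed. The counts of genes passing the filters are only printed when `verbose = TRUE`, so they are not captured in the returned data frame.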

I suspect this is because this sample's sequencing depth is relatively low, but could you explain, from the tool's perspective, why this happens? Do you also have any suggestions for handling it? For example, should we proceed with decontamination when the estimated rho is large (greater than 0.5)?

Thanks!