constantAmateur / SoupX

R package to quantify and remove cell free mRNAs from droplet based scRNA-seq data

autoEstCont always reduces estimation #55

Closed kjrgreen1 closed 3 years ago

kjrgreen1 commented 3 years ago

I have been using the estimateNonExpressingCells and calculateContaminationFraction functions to calculate the contamination fraction for nuclei, using mitochondrial genes. However, this occasionally gave very high contamination estimates. When the autoEstCont function was added, I began using it with the prior rho set to the estimate obtained from the mitochondrial genes. This seemed to solve the high-estimate issue.

This was working well until I switched to v1.4.5. Now the autoEstCont function always lowers the estimated contamination, sometimes drastically. I would expect the estimate to go up for some samples and down for others after running autoEstCont. The size of the reduction also seems to scale linearly with the prior SD. Has anyone else had this issue?

constantAmateur commented 3 years ago

The default prior on the autoEstCont function is very broad and should only have a mild effect on the resulting contamination fraction. That said, it does expect that the contamination is probably in the 1-10% range, with the prior probability slowly tailing off at higher values. This is by design: contamination values above 20% are considered unusual, so we require strong evidence to set the value that high.

If you are confident that your contamination rates are higher than the default prior would indicate, then you should change the prior to match your experiment-specific prior knowledge. You could also set an effectively flat prior (e.g. set priorRhoStdDev=10) and let the data decide for you.
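
To make the effect of the prior concrete, here is a minimal sketch in plain Python. It is not SoupX code and does not call the package. It assumes a simplified model: the counts of genes used for estimation are Poisson with mean rho * expected, and the prior on rho is a gamma distribution described by a mean and a standard deviation. SoupX's actual parameterisation and likelihood may differ, and the function names and example counts below are made up for illustration. The values 0.05 and 0.10 are used as stand-ins for a default-like prior.

```python
def gamma_from_mean_sd(mean, sd):
    """Convert a mean and standard deviation into gamma shape and rate.

    Assumption for this sketch: the prior is described by its mean and sd.
    SoupX may parameterise its prior differently.
    """
    shape = (mean / sd) ** 2
    rate = mean / sd ** 2
    return shape, rate


def posterior_mean_rho(observed, expected, prior_mean, prior_sd):
    """Posterior mean of rho under a gamma prior and Poisson likelihood.

    observed: counts seen for genes assumed not to be truly expressed
    expected: counts those genes would have if rho were 1 (pure soup)
    With observed ~ Poisson(rho * expected) the gamma prior is conjugate,
    so the posterior is Gamma(shape + observed, rate + expected).
    """
    shape, rate = gamma_from_mean_sd(prior_mean, prior_sd)
    return (shape + observed) / (rate + expected)


# Hypothetical data: both cases have a raw estimate of 30%, but the
# second one is backed by ten times less evidence.
cases = [("strong evidence", 30, 100), ("weak evidence", 3, 10)]
for label, observed, expected in cases:
    raw = observed / expected
    narrow = posterior_mean_rho(observed, expected, 0.05, 0.10)
    flat = posterior_mean_rho(observed, expected, 0.05, 10)
    print(f"{label}: raw={raw:.3f} narrow prior={narrow:.3f} flat prior={flat:.3f}")
```

Under these assumptions, a raw estimate that sits above the prior mean is always pulled downward, and the pull is stronger when there is less evidence. With a very large standard deviation the result is essentially the raw estimate, which is the "let the data decide" behaviour described above.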