10XGenomics / cellranger

10x Genomics Single Cell Analysis
https://www.10xgenomics.com/support/software/cell-ranger

Theory of infer multiplets #192

Closed Bio-MingChen closed 11 months ago

Bio-MingChen commented 1 year ago

Hi, I have a question about the theory behind the multiplet inference described in https://github.com/10XGenomics/cellranger/blob/master/lib/python/cellranger/analysis/multigenome.py

def _infer_multiplets_from_observed(n_obs_multiplets, n_cells0, n_cells1):
    """ Given a number of observed multiplets and cell counts for two transcriptomes,
    infer the total number of multiplets (observed + unobserved) """

    if n_cells0 == 0 or n_cells1 == 0:
        return 0

    # Prior probability of a doublet given counts for each cell type (ignore N_cells > 2)
    p_obs_multiplet = 2 * (float(n_cells0) / float(n_cells0 + n_cells1)) * (float(n_cells1) / float(n_cells0 + n_cells1))

    # Analytical MOM/MLE of binomial N given p, k
    mle = float(n_obs_multiplets) / p_obs_multiplet
    # In some (artificial) datasets, the mle can be higher than the total number of cells
    # observed.  This occurs when n_obs_multiplets > n_cells0|1. The right way to fix that would be to
    # do inference in a full model that didn't fix some parameters.  In practice, multigenomes
    # are a rare analysis and most data isn't artificial, so we are implementing
    # a small hack instead.
    return min(mle, float(n_obs_multiplets + n_cells0 + n_cells1))

My statistics background is limited and I cannot follow the reasoning behind this calculation. Could you please explain it in more detail? Thank you!
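One way to read the function quoted above, offered as an interpretation rather than a statement from the Cell Ranger authors: a doublet can only be recognized in a two-species experiment when it contains one cell from each transcriptome. If a fraction f0 of cells belongs to transcriptome 0 and f1 = 1 - f0 to transcriptome 1, two randomly paired cells are mixed-species with probability 2 * f0 * f1, which is what `p_obs_multiplet` computes. Same-species doublets (probability f0^2 + f1^2) go unobserved. Treating the observed count k as Binomial(N, p) with N total doublets gives E[k] = N * p, so the method-of-moments estimate is N = k / p. The final `min` only caps that estimate at the number of observed multiplets plus cells. The sketch below retraces the same arithmetic; the function name `infer_multiplets_sketch` and the input numbers are made up for illustration.

```python
def infer_multiplets_sketch(n_obs_multiplets, n_cells0, n_cells1):
    """Illustrative re-derivation of the quoted function (hypothetical name)."""
    if n_cells0 == 0 or n_cells1 == 0:
        # With only one species present, no mixed-species doublet can occur.
        return 0.0

    total = float(n_cells0 + n_cells1)
    f0 = n_cells0 / total  # fraction of cells from transcriptome 0
    f1 = n_cells1 / total  # fraction of cells from transcriptome 1

    # Probability that a random pair of cells is mixed-species (0+1 or 1+0),
    # i.e. the only kind of doublet that can actually be observed.
    p_mixed = 2 * f0 * f1

    # Method of moments: E[observed] = N_total * p_mixed, so N_total = observed / p_mixed.
    estimate = n_obs_multiplets / p_mixed

    # Same cap as in the quoted code.
    cap = float(n_obs_multiplets + n_cells0 + n_cells1)
    return min(estimate, cap)


# Made-up inputs: an even mix, where half of all doublets are mixed-species.
print(infer_multiplets_sketch(50, 1000, 1000))  # 100.0

# Made-up inputs: a skewed mix, where most doublets are same-species and hidden.
print(infer_multiplets_sketch(50, 1800, 200))

# Made-up inputs: an extreme case where the cap applies instead of k / p.
print(infer_multiplets_sketch(100, 10, 1000))  # 1110.0
```

With an even mix, half of all doublets are detectable, so 50 observed doublets imply roughly 100 in total. The more skewed the mix, the smaller the detectable fraction and the larger the correction factor.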