def _infer_multiplets_from_observed(n_obs_multiplets, n_cells0, n_cells1):
    """ Given a number of observed multiplets and cell counts for two transcriptomes,
    infer the total number of multiplets (observed + unobserved) """
    if n_cells0 == 0 or n_cells1 == 0:
        return 0
    # Prior probability of a doublet given counts for each cell type (ignore N_cells > 2)
    p_obs_multiplet = 2 * (float(n_cells0) / float(n_cells0 + n_cells1)) * (float(n_cells1) / float(n_cells0 + n_cells1))
    # Analytical MOM/MLE of binomial N given p, k
    mle = float(n_obs_multiplets) / p_obs_multiplet
    # In some (artificial) datasets, the mle can be higher than the total number of cells
    # observed. This occurs when n_obs_multiplets > n_cells0|1. The right way to fix that would be to
    # do inference in a full model that didn't fix some parameters. In practice, multigenomes
    # are a rare analysis and most data isn't artificial, so we are implementing
    # a small hack instead.
    return min(mle, float(n_obs_multiplets + n_cells0 + n_cells1))
Hi, I have a question about the theory behind the multiplet inference implemented in https://github.com/10XGenomics/cellranger/blob/master/lib/python/cellranger/analysis/multigenome.py
Sorry for my limited statistics knowledge, but I cannot follow the process. Could you please give a more detailed explanation? I would appreciate it!
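
To make the question concrete, here is my reading of the arithmetic as a minimal sketch, which may well be wrong. I am assuming that a multiplet is treated as a random pair of cells, so it is only observable when it contains one cell from each transcriptome, which happens with probability 2 * f0 * f1 (f0 and f1 being the fractions of each cell type). The total would then be the observed count divided by that probability. The function name `infer_total_multiplets` and the example counts are made up for illustration and are not from Cell Ranger.

```python
def infer_total_multiplets(n_obs_multiplets, n_cells0, n_cells1):
    """Hypothetical restatement of the quoted function, for illustration only."""
    if n_cells0 == 0 or n_cells1 == 0:
        return 0
    total_cells = float(n_cells0 + n_cells1)
    f0 = n_cells0 / total_cells  # fraction of cells from transcriptome 0
    f1 = n_cells1 / total_cells  # fraction of cells from transcriptome 1
    # Assumed model: a random pair of cells is mixed (one of each type)
    # with probability 2 * f0 * f1; same-type pairs cannot be observed.
    p_observable = 2.0 * f0 * f1
    # If observed = total * p_observable, then total = observed / p_observable.
    estimate = n_obs_multiplets / p_observable
    # Same cap as the quoted code: never exceed the total number of counted barcodes.
    cap = float(n_obs_multiplets + n_cells0 + n_cells1)
    return min(estimate, cap)


# Made-up counts: 750 and 250 cells, 30 observed mixed-species multiplets.
# f0 = 0.75, f1 = 0.25, so p_observable = 2 * 0.75 * 0.25 = 0.375
# and the estimate is 30 / 0.375 = 80 multiplets in total.
example = infer_total_multiplets(30, 750, 250)
print(example)
```

If this reading is right, 30 observed multiplets would imply about 50 more same-species multiplets that cannot be detected. Is that the intended interpretation?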