Model the fractions of visible and invisible doublets in case of multiplexing

drneavin / Demultiplexing_Doublet_Detecting_Docs

MIT License


Model the fractions of visible and invisible doublets in case of multiplexing #15

Open vertesy opened 1 year ago

vertesy commented 1 year ago

Hey,

on https://demultiplexing-doublet-detecting-docs.readthedocs.io/en/latest/test.html you write:

Please let us know if there are additional features that you think will help make this tool useful.

It's not clear where I'm supposed to do that, so I'm doing it here.

The tool is very nice, but what I think would be a very useful upgrade is to

Model the fractions of visible and invisible doublets in case of multiplexing

E.g.: an invisible homo doublet (sample barcodes: sbc-1 + sbc-1) or a visible hetero doublet (sbc-1 + sbc-2). This holds whether it is genotype muxing or sample barcode (sbc) muxing.

The gist is:

If you mix at a 50-50 ratio, your demuxer algorithm will find many more doublets than if you mix at a 90-10 ratio.

I wrote a function for this task in Seurat.utils that you could base it on, but it's essentially just multiplying a couple of fractions!
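To make the "multiplying a couple of fractions" idea concrete, here is a minimal Python sketch. It is not the Seurat.utils function mentioned above, and the name `doublet_visibility` is made up for illustration. It assumes that cells pair up at random regardless of sample and that sample labels are perfect, so it gives an upper estimate of what a demultiplexer could detect.

```python
def doublet_visibility(proportions):
    """Split doublets into visible (hetero) and invisible (homo) fractions.

    Hypothetical helper. Assumes random pairing of cells and perfect
    sample labeling. `proportions` are the mixing ratios of the samples
    and do not need to sum to 1.
    """
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must sum to a positive number")
    p = [x / total for x in proportions]
    # A homo doublet pairs two cells from the same sample: sum of p_i squared.
    invisible = sum(x * x for x in p)
    # Every other doublet mixes two different samples.
    visible = 1.0 - invisible
    return visible, invisible


for mix in ([50, 50], [90, 10]):
    visible, invisible = doublet_visibility(mix)
    print(f"{mix}: visible {visible:.2f}, invisible {invisible:.2f}")
```

Under these assumptions, half of all doublets are visible for a 50-50 mix, while only 18% (2 × 0.9 × 0.1) are visible for a 90-10 mix.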

drneavin commented 1 year ago

Hi @vertesy !

Thanks for your comment! Sorry for my delayed response - I've been mulling this over for the last day.

I agree this could be very helpful, but I think it can really only be implemented for genotype-based demultiplexing doublet annotations. I had considered something like this at one point but decided against it, because the doublets detectable by the transcription-based doublet detecting software are a function of how heterogeneous the cell types being analyzed are, and I think that is a bit hard to estimate a priori.

But your point is well taken and I think I could implement something like this and make it clear that it is only applicable for the genotype-based doublet detections. I would probably try to add it to our doublet calculator and also as a function as part of the package. I think it would be most helpful for users to be able to determine this early on so they know what results to anticipate when they run the tools.

Let me know what you think.

vertesy commented 1 year ago

Hey, sure, it's your call what is relevant for your audience.

Just to be precise, what I suggest applies to at least 3 ways of multiplexing, and it gives the upper estimate (at perfect labeling) for:

Genotype,
Antibody (the original Cell Hashing paper), and
Lipid-based (MULTI-seq paper & 10x Cell Multiplexing) demux.

cheers!

vertesy commented 1 year ago

On a similar note, all calculations seem to be legit for the original 10X chip.

Now with the new 10x high-throughput chip, the doublet rate is ~halved.

You could make a drop-down selector to swap the graph on the left - the graph is very nice to see btw!

P.S. I realized they also have a third, explicitly low-throughput setup.
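As a rough illustration of how a chip selector could feed into the visible/invisible split, here is a hedged Python sketch. The function name, the dictionary, and especially the per-1,000-cell rates are assumptions for illustration only: they are not taken from this thread or from the package, and the only relation borrowed from the comment above is that the high-throughput rate is about half the standard one. A low-throughput entry is left out because no rate for it is given here.

```python
# Assumed doublet rates per 1,000 recovered cells. These are illustrative
# rule-of-thumb placeholders, NOT values taken from this thread or from the
# package; replace them with the numbers documented for your chip.
ASSUMED_RATE_PER_1000_CELLS = {
    "standard": 0.008,
    "high_throughput": 0.004,  # about half of "standard", as the comment suggests
}


def expected_doublet_rates(n_cells, chip, proportions):
    """Return (total, visible, invisible) expected doublet fractions.

    Hypothetical helper. Assumes the doublet rate grows linearly with the
    number of recovered cells, cells pair at random regardless of sample,
    and sample labels are perfect.
    """
    total_rate = min(ASSUMED_RATE_PER_1000_CELLS[chip] * n_cells / 1000, 1.0)
    weight = sum(proportions)
    # Fraction of doublets whose two cells come from the same sample.
    homo_fraction = sum((x / weight) ** 2 for x in proportions)
    invisible = total_rate * homo_fraction
    return total_rate, total_rate - invisible, invisible


for chip in ASSUMED_RATE_PER_1000_CELLS:
    total, visible, invisible = expected_doublet_rates(10_000, chip, [1, 1, 1, 1])
    print(f"{chip}: total {total:.3f}, visible {visible:.3f}, invisible {invisible:.3f}")
```

Keeping the rates in one lookup table means a third setup could be added as a single entry once its rate is known, without touching the calculation.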