drneavin / Demultiplexing_Doublet_Detecting_Docs

MIT License

Data and handling of pools #5

Open plger opened 2 years ago

plger commented 2 years ago

Hi drneavin, contacting you here since it's the only way I found ;) I read the Demuxafy preprint with much interest (very nice work), and had a question.

One area of the field which I feel still needs improvement is how to best call doublets across captures, given that there are commonly technical differences between captures, yet analyzing them together is bound to offer some advantage. This question wasn't really touched on in your preprint, and from what I understand each pool gets run as if it were a single capture (you make in silico pools of 128 individuals that contain on average 123k droplets, but this couldn't occur in a single capture). I wonder if you investigated this question? The dataset you used, it seems to me, would be most appropriate to figure out how to best handle this. (I checked the eQTL paper but there's no processed data, in case you'd be ready to share the count matrices and genotype calls ;) )

(I was also wondering why scDblFinder gets ranked suboptimally in terms of usability and such, but I won't overdo the insistent developer!)

Thanks

drneavin commented 2 years ago

Hi @plger,

Sorry for the delayed response on this. Are you suggesting that aggregating multiple pools and then detecting doublets across all of them together would enhance doublet detection? I'm not sure this is true for the demultiplexing software, but it could possibly be true for the doublet-detecting software. Would the assumption be that with a smaller number of droplets, there isn't a good representation of each cell type (especially rare and intermediate populations)?

The PBMC data should be available in the coming weeks, as that publication will come out next week, so keep your eyes peeled. But if you're interested in the processed fibroblast data, I can send you that directly.

The only reason that scDblFinder has two stars for usability (along with most of the other transcription-based software) is that solo is a command-line tool and is therefore very easy to use. I'll note, though, that scDblFinder scores better across the board than any of the other transcription-based methods.

You can contact me directly at d.neavin @ garvan.org.au if you would like me to send you the fibroblast data or to continue this discussion via email :)

Cheers, Drew