arogozhnikov / demuxalot

Reliable, scalable, efficient demultiplexing for single-cell RNA sequencing
MIT License

Questions about providing genotypes to demuxalot #16

Open cartographerJ opened 3 years ago

cartographerJ commented 3 years ago

Cool package! Does this package work if I use VarTrix, cellsnp-lite, or a similar tool to create genotypes.vcf from common SNPs, or does it require orthogonal, higher-quality variant calling from WES/WGS/SNP arrays?

arogozhnikov commented 3 years ago

Hi @cartographerJ,

demuxalot does not require common SNPs, so VarTrix and cellsnp-lite won't add anything here.

genotypes.vcf can be inferred from any variant calling, and it doesn't need to be precise (WES/WGS/SNP arrays are all good, but shallow RNA-seq or shallow WGS are fine too). Basically, you need to provide some initial information about every genotype.
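Since every genotype needs some initial information, one quick sanity check is that each expected donor appears as a sample column in the VCF. The sketch below is not part of demuxalot; `missing_genotypes` is a hypothetical helper, and it assumes an uncompressed, tab-separated VCF whose sample columns follow the nine fixed columns (#CHROM through FORMAT).

```python
def missing_genotypes(vcf_lines, expected_names):
    """Return the expected genotype names that have no sample column in the VCF.

    Hypothetical helper (not a demuxalot API). Assumes a plain-text VCF.
    """
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # Columns 0-8 are fixed (#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT);
            # sample (genotype) names start at column 9.
            samples = line.rstrip("\n").split("\t")[9:]
            return [name for name in expected_names if name not in samples]
    raise ValueError("no #CHROM header line found; is this a VCF?")


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tDonor1\tDonor2",
    ]
    print(missing_genotypes(vcf, ["Donor1", "Donor2", "Donor3"]))
```

An empty result means every expected genotype has a column; it says nothing yet about how many sites are actually called for each one.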

BiotechPedro commented 1 year ago

Hello @arogozhnikov !

Is it possible to provide that information using the .vcf generated by cellsnp-lite in mode 2b? Also, I am wondering if cellsnp-lite + vireo could be more efficient than demuxalot since, as far as I understand, the preprint doesn't show that demuxalot is better.

Regarding the preprint, are you on the road to publish it?

Thank you!!

Pedro

arogozhnikov commented 1 year ago

From the description, cellsnp-lite seems to produce a VCF, so probably yes, but I can't be sure, as VCFs can differ quite a lot internally. If the VCF has genotype calls (e.g. 0/1), demuxalot should be able to use it.
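To see whether a given VCF (for example, one from cellsnp-lite) actually carries calls like 0/1, one can count, per sample, the fraction of records with a non-missing GT value. This is a minimal sketch under stated assumptions, not demuxalot code: `called_fraction` is a hypothetical name, it reads an uncompressed VCF, and it treats any GT containing a `.` allele (e.g. `./.`) as uncalled.

```python
def called_fraction(vcf_lines):
    """Per-sample fraction of records with a fully called GT (e.g. 0/1, 1|1).

    Hypothetical helper (not a demuxalot API). Records without a GT field in
    FORMAT count as uncalled for every sample.
    """
    samples = None
    called = {}
    n_records = 0
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the 9 fixed columns
            called = {s: 0 for s in samples}
            continue
        if samples is None:
            raise ValueError("data line before #CHROM header")
        n_records += 1
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        for sample, value in zip(samples, fields[9:]):
            parts = value.split(":")
            # Trailing FORMAT fields may be dropped, so guard the index.
            gt = parts[gt_idx] if gt_idx < len(parts) else "."
            alleles = gt.replace("|", "/").split("/")
            if all(a not in (".", "") for a in alleles):
                called[sample] += 1
    if samples is None:
        raise ValueError("no #CHROM header line found")
    return {s: (called[s] / n_records if n_records else 0.0) for s in samples}


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tDonor1\tDonor2",
        "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t./.",
        "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT:DP\t1|1:5\t0/0:3",
    ]
    print(called_fraction(vcf))
```

A sample whose fraction is zero, or a file with no GT field at all, would be a sign that the VCF holds only allele counts rather than calls.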

Regarding the preprint, are you on the road to publish it?

I'm not part of Herophilus, so I can't comment on their plans. On my side, I'm not taking any steps to get it published.

I am wondering if cellsnp-lite + vireo could be more efficient than demuxalot

I didn't check, but I really doubt it. CellSNP was fantastically slow, and vireo itself wasn't quick either (basically, you need to make a good guess during calling so that you keep neither too few nor too many SNPs).

AFAIR, vireo had a number of issues with doublets as the number of genotypes increased, so the question of speed becomes secondary (at least to me).

BiotechPedro commented 1 year ago

Sorry, I meant accuracy rather than efficiency. But yes, doublets are still the key point. It's a pity the publication was stopped. I'd have liked to see what reviewers would suggest to complete the manuscript.

Anyway, thanks for your answer!!