Missing genotypes? - Githubissues

KChen-lab / Monopogen

SNV calling from single cell sequencing

GNU General Public License v3.0

Missing genotypes? #34

Open wangjunyu-cumt opened 10 months ago

wangjunyu-cumt commented 10 months ago

We are particularly interested in calculating SNP (Single Nucleotide Polymorphism) expression levels using single-cell ATAC data. We have used Monopogen to compute genotypes for germline mutations. However, for the SNPs that were not reported, we cannot tell whether they are absent because of low read coverage or because their genotype matches the reference genome. Is there a way to distinguish these two cases?

jinzhuangdou commented 10 months ago

I am not sure I understand your question correctly. Monopogen only calls germline SNVs that overlap the 1KG3 panel, and only 0/1 (ref/alt) and 1/1 (alt/alt) loci are output. An SNV that is not reported therefore falls into one of three cases: 1) it is not in the 1KG3 panel; 2) its genotype is 0/0; 3) it is in the 1KG3 panel but no reads were detected at the locus.

wangjunyu-cumt commented 9 months ago

Yes, but can I determine if the missing genotypes are 0/0 or simply not detected?

jinzhuangdou commented 7 months ago

If you have specific loci of interest, you can look at the sequencing coverage at those loci. If no reads cover a locus, it should be treated as missing.
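A minimal sketch of such a coverage check, assuming the per-position depth is available as three-column text (chromosome, 1-based position, depth), the layout `samtools depth` prints for a single BAM. The function name `loci_without_coverage` and the example values are hypothetical and not part of Monopogen.

```python
def loci_without_coverage(depth_lines, loci):
    """Return the loci of interest that have zero (or unreported) depth.

    depth_lines: iterable of "chrom<TAB>pos<TAB>depth" strings.
    loci: iterable of (chrom, pos) tuples, pos 1-based.
    """
    depth = {}
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        chrom, pos, d = line.split("\t")[:3]
        depth[(chrom, int(pos))] = int(d)
    # A position absent from the depth text is treated as uncovered (assumption).
    return [locus for locus in loci if depth.get(locus, 0) == 0]


# Made-up example values, for illustration only.
example_depth = ["chr1\t1000\t12", "chr1\t2000\t0"]
example_loci = [("chr1", 1000), ("chr1", 2000), ("chr1", 3000)]
uncovered = loci_without_coverage(example_depth, example_loci)
print(uncovered)
```

Loci returned by this function would be the "missing" ones; a locus with reads but no reported call is a candidate 0/0.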

tbrunetti commented 6 months ago

I have a follow-up to this -- does that mean 0/0 should just be treated as missing, since Monopogen would never call a 0/0? I have far too many samples and SNPs to do a depth check on all of them. Also, I have duplicate samples (the same blood sample from the same person), but we are getting a lot of discordant calls. For example, one sample might be 0/1 while its duplicate is 1/1.

That said, I believe most of our discordant SNPs are cases where one duplicate has a 0/0 and the other has either a 0/1 or a 1/1. Does that mean the 0/0 should just be replaced with the call made in the other sample of the pair?

This trend shows up across all of our duplicate pairs, and we have several. Any insight into how to interpret or handle this? Also, the set of discordant SNPs differs from one duplicate pair to the next. Thanks!

jinzhuangdou commented 6 months ago

Hi tbrunetti, you are right. Monopogen cannot call 0/0 for a single sample, since it is hard to tell whether a site is a true 0/0 or simply lacks sequencing coverage. As for concordance between duplicates, I think you should exclude 0/0 from the validation list, since sequencing coverage can differ even between duplicates. You can only treat 0/0 as a true genotype if you have multiple samples for joint calling (>50 single cell samples).
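A rough sketch of a duplicate comparison that leaves 0/0 out, as suggested above. It assumes each sample's calls are already loaded into a dict keyed by a site identifier, with unphased genotype strings; `duplicate_concordance` and the example calls are hypothetical.

```python
NON_REF = {"0/1", "1/1"}  # the only genotypes compared here


def duplicate_concordance(calls_a, calls_b):
    """Compare two duplicates only where both carry a non-reference call.

    calls_a, calls_b: dicts mapping site id -> genotype string.
    Returns (n_compared, n_concordant).
    """
    shared = [
        site for site in calls_a
        if site in calls_b
        and calls_a[site] in NON_REF
        and calls_b[site] in NON_REF
    ]
    concordant = sum(1 for site in shared if calls_a[site] == calls_b[site])
    return len(shared), concordant


# Made-up calls for one duplicate pair.
dup1 = {"chr1:1000": "0/1", "chr1:2000": "0/0", "chr1:3000": "1/1", "chr1:4000": "0/1"}
dup2 = {"chr1:1000": "0/1", "chr1:2000": "0/1", "chr1:3000": "0/1"}
n_compared, n_concordant = duplicate_concordance(dup1, dup2)
print(n_compared, n_concordant)
```

Sites where either duplicate is 0/0 or absent are dropped before counting, so a coverage difference between the duplicates does not register as a discordant call.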

tbrunetti commented 6 months ago

So I have ~300 samples, and I joint-called them in Monopogen with samtools mpileup. However, I don't think I see anything with a missing call, which can't be real, so the only way to know whether a 0/0 is real is to do a depth check, even with joint calling. Is that basically the idea? Thank you again for your insight, this is all incredibly helpful!

jinzhuangdou commented 6 months ago

Yes. You need to look at the sequencing depth to determine whether a 0/0 is a true genotype or is missing (but imputed).
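For many samples and SNPs, one possible way to apply this is to mask every 0/0 that has no read support in one pass. The sketch below assumes the joint-called genotypes and the per-sample depths have already been loaded into dicts keyed by (sample, site). The function `mask_unsupported_hom_ref` and the `min_depth` threshold are assumptions for illustration, not Monopogen options.

```python
MISSING = "./."  # VCF notation for a missing genotype


def mask_unsupported_hom_ref(genotypes, depths, min_depth=1):
    """Replace 0/0 calls that lack read support with a missing genotype.

    genotypes: dict mapping (sample, site) -> genotype string.
    depths: dict mapping (sample, site) -> read depth (absent means 0).
    min_depth: assumed threshold; choose a value suited to your data.
    """
    masked = {}
    for key, gt in genotypes.items():
        if gt == "0/0" and depths.get(key, 0) < min_depth:
            masked[key] = MISSING  # likely imputed rather than observed
        else:
            masked[key] = gt
    return masked


# Made-up joint-called genotypes and depths at one site.
genotypes = {
    ("s1", "chr1:1000"): "0/0",
    ("s2", "chr1:1000"): "0/0",
    ("s3", "chr1:1000"): "0/1",
}
depths = {("s1", "chr1:1000"): 9, ("s3", "chr1:1000"): 4}
masked = mask_unsupported_hom_ref(genotypes, depths)
print(masked)
```

Only 0/0 calls are touched; 0/1 and 1/1 calls pass through unchanged whatever their depth.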