fritzsedlazeck / Sniffles

Structural variation caller using third generation sequencing

Genotype (zygosity) query required for de novo analysis #356

Open prasundutta87 opened 2 years ago

prasundutta87 commented 2 years ago

Hi,

I am using ONT data (minimap2 aligned) for trio multisample calling with the snf method of Sniffles v2.0.6. There are a few points I would like to put forward:

1) I have seen cases where there is a missing genotype (./.) even though there was enough coverage and a genotype quality was present. What might be the reason for this? For example: ./.:18:21:4:Sniffles2.INS.156S0

2) There were cases where the genotype is 0/0 and there are enough reads to support it (>10), but no genotype quality was present. The variant ID was NULL (as there was no variant there). Is the genotype quality score not calculated for NULL variants? For example: 0/0:0:25:0:NULL

3) There were also cases where there was 0/0, but a genotype quality was present. For example: 0/0:1:4:1:Sniffles2.DEL.2638S0
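To make the three examples easier to compare, here is a minimal Python sketch that splits each sample field into named values. It assumes the FORMAT order is GT:GQ:DR:DV:ID, which matches how the read counts are interpreted in this thread; `SampleCall` and `parse_sample_field` are hypothetical helper names, not part of Sniffles.

```python
from typing import NamedTuple, Optional


class SampleCall(NamedTuple):
    gt: str                # genotype, e.g. "0/0", "0/1", "./."
    gq: Optional[int]      # genotype quality, None if "."
    dr: int                # reads supporting the reference
    dv: int                # reads supporting the variant
    sv_id: Optional[str]   # per-sample SV ID, None if NULL or "."

    @property
    def depth(self) -> int:
        # Total reads considered at the site (assumption: DR + DV).
        return self.dr + self.dv


def _to_int(value: str) -> Optional[int]:
    return None if value == "." else int(value)


def parse_sample_field(field: str) -> SampleCall:
    # Assumed field order: GT:GQ:DR:DV:ID
    gt, gq, dr, dv, sv_id = field.split(":")
    return SampleCall(
        gt=gt,
        gq=_to_int(gq),
        dr=_to_int(dr) or 0,
        dv=_to_int(dv) or 0,
        sv_id=None if sv_id in ("NULL", ".") else sv_id,
    )


if __name__ == "__main__":
    for example in (
        "./.:18:21:4:Sniffles2.INS.156S0",
        "0/0:0:25:0:NULL",
        "0/0:1:4:1:Sniffles2.DEL.2638S0",
    ):
        call = parse_sample_field(example)
        print(call.gt, call.gq, call.dr, call.dv, call.depth, call.sv_id)
```

Under this reading, example 2 carries a GQ value of 0 rather than an empty GQ field.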

Now, for calling de novo variants in the proband, I am looking for heterozygous SVs in the proband and homozygous reference genotypes in the parents. If I apply any genotype quality or missingness filters, I lose many variants due to the above conditions. I can obviously be lenient and change the missing genotypes to reference (due to point 1) and not take the genotype quality score into consideration (due to point 2), but are these errors or known issues?
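The filtering logic described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the field order GT:GQ:DR:DV:ID is assumed, the function names and all thresholds (`min_proband_gq`, `min_depth`, `max_dv`) are hypothetical, and the proband field in the usage lines is made up for demonstration. The parent check relies on read counts instead of parental GQ, and the lenient "missing as reference" behaviour is an explicit switch.

```python
HET = {"0/1", "1/0", "0|1", "1|0"}
HOM_REF = {"0/0", "0|0"}
MISSING = {"./.", ".|.", "."}


def parse(field):
    # Assumed field order: GT:GQ:DR:DV:ID
    gt, gq, dr, dv, _sv_id = field.split(":")
    return {
        "gt": gt,
        "gq": int(gq) if gq.isdigit() else None,
        "dr": int(dr),
        "dv": int(dv),
    }


def parent_is_reference(call, min_depth=10, max_dv=0, treat_missing_as_ref=False):
    """Accept a parent as reference based on read evidence, not on GQ."""
    depth = call["dr"] + call["dv"]
    if depth < min_depth or call["dv"] > max_dv:
        return False
    if call["gt"] in HOM_REF:
        return True
    # Lenient mode: a missing genotype with clean reference evidence counts.
    return treat_missing_as_ref and call["gt"] in MISSING


def is_de_novo_candidate(proband, father, mother, min_proband_gq=20, **parent_opts):
    p = parse(proband)
    if p["gt"] not in HET:
        return False
    if p["gq"] is None or p["gq"] < min_proband_gq:
        return False
    return all(parent_is_reference(parse(f), **parent_opts) for f in (father, mother))


if __name__ == "__main__":
    proband = "0/1:30:12:10:Sniffles2.INS.156S0"  # made-up proband field
    clean_parent = "0/0:0:25:0:NULL"              # example 2 from this post
    print(is_de_novo_candidate(proband, clean_parent, clean_parent))
    print(is_de_novo_candidate(proband, clean_parent,
                               "./.:18:21:4:Sniffles2.INS.156S0",
                               treat_missing_as_ref=True))
```

With these settings, a parent like example 1 is rejected even in lenient mode because it has 4 variant-supporting reads, and a parent like example 3 is rejected for low depth.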

I am aware of certain coverage-based rules (such as minimum supporting reads) and why some genotypes are missing, but the examples I mention above do not fall under those. I am aware of --combine-null-min-coverage, but examples 1 and 3 contradict each other in that case. Also, is only DV considered for this? Or DV+DR?

Regards, Prasun

prasundutta87 commented 2 years ago

Any explanation for this? Or any update?

fritzsedlazeck commented 2 years ago

Hi there, @1: That's interesting. We can take a closer look for sure. @smolkmo ?

@2: So 0/0 is always tricky. It can mean that the variant is absent (I think that's the case here) or that it just does not reach the level of heterozygosity (based on the ratio of supporting vs. reference reads).

@3: I think this is the case where the variant is observed in 1 read, but that's not sufficient to call it heterozygous (0/1).
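To make the ratio of supporting vs. reference reads concrete, here is a small Python sketch that computes DV / (DR + DV) for the three sample fields quoted in this issue. It assumes the field order GT:GQ:DR:DV:ID and does not reproduce the thresholds Sniffles uses internally; `variant_read_fraction` is a hypothetical helper name.

```python
def variant_read_fraction(dr, dv):
    """Fraction of reads supporting the variant, or None without coverage."""
    total = dr + dv
    return dv / total if total else None


if __name__ == "__main__":
    examples = (
        "./.:18:21:4:Sniffles2.INS.156S0",
        "0/0:0:25:0:NULL",
        "0/0:1:4:1:Sniffles2.DEL.2638S0",
    )
    for field in examples:
        # Assumed field order: GT:GQ:DR:DV:ID
        gt, _gq, dr, dv, _sv_id = field.split(":")
        frac = variant_read_fraction(int(dr), int(dv))
        print(f"{gt}\tDR={dr}\tDV={dv}\tfraction={frac}")
```

The fraction alone does not distinguish a variant that is absent from one that is weakly supported, so the raw DR and DV counts are worth keeping alongside it.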

I don't think they contradict each other. The issue is that 0/0 can be, and is, used in both cases. That makes it hard, and we agree. You can adjust the parameter for @1. We need to look into whether this is a bug or not.

Yes, I would include DV measurements for that. We are working to improve this, but as you know it is not easy. Thanks, Fritz

prasundutta87 commented 2 years ago

Thanks a lot for the replies, @fritzsedlazeck !

It will be interesting to know what's happening for @1.

For @2, my question was more about genotype quality. In my pipeline to call de novo SVs from a trio VCF file, I rely on high GQ and filter at a threshold of, say, 20, which is a good threshold for SNV analysis. Should I still depend on it for SV analysis? For overall SV quality in the trio (QUAL), I rely on the default filtering parameters that Sniffles applies. I am working with data with a median depth of 10-32x. Is it still practical to use GQ filtering in this case? I am aware that there is no straight answer, but what has been your experience with GQ for SV filtering?

Regarding @1, are you pointing towards using --combine-null-min-coverage? I thought that since it is overall coverage, it should include all reads irrespective of them being DV or DR.

Apologies for not being clear about my contradiction point. I meant that in @1 there was enough depth (21+4), yet it got a missing genotype, whereas @3 had 1+3 reads and still had a genotype assigned. But, as you mentioned before, it will be good to know what's going on with @1.

Regards, Prasun