fritzsedlazeck / Sniffles

Structural variation caller using third generation sequencing

How does Sniffles genotype the variants #118

Closed biozzq closed 5 years ago

biozzq commented 5 years ago

Dear all,

I simulated a small data set to learn how to use Sniffles. The data contain a heterozygous deletion. However, after running the following command, the genotype reported for this variant is 1/1. In my mind, it should be 0/1.

sniffles (Version: 1.0.10) -m query.bam --min_seq_size 1 -s 1 -v query.vcf --skip_parameter_estimation --genotype

1 535 0 N <DEL> . PASS PRECISE;SVMETHOD=Snifflesv1.0.10;CHR2=1;END=1639;STD_quant_start=0.000000;STD_quant_stop=0.000000;Kurtosis_quant_start=-nan;Kurtosis_quant_stop=-nan;SVTYPE=DEL;SUPTYPE=SR;SVLEN=-1104;STRANDS=+-;RE=1;REF_strand=0,0;AF=1 GT:DR:DV 1/1:0:1
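For readers who want to pull the relevant numbers out of a record like the one above, here is a minimal sketch in Python. It only splits the line on whitespace and does not use any VCF library; the function name `parse_record` is made up for this illustration. The last two columns (`GT:DR:DV` and `1/1:0:1`) pair up into genotype, `DR` and `DV`.

```python
# Minimal sketch: split the VCF record quoted above into named fields.
# parse_record is a hypothetical helper, not part of Sniffles.
record = (
    "1 535 0 N <DEL> . PASS "
    "PRECISE;SVMETHOD=Snifflesv1.0.10;CHR2=1;END=1639;"
    "STD_quant_start=0.000000;STD_quant_stop=0.000000;"
    "Kurtosis_quant_start=-nan;Kurtosis_quant_stop=-nan;"
    "SVTYPE=DEL;SUPTYPE=SR;SVLEN=-1104;STRANDS=+-;RE=1;"
    "REF_strand=0,0;AF=1 GT:DR:DV 1/1:0:1"
)


def parse_record(line):
    # A real VCF is tab-separated; split() handles tabs and spaces alike.
    chrom, pos, var_id, ref, alt, qual, flt, info, fmt, sample = line.split()

    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info_fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_fields[key] = value if value else True

    # FORMAT keys and sample values are ':'-separated and line up one to one.
    sample_fields = dict(zip(fmt.split(":"), sample.split(":")))

    return {
        "chrom": chrom,
        "pos": int(pos),
        "alt": alt,
        "info": info_fields,
        "sample": sample_fields,
    }


parsed = parse_record(record)
print(parsed["sample"])        # {'GT': '1/1', 'DR': '0', 'DV': '1'}
print(parsed["info"]["AF"])    # 1
```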

image

Best wishes, Zhuqing

fritzsedlazeck commented 5 years ago

Dear Zhuqing, The genotype of a variant is determined by the number of reads that support the reference or the alternative allele. In your case there seems to be only 1 read, which is also shown by the screenshot. Thus, the report from Sniffles is expected. Thanks for reaching out. Cheers Fritz
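To make the read-count logic concrete, here is a rough sketch in Python. It is not the Sniffles implementation. It assumes that `DR` and `DV` in the record are the counts of reference-supporting and variant-supporting reads, and the two cutoffs (0.3 and 0.8) and the function name `genotype_from_counts` are illustrative assumptions only.

```python
# Rough sketch of genotyping from read counts; not the actual Sniffles code.
# The cutoffs below are illustrative assumptions.
def genotype_from_counts(dr, dv, min_het_af=0.3, min_homo_af=0.8):
    """Return (genotype, allele_frequency) from reference/variant read counts."""
    total = dr + dv
    if total == 0:
        return "./.", None          # no reads, no call

    af = dv / total                 # fraction of reads supporting the variant
    if af >= min_homo_af:
        gt = "1/1"
    elif af >= min_het_af:
        gt = "0/1"
    else:
        gt = "0/0"
    return gt, af


# Counts as reported in the record (DR=0, DV=1): every counted read supports the deletion.
print(genotype_from_counts(0, 1))   # ('1/1', 1.0)

# If one reference read had been counted as well (DR=1, DV=1):
print(genotype_from_counts(1, 1))   # ('0/1', 0.5)
```

Under these assumptions, the 1/1 call follows directly from `DR=0`: with no reference-supporting read counted, the allele frequency is 1.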

biozzq commented 5 years ago

Dear Fritz, Thank you. However, I think the right genotype here should be 0/1. There are two reads here, one supports the reference allele and one supports the deletion. Best, Zhuqing

fritzsedlazeck commented 5 years ago

Thanks. I did not see the reference allele read. Still, this is not enough information. These methods are implemented to work on a couple of reads that include sequencing errors. We had a user before who reported simulated data where the problem was that there were no sequencing errors on the reads.

So anyways. What I would suggest is to simulate at least 5 reads with sequencing error included. You can do that for example with SURVIVOR or a different method. Please let me know if this was the problem. I am happy to help. Thanks Fritz
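One way to see why one or two reads carry little genotype information is a simple sampling argument, sketched below in Python. This is a back-of-the-envelope model, not how Sniffles computes anything: it assumes each read at a heterozygous site comes from the variant allele with probability 0.5, independently of the others. The function name is made up for the example.

```python
# Back-of-the-envelope model (an assumption, not the Sniffles method):
# at a heterozygous site, each read independently carries the variant
# allele with probability alt_fraction.
def prob_no_reference_read(n_reads, alt_fraction=0.5):
    """Chance that all n_reads reads support the variant at a heterozygous site."""
    return alt_fraction ** n_reads


for n in (1, 2, 5, 10):
    print(n, prob_no_reference_read(n))
# 1 0.5
# 2 0.25
# 5 0.03125
# 10 0.0009765625
```

With a single read, a heterozygous site looks homozygous half of the time under this model; with 5 reads that drops to about 3%.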

biozzq commented 5 years ago

Dear @fritzsedlazeck Thank you. I want to detect structural variations between different assemblies, so the assembly data will not contain many sequencing errors. Just out of interest, why does Sniffles expect sequencing errors in the reads? As we know, the best sequencing data should not contain many sequencing errors, so it would be reasonable to take all the reads into consideration during genotyping. Sincerely, Zhuqing

fritzsedlazeck commented 5 years ago

It runs a calibration in the beginning and that can get confused. It assumes noisy data since it was designed for PacBio and ONT reads. Here it also tries to interpret regions of reads that show an abnormally high error rate. These regions can indicate potential SVs that weren't mapped out nicely by the aligner used.

I will put it on my list to see if I can improve things on the assembly side as well. Thanks Fritz

wangjiawen2013 commented 5 years ago

Dear Fritz, what does "REF_strand" mean in the above Sniffles output?

fritzsedlazeck commented 5 years ago

This is currently only a test for myself that I kept. I need to further define the code before I would recommend using it. Thanks Fritz

Machadum commented 2 months ago

Dear Fritz,

I am working on a haploid cell. Is the SV genotyping relevant in my case?

fritzsedlazeck commented 2 months ago

Ignore the genotype. That's calibrated for diploid. Otherwise it should be good. Best Fritz

Machadum commented 2 months ago

Sounds good, thanks! Could I still interpret it as follows: only a portion of the reads contain the SV when it is 0/1, and all reads contain it when it is 1/1? Or does it just not mean anything?