dellytools / delly

DELLY2: Structural variant discovery by integrated paired-end and split-read analysis

Interpreting confidence intervals #383

Open AndreasSandJespersen opened 5 months ago

AndreasSandJespersen commented 5 months ago

I'm looking at delly VCF outputs and trying to compare them to other callers. What distribution and confidence level are used for CIPOS and CIEND? Is this user-defined?

tobiasrausch commented 5 months ago

This is based on the paired-end distributions and how many pairs support an SV and whether the SV is supported by split-reads.

AndreasSandJespersen commented 5 months ago

Firstly, thanks for getting back so quickly :)

This helps in understanding what supports CIPOS and CIEND, but it doesn't quite tell me what they mean.

The use case is understanding overlaps between variants. If two variants don't overlap when not considering CIPOS and CIEND, but overlap when considering CIPOS and CIEND for each - how should this be interpreted?

My impression was that regardless of how CIPOS and CIEND are calculated, they would give information about the % significance for POS and END locations for some probability distribution.

I'm not really sure where to start if CIPOS and CIEND don't give some % significance for POS and END locations.
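To make the overlap question concrete, here is a minimal sketch of the two readings: comparing the point estimates (POS, END) only, versus comparing the widest intervals that CIPOS and CIEND allow. The coordinates and the helper names `ci_padded` and `overlaps` are hypothetical and are not taken from delly or any other tool; the sketch also assumes both calls sit on the same chromosome.

```python
def ci_padded(pos, end, cipos, ciend):
    """Return the widest interval allowed by the CIPOS/CIEND offsets."""
    # CIPOS and CIEND are (lower, upper) offsets around POS and END.
    return pos + cipos[0], end + ciend[1]


def overlaps(a, b):
    """True if the closed intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical calls on one chromosome: (POS, END, CIPOS, CIEND)
sv1 = (1000, 2000, (-50, 50), (-50, 50))
sv2 = (2030, 3000, (-50, 50), (-50, 50))

# Reading 1: ignore the confidence intervals entirely.
point_overlap = overlaps((sv1[0], sv1[1]), (sv2[0], sv2[1]))

# Reading 2: extend each call by its confidence intervals first.
padded_overlap = overlaps(ci_padded(*sv1), ci_padded(*sv2))

print(point_overlap, padded_overlap)
```

With these made-up numbers the calls are disjoint on their point estimates but overlap once padded, which is exactly the case being asked about.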

zhangshouwei309194 commented 4 months ago

Dear author,

When I used delly for SV detection, I found something that confused me. For example, SR indicates that the number of split reads is 39, but RV (the number of junction reads) is 0. I would expect junction reads and split reads to be the same thing. Is there a reason for this in genotyping?

8 121299546 BND00008260 A [10:123241058[A 2340 PASS PRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv1.1.6;END=121299547;CHR2=10;POS2=123241058;PE=0;MAPQ=0;CT=5to5;CIPOS=-5,5;CIEND=-5,5;SRMAPQ=60;INSLEN=0;HOMLEN=5;SR=39;SRQ=1;CONSENSUS=CTCTCCATAACCAAGAAAATAAACATGCCAAGAGGAATTTGGTGAGTAAACAATGTTAAGTCCTAAGAGCTGCTAATGGGACCACTTTGAGCCATGAACTAATAAATCTCCACCACATCAAAAGAGAACTTTTTGCTTACAATGATAAAAACGAAATTTTGTCCTAAATGGAACCGTTTTTCTTGAGCATATGGTAATGATTTTCAGAAGGAAAGAAACTTCGATTTTTATATCCACCAGAC;CE=1.92421 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/0:0,-75.1615,-768.704:10000:PASS:338:4712:4374:2:0:0:250:0

Looking forward to your reply. Thank you.

Yours sincerely,
Phillip
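When reading a record like this, it may help to separate the site-level INFO fields (where SR lives) from the per-sample FORMAT fields (where RV lives). The sketch below only pairs up keys and values copied from the record above. The INFO string is a shortened subset chosen for illustration, and the variable names are hypothetical.

```python
# FORMAT keys and the sample column, copied from the record above.
format_keys = "GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV".split(":")
sample_values = "0/0:0,-75.1615,-768.704:10000:PASS:338:4712:4374:2:0:0:250:0".split(":")
sample = dict(zip(format_keys, sample_values))

# Shortened subset of the INFO column from the same record.
info_column = "PRECISE;SVTYPE=BND;PE=0;SR=39;SRQ=1"
info = {}
for entry in info_column.split(";"):
    key, _, value = entry.partition("=")
    # Flags such as PRECISE carry no value, so store True for them.
    info[key] = value if value else True

print("site-level  INFO/SR  =", info["SR"])
print("per-sample FORMAT/RV =", sample["RV"])
print("per-sample FORMAT/RR =", sample["RR"])
print("per-sample FORMAT/GT =", sample["GT"])
```

Laid out this way, the record shows SR=39 at the site level while this particular sample has RV=0, RR=250 and genotype 0/0.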

tobiasrausch commented 3 months ago

I think that's a duplicate question with issue https://github.com/dellytools/delly/issues/385 so I am closing this one.

AndreasSandJespersen commented 3 months ago

Not to beat a dead horse, but I don't think this was really answered :3

tobiasrausch commented 3 months ago

Ah sorry, so CIPOS and CIEND are derived entirely from the mapping locations of reads, and given that germline SVs are often homology-mediated (repeat-mediated), these can be quite misleading. That's why SV comparison tools are gradually moving towards comparing SV alleles instead of relying solely on reciprocal overlap. That's also what I implemented in sansa.
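For reference, reciprocal overlap is usually taken to mean the smaller of the two fractions "shared length / length of each SV". A minimal sketch under that assumption follows; the function name and coordinates are hypothetical, half-open intervals are assumed, and this is not code from delly or sansa.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Smaller of the two overlap fractions for half-open intervals [start, end)."""
    shared = min(a_end, b_end) - max(a_start, b_start)
    if shared <= 0:
        # Disjoint (or zero-length) intervals share nothing.
        return 0.0
    return min(shared / (a_end - a_start), shared / (b_end - b_start))


# Two equally sized SVs shifted by half their length.
half_shift = reciprocal_overlap(1000, 2000, 1500, 2500)

# A small SV nested inside a large one: fully covered itself,
# but it covers only a tenth of the larger SV.
nested = reciprocal_overlap(1000, 2000, 1100, 1200)

# Two SVs that do not touch.
disjoint = reciprocal_overlap(1000, 2000, 3000, 4000)

print(half_shift, nested, disjoint)
```

Because the measure depends only on coordinates, breakpoints that shift within a repeat can change the score even when the underlying allele is the same, which is the weakness described above.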

AndreasSandJespersen commented 3 months ago

Thanks! Ok, off topic from confidence intervals now, but by comparing SV alleles do you mean comparing Alt Sequence information? I read a bit into sansa, but couldn't quite see where the SV allele divergence value comes from.

tobiasrausch commented 3 months ago

Sansa uses delly's INFO/CONSENSUS sequence which is the ALT sequence + surrounding sequence (i.e., a local assembly of SV supporting reads).
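As a rough illustration of comparing alleles by sequence rather than by coordinates, the sketch below pulls CONSENSUS out of an INFO string and scores two sequences with Python's `difflib`. This is a stand-in similarity measure chosen for brevity, not sansa's actual algorithm; the helper names and the short example sequences are hypothetical, and strand (reverse complement) is ignored.

```python
from difflib import SequenceMatcher


def consensus_from_info(info_column):
    """Return the CONSENSUS value from a VCF INFO string, or None if absent."""
    for entry in info_column.split(";"):
        key, _, value = entry.partition("=")
        if key == "CONSENSUS":
            return value
    return None


def crude_divergence(seq_a, seq_b):
    """1 - difflib similarity ratio; 0.0 means the sequences are identical."""
    # autojunk=False: the default heuristic treats frequent characters as junk
    # in long sequences, which is a poor fit for a four-letter DNA alphabet.
    matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    return 1.0 - matcher.ratio()


# Hypothetical INFO strings with short made-up consensus sequences.
info_a = "PRECISE;SVTYPE=DEL;CONSENSUS=ACGTACGTACGGTTAC"
info_b = "PRECISE;SVTYPE=DEL;CONSENSUS=ACGTACGTTCGGTTAC"

cons_a = consensus_from_info(info_a)
cons_b = consensus_from_info(info_b)

same = crude_divergence(cons_a, cons_a)
different = crude_divergence(cons_a, cons_b)
print(same, different)
```

A real comparison would more likely use a proper alignment; the point here is only that the score comes from the assembled sequence, not from POS and END.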