fritzsedlazeck / Sniffles

Structural variation caller using third generation sequencing

Inquiry about Coverage in Multi-Sample SV Calling #514

Closed vivi-1406 closed 3 weeks ago

vivi-1406 commented 4 weeks ago

Hello, I am currently using Sniffles for Multi-Sample SV Calling and have combined 18 samples. In one of the variant outputs, I encountered the following entry:

[Screenshot of the VCF entry; image not preserved]

I would like to ask what the five numbers after "COVERAGE" mean. Are they averages of the effective read counts (DR + DV) across the 18 samples? If not, how are they calculated?

hermannromanek commented 3 weeks ago

Hi Vivi,

The coverage values are the same as in single sample calling mode, i.e. upstream, start, center, end, downstream coverage of the variant. In multisample calling, the mean of all constituent single sample variants is used.
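
As a rough illustration of that layout, the sketch below splits a COVERAGE value into its five positions and averages several single-sample records position by position. The helper names (`parse_coverage`, `mean_coverage`) and the example INFO strings are made up for illustration; this is not Sniffles' own code, and it assumes every value is numeric.

```python
from statistics import mean

# Order of the five values as described above.
COVERAGE_FIELDS = ("upstream", "start", "center", "end", "downstream")

def parse_coverage(info):
    """Extract the COVERAGE entry from a VCF INFO string as a dict of floats."""
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "COVERAGE":
            numbers = [float(v) for v in value.split(",")]
            if len(numbers) != len(COVERAGE_FIELDS):
                raise ValueError(f"expected 5 coverage values, got {len(numbers)}")
            return dict(zip(COVERAGE_FIELDS, numbers))
    raise KeyError("COVERAGE not found in INFO field")

def mean_coverage(per_sample):
    """Average coverage dicts position by position (hypothetical helper)."""
    return {f: mean(c[f] for c in per_sample) for f in COVERAGE_FIELDS}

# Made-up INFO strings for two samples, for illustration only.
samples = [
    parse_coverage("SVTYPE=INS;SVLEN=120;COVERAGE=30,12,12,12,28"),
    parse_coverage("SVTYPE=INS;SVLEN=120;COVERAGE=20,8,10,8,22"),
]
combined = mean_coverage(samples)
print(combined)
```

With 18 samples the same per-position averaging would apply to 18 records instead of two.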

Those values should be the number of reads at those locations (meaning that for insertions, values 2, 3, and 4 should be roughly equal to DV, while values 1 and 5 should be the upstream and downstream read counts).
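
A quick sanity check along those lines could look like the sketch below. The function name and the 25% tolerance are arbitrary assumptions chosen for illustration ("roughly equal" is not defined more precisely here), and the example numbers are made up.

```python
def insertion_coverage_matches_dv(coverage, dv, tolerance=0.25):
    """Check that values 2-4 (start, center, end) are close to DV.

    coverage: the five values in order upstream, start, center, end, downstream.
    tolerance: allowed relative deviation from DV (an assumed threshold).
    """
    if len(coverage) != 5:
        raise ValueError(f"expected 5 coverage values, got {len(coverage)}")
    inner = coverage[1:4]  # start, center, end
    allowed = tolerance * max(dv, 1)  # avoid a zero-width window when DV is 0
    return all(abs(v - dv) <= allowed for v in inner)

# Made-up examples: the first is consistent with DV=12, the second is not.
print(insertion_coverage_matches_dv([30, 12, 11, 12, 28], dv=12))  # True
print(insertion_coverage_matches_dv([30, 29, 30, 28, 28], dv=12))  # False
```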

I think for the next release there will be a slight change where start and end coverage for insertions will be moved just outside the variant, meaning those should be DR at beginning and end of the variant, while the center coverage should be DV.

Hermann