fritzsedlazeck / Spectre

Copy number caller for long read data including SNV utilization
MIT License

Adaptive sequencing CNV calling #23

Open adbeggs opened 6 months ago

adbeggs commented 6 months ago

Hi team

I realise that Spectre may struggle with CN calling on adaptive sampling data due to stochastic read depth variation. We have a sample that has undergone adaptive long-read sequencing of BRCA. There is a clear DUP in IGV, but both Sniffles and Spectre fail to pick it up:

We're up to date with latest versions (as far as I am aware).

[Image: IGV screenshot of the region]

philippesanio commented 5 months ago

Hi @adbeggs

The duplication looks fairly obvious to me. However, it is just 4kb in size, which is outside the scope of Spectre.

Spectre was designed to detect large CNVs, roughly 100kb and upwards. When we tested numerous data sets (including benchmark data sets), we observed that many of them showed high noise levels, which would result in many false positives (FPs). To counter the effects of noise, Spectre requires a run of at least 10 consecutive coverage data points to form an initial CNV candidate. Hence, we locked Spectre to a minimum CNV length of 10kb. Without that filter, Spectre would need data with a very high signal-to-noise ratio, which is rarely realistic.
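
To illustrate why a 4kb event falls through a run-length requirement like the one described above, here is a minimal sketch. It is not Spectre's actual code: the function name `find_gain_runs`, the 1kb bin size, the 1.3x gain threshold, and the toy coverage values are all assumptions made for the example.

```python
from typing import List, Tuple


def find_gain_runs(coverage: List[float], baseline: float,
                   gain_ratio: float = 1.3,
                   min_bins: int = 10) -> List[Tuple[int, int]]:
    """Return half-open (start, end) bin ranges where coverage stays
    at or above gain_ratio * baseline for at least min_bins bins.

    Hypothetical helper for illustration only, not part of Spectre.
    """
    runs = []
    start = None  # index where the current elevated run began
    for i, depth in enumerate(coverage):
        elevated = depth >= gain_ratio * baseline
        if elevated and start is None:
            start = i
        elif not elevated and start is not None:
            # Run just ended: keep it only if it is long enough.
            if i - start >= min_bins:
                runs.append((start, i))
            start = None
    # Handle a run that extends to the last bin.
    if start is not None and len(coverage) - start >= min_bins:
        runs.append((start, len(coverage)))
    return runs


BIN_SIZE_BP = 1000  # assumed bin size: one coverage data point per 1kb

# Toy coverage track: a 4-bin gain (about 4kb) and a 12-bin gain (about 12kb).
coverage = [30.0] * 20 + [45.0] * 4 + [30.0] * 20 + [45.0] * 12 + [30.0] * 20

runs = find_gain_runs(coverage, baseline=30.0)
calls_bp = [(s * BIN_SIZE_BP, e * BIN_SIZE_BP) for s, e in runs]
print(calls_bp)
```

With these toy values only the 12-bin gain survives the filter; the 4-bin gain is discarded no matter how clean the signal is, because it never reaches 10 consecutive data points.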

Some internal tests have shown that it is possible to lower min-cnv-len to 90kb or 80kb. Going lower introduces more FPs the further you reduce the threshold. However, we did not test this extensively, so take it with a grain of salt.

In the future, we definitely want to revisit the topic of lowering the recommended minimum CNV length of 100kb.

On the Sniffles side, I would have to refer you to @lfpaulin or Fritz, since they are the experts on that matter.

I hope that explanation helped. Cheers, Philippe