fritzsedlazeck / Spectre

Copy number caller for long read data including SNV utilization
MIT License

About min_cnv_len #22

Closed EugeneKim76 closed 3 weeks ago

EugeneKim76 commented 1 month ago

Dear all, thanks for releasing this nice tool. I would like to adjust min_cnv_len.

Is there any problem with using Spectre to detect CNVs < 100kb? Is there a lower limit for min_cnv_len?

Best

philippesanio commented 1 month ago

Hello @EugeneKim76

Thank you!

Short answer: Yes, you can lower the minimum CNV length from 100kb to as low as 10kb, at the cost of introducing false positives (FPs).

Long answer: Spectre was designed to detect large CNV events, because many of them are missed by well-established variant callers. The further you lower the threshold below 100kb, the higher the risk of introducing FPs. These FPs are caused by noise in the coverage signal. When you run Spectre with default parameters, you can see in the chromosome plots in the img folder of the output folder that the coverage signal varies widely even in good regions. To mitigate the effects of noise, Spectre requires a minimum of 10 consecutive coverage values for an initial CNV. This effectively limits Spectre to a minimum CNV length of 10kb, assuming a 1kb Mosdepth bin size. In theory, with an almost perfect coverage signal, you could set min_cnv_len as low as 10kb.
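To make the arithmetic above concrete, here is a minimal sketch of how the effective floor follows from the bin size and the required number of consecutive coverage values. This is not Spectre's code: the function and parameter names are hypothetical, and only the values (10 consecutive values, 1kb bins) come from the explanation above.

```python
def effective_min_cnv_len(bin_size_bp: int = 1_000,
                          min_consecutive_bins: int = 10) -> int:
    """Smallest CNV length (in bp) that can span the required number of bins.

    Hypothetical helper: the defaults mirror the 1kb Mosdepth bin size and the
    10 consecutive coverage values mentioned above.
    """
    if bin_size_bp <= 0 or min_consecutive_bins <= 0:
        raise ValueError("bin size and bin count must be positive")
    return bin_size_bp * min_consecutive_bins


def clamp_min_cnv_len(requested_bp: int,
                      bin_size_bp: int = 1_000,
                      min_consecutive_bins: int = 10) -> int:
    """Return the requested threshold, raised to the effective floor if needed."""
    floor_bp = effective_min_cnv_len(bin_size_bp, min_consecutive_bins)
    return max(requested_bp, floor_bp)
```

With the defaults, a requested threshold of 5kb would be raised to 10kb, while 80kb would be left as it is. A larger bin size raises the floor proportionally.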

Usability strategies: Up to version 0.2.1, testing of Spectre focused mostly on datasets with CNVs of >=100kb. In the future, however, we want to try to lower the default min_cnv_len threshold.

If your main goal is to detect most of the CNVs around 100kb and above, you could try to run Spectre with a min_cnv_len of 90 or 80kb. This would enable you to detect CNVs (e.g., 95kb) which are just below the default threshold. Depending on the quality of your sample, this could already introduce FPs. Going even lower will definitely introduce FPs.

If you only want to detect variants (e.g., SVs) up to a few kb in size, I would suggest running Sniffles instead.

However, if you want to detect the full spectrum of variants, you can try to run Sniffles and Spectre in tandem. Spectre can use the calls from the Sniffles file (SNF). For this, you just need to convert the SNF file to an SNFJ file using the tool SNF2JSON and pass the path of the SNFJ file to Spectre with the flag --snfj. Spectre then searches the Sniffles output for SVs that support the position of the Spectre CNVs. If support is found, the SVSUPPORT flag is set to TRUE (default=FALSE) in the INFO field of the Spectre output VCF. This flag can then be used as an additional quality filter for smaller CNVs in the downstream evaluation. Please note that this is an experimental feature.
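As an illustration of the downstream filter described above, here is a minimal sketch in plain Python that keeps large CNVs unconditionally and keeps smaller ones only when SVSUPPORT is TRUE. It rests on several assumptions that should be checked against an actual Spectre VCF: that the flag appears in INFO as SVSUPPORT=TRUE or SVSUPPORT=FALSE, that the CNV length can be read from SVLEN or from END minus POS, and that 100kb is a sensible size cutoff. All function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator


def parse_info(info: str) -> Dict[str, str]:
    """Parse a VCF INFO column into a dict; bare flags are stored as 'TRUE'."""
    fields: Dict[str, str] = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else "TRUE"
    return fields


def keep_record(line: str, size_cutoff_bp: int = 100_000) -> bool:
    """Keep CNVs at or above the cutoff; keep smaller ones only with SV support."""
    cols = line.rstrip("\n").split("\t")
    pos = int(cols[1])
    info = parse_info(cols[7])
    # Assumption: length comes from SVLEN if present, otherwise END - POS.
    if "SVLEN" in info:
        length = abs(int(info["SVLEN"]))
    else:
        length = int(info.get("END", pos)) - pos
    if length >= size_cutoff_bp:
        return True
    # Assumption: the flag is written as SVSUPPORT=TRUE / SVSUPPORT=FALSE.
    return info.get("SVSUPPORT", "FALSE").upper() == "TRUE"


def filter_vcf(lines: Iterable[str], size_cutoff_bp: int = 100_000) -> Iterator[str]:
    """Yield header lines unchanged and only those records that pass keep_record."""
    for line in lines:
        if line.startswith("#") or keep_record(line, size_cutoff_bp):
            yield line
```

For example, `filter_vcf(open("spectre_output.vcf"))` (a placeholder path) would yield the header plus the retained records, which can then be written to a new file.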

I hope this helped. If you have any questions, encounter issues or want to see a feature in future releases, feel free to ping me any time.

Cheers, Philippe