epi2me-labs / wf-human-variation


Big difference between results of qdnaseq and spectre #189

Open EugeneKim76 opened 3 weeks ago

EugeneKim76 commented 3 weeks ago


Dear all, I obtained CNV results using nanopore reads (coverage: ~50x, N50: ~40 kb). However, there was a big difference between the results of qdnaseq and spectre: qdnaseq reported 4 CNVs, while spectre reported ~300 CNVs.

To find the reasons for the difference, I looked through the scripts of wf-human-variation and found that spectre is not run the way its author recommends, as shown below. Why are the options in wf-human-variation different from the recommended ones? Do we have to optimize the options for our sample to reduce the difference?

1. Mapping quality. The author of spectre recommends running Mosdepth with a bin size of 1 kb and a mapping quality of at least 20 (-Q 20), as shown below (https://github.com/fritzsedlazeck/Spectre)

mosdepth -t 8 -x -b 1000 -Q 20 -c X "${out_path}/${sample_id}" "${bam_path}"

However, I could not find the -Q 20 option in the mosdepth script of wf-human-variation.

2. Options of spectre.py CNVCaller. The author of spectre pointed out that the minimum CNV length can be lowered from 100 kb to as little as 10 kb, with the drawback of introducing false positives (FPs) (https://github.com/fritzsedlazeck/Spectre/issues/22). However, the spectre options in wf-human-variation are different from the author's, as shown below.

1) spectre author's: --threshhold-quantile 5 --dist-proportion 0.25 --min-cnv-len 100000

2) wf-human-variation: --threshhold-quantile 10 --dist-proportion 0.3 --min-cnv-len 80000
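For anyone who wants to test the effect of the MAPQ filter from point 1 on their own data, here is a minimal sketch that builds the mosdepth command with and without `-Q`. The function name `mosdepth_cmd` and the file paths are hypothetical; only the flags `-t`, `-x`, `-b` and `-Q` come from the command quoted above (the `-c X` restriction is left out here so that all chromosomes are processed). It only prints the commands and does not run mosdepth.

```python
import shlex


def mosdepth_cmd(bam_path, out_prefix, bin_size=1000, min_mapq=None, threads=8):
    """Build a mosdepth argument list (hypothetical helper, not part of the workflow).

    min_mapq=None leaves the -Q flag out, so mosdepth uses its own default.
    """
    cmd = ["mosdepth", "-t", str(threads), "-x", "-b", str(bin_size)]
    if min_mapq is not None:
        cmd += ["-Q", str(min_mapq)]
    # mosdepth takes the output prefix first, then the BAM file.
    cmd += [out_prefix, bam_path]
    return cmd


# Placeholder paths; substitute your own sample.
with_filter = mosdepth_cmd("sample.bam", "out/sample", min_mapq=20)
without_filter = mosdepth_cmd("sample.bam", "out/sample")

print(shlex.join(with_filter))
print(shlex.join(without_filter))
```

Running Spectre on both coverage outputs and comparing the call counts would show how much of the difference is due to the mapping quality filter.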

vlshesketh commented 2 weeks ago

Hi @EugeneKim76, thank you for your interest and report.

When adding qdnaseq, our intention was for it to detect very large (megabase-scale) events. Thus, with the default bin size of 500 kb, events <1 Mb will most likely not be reported. With the Spectre integration, one of the objectives was to reduce the minimum detectable event size (to around 100 kb). This might partly explain the differences you see in your data when comparing the two. Basically, most of the difference will come from differences between QDNAseq and Spectre, rather than from the custom Spectre parameters used in wf-human-variation (please also see this issue).
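As a rough illustration of why bin size limits the detectable event size, the sketch below counts how many bins an event of a given length spans. The 500 kb value is the qdnaseq default mentioned above; the 1 kb value is the mosdepth bin size from the question and is used here only as an assumed point of comparison. The function name and the example event lengths are made up for illustration.

```python
def bins_spanned(event_len_bp, bin_size_bp):
    """Number of bins an event covers (fractional if shorter than one bin)."""
    return event_len_bp / bin_size_bp


# Illustrative event lengths: 100 kb, 1 Mb, 5 Mb.
for event_len in (100_000, 1_000_000, 5_000_000):
    coarse = bins_spanned(event_len, 500_000)  # qdnaseq default bin size
    fine = bins_spanned(event_len, 1_000)      # assumed 1 kb coverage bins
    print(f"{event_len:>9} bp: {coarse:g} bins at 500 kb, {fine:g} bins at 1 kb")
```

A 100 kb event covers only a fraction of a single 500 kb bin, so its signal is diluted by the surrounding normal-copy sequence, whereas at 1 kb resolution the same event is supported by 100 bins.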

The default options currently used in wf-human-variation are based on the results of our internal benchmarks. Importantly, they have been optimized more for the 15-20 kb N50 range and 30x coverage. So, there might be room to adjust the parameters to match your use case better. In general, the number of false positive events detected by Spectre increases with read length, which we are looking into resolving by, e.g., integrating breakpoint analysis.

In the meantime, for detection of events shorter than 100 kb, we agree with Philippe (here) that a better option is to rely on Sniffles results. Moreover, if the application allows you to focus on larger events first, increasing --min-cnv-len would be our recommendation for reducing the number of calls. Our experience suggests that pushing --min-cnv-len to the 150-200 kb range should considerably reduce the number of calls in your case. Regarding mosdepth parameters: our internal benchmarking used the default parameters. We don't expect the MAPQ filter value to considerably change the number of reported events, but we will do another round of internal testing.
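To get a quick feel for how the call count changes with the minimum length before re-running the workflow, existing calls can be post-filtered by size. The sketch below is only an approximation under stated assumptions: the `CnvCall` record, the `filter_by_length` helper and the three example calls are all hypothetical, and filtering after the fact is not guaranteed to match what Spectre would report when run with a different --min-cnv-len.

```python
from typing import NamedTuple


class CnvCall(NamedTuple):
    """Minimal stand-in for one CNV call (hypothetical record layout)."""
    chrom: str
    start: int
    end: int
    kind: str  # e.g. "DEL" or "DUP"


def filter_by_length(calls, min_len_bp):
    """Keep only calls whose length is at least min_len_bp."""
    return [c for c in calls if c.end - c.start >= min_len_bp]


# Made-up example calls, not real results.
calls = [
    CnvCall("chr1", 1_000_000, 1_090_000, "DEL"),  # 90 kb
    CnvCall("chr2", 5_000_000, 5_160_000, "DUP"),  # 160 kb
    CnvCall("chrX", 2_000_000, 2_450_000, "DEL"),  # 450 kb
]

for min_len in (80_000, 150_000, 200_000):
    kept = filter_by_length(calls, min_len)
    print(f"min length {min_len} bp: {len(kept)} of {len(calls)} calls kept")
```

In practice the calls would be read from the workflow's output files rather than typed in by hand.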

Regarding your question about the differing default parameters: the version of Spectre in wf-human-variation differs from the latest Spectre release (we are planning to sync with the latest Spectre release soon). In particular, it uses a slightly different strategy for threshold estimation, which is controlled by the --threshhold-quantile parameter you mention (sorry for the typo in the name). The latest Spectre release does not expose the corresponding parameter as a command line option, so we are not sure what you mean when you refer to --threshhold-quantile 5 as the author's recommended value. Of the three parameters you list at the end, this one will have the most impact, but the default Spectre behavior should be much closer to --threshhold-quantile 10 than to --threshhold-quantile 5.