Different results with same truth set and settings

Illumina / witty.er

What is true, thank you, ernestly. A large variant benchmarking tool analogous to hap.py for small variants.


Different results with same truth set and settings #22

Closed: NikkiJegen closed this issue 1 year ago

NikkiJegen commented 1 year ago

Hello,

I am currently working with the HG002 truth set (HG002_SVs_Tier1_v0.6.vcf.gz, HG002_SVs_Tier1_v0.6.bed) to evaluate our internal DRAGEN CNV calling. For the evaluation, I am using the same parameters and config file as in your examples. The sample was run with DRAGEN version 3.8.4. When I compared the two sets of results, I noticed that mine are noticeably worse, especially the recall.

The results from the example are:

EventPrecision: 63.666%
EventRecall: 4.687%
EventFscore: 8.731%

The overall stats in my results are:

EventPrecision: 53.226%
EventRecall: 0.761%
EventFscore: 1.501%

I am wondering what causes such a large difference. Is it the DRAGEN version, or perhaps how the sample was processed in the lab? Or could it be filtering on a minimum CNV size?

I would love to know your input.

With kind regards, Nikki

Kentalot commented 1 year ago

Let me look at this and get back to you.

Kentalot commented 1 year ago

Is there a way you can share the output folder with me (the annotated VCF file, the stats, the config file used, etc.)?

Kentalot commented 1 year ago

Try updating your config to have "!1,1000,5000,10000,20000,50000" for bins; that might help (we have an update coming soon that will make the example use this value).
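To illustrate what a bin string like this could mean, here is a minimal Python sketch. It assumes that each number is the lower bound of a size bin and that a leading "!" marks a bin whose events are left out of the stats. That reading is an assumption, not something confirmed in this thread, and the helper names parse_bins and assign_bin are hypothetical, not part of witty.er.

```python
from bisect import bisect_right

def parse_bins(spec):
    """Parse a spec such as "!1,1000,5000" into (starts, skipped_starts).

    Assumption: values are ascending lower bounds, and a leading "!"
    marks a bin that is skipped when computing stats.
    """
    starts, skipped = [], set()
    for token in spec.split(","):
        token = token.strip()
        start = int(token.lstrip("!"))
        starts.append(start)
        if token.startswith("!"):
            skipped.add(start)
    return starts, skipped

def assign_bin(length, starts, skipped):
    """Return the (start, end) bin for an event length, or None if it is not counted."""
    i = bisect_right(starts, length) - 1
    if i < 0 or starts[i] in skipped:
        return None
    end = starts[i + 1] if i + 1 < len(starts) else None  # None means open-ended
    return (starts[i], end)

starts, skipped = parse_bins("!1,1000,5000,10000,20000,50000")
for length in (300, 1000, 7500, 80000):
    print(length, assign_bin(length, starts, skipped))
```

Under that assumption, truth events shorter than 1000 bp would no longer count as misses, which would raise recall without changing precision if the query calls are all 1000 bp or longer.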

NikkiJegen commented 1 year ago

@Kentalot I will discuss with my manager tomorrow and update the config. I will let you know if it works as soon as possible!

NikkiJegen commented 1 year ago

Hello,

After running with the updated config, I have the following results:

EventPrecision: 53.226%
EventRecall: 6.004%
EventFscore: 10.790%

Recall and F-score went up quite a lot, but precision stayed exactly the same. Do you still want to look at the output? If so, how would you like to receive it?
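For reference, the reported F-scores are consistent with the usual F1 definition, the harmonic mean of precision and recall. The sketch below assumes that definition (the thread does not state the formula) and recomputes the three values posted above. Because recall is so much smaller than precision here, the F-score tracks recall closely.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision %, recall %) pairs as posted in this thread
runs = {
    "example": (63.666, 4.687),
    "first run": (53.226, 0.761),
    "updated bins": (53.226, 6.004),
}

for name, (p, r) in runs.items():
    print(f"{name}: F = {f_score(p, r):.2f}%")
```

The recomputed values agree with the posted EventFscore numbers to within rounding of the inputs.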

Kentalot commented 1 year ago

Yeah, could you just host it on the cloud and send me a link?

NikkiJegen commented 1 year ago

We would prefer to send the link privately. Do you have an e-mail address that I can send it to? @Kentalot

Kentalot commented 1 year ago

Please check the README. It has my email.

NikkiJegen commented 1 year ago

The issue has been resolved.

The problem was that the CNV length threshold was set to different values: the example uses 1k, while my DRAGEN run uses 10k. However, the caller still outputs calls shorter than 10k; it just marks them with a cnvLength filter instead of PASS. After updating the config file so that includeFilters contains both PASS and cnvLength, those entries were recovered.
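The effect described above can be sketched in a few lines of Python. This is only an illustration: the call records are made up, the helper names keep and select are hypothetical, and the rule "keep a call if any of its FILTER values is in the included set" is an assumption about how the filter inclusion behaves, not witty.er's actual code.

```python
# Hypothetical query calls: (chrom, start, length_bp, FILTER).
# These are invented for illustration, not real DRAGEN output.
CALLS = [
    ("chr1", 100000, 2500, "cnvLength"),   # shorter than 10k, flagged by the caller
    ("chr1", 500000, 15000, "PASS"),
    ("chr2", 250000, 8000, "cnvLength"),   # shorter than 10k, flagged by the caller
    ("chr3", 900000, 60000, "PASS"),
    ("chr4", 120000, 30000, "someOtherFilter"),  # placeholder filter name
]

def keep(filter_field, included):
    """Assumed rule: keep a call if any of its FILTER values is in `included`."""
    return any(f in included for f in filter_field.split(";"))

def select(calls, included):
    """Return the calls that survive the included-filter check."""
    return [call for call in calls if keep(call[3], included)]

print(len(select(CALLS, {"PASS"})))               # 2
print(len(select(CALLS, {"PASS", "cnvLength"})))  # 4
```

With only PASS included, the two calls under 10k are dropped before comparison against the truth set; adding cnvLength to the included set brings them back.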

Thank you for the solution!