HKU-BAL / ClairS-TO

ClairS-TO - a deep-learning method for tumor-only somatic variant calling
BSD 3-Clause "New" or "Revised" License

F1 scores and high coverage datasets #9

Closed: sq101 closed this issue 1 month ago

sq101 commented 1 month ago

Hi there! We're currently testing ClairS-TO on ONT DNA reads, calling SNPs and comparing them against a variant truth set. We have noticed, however, that the F1 score drops markedly whenever the coverage is above 1000x (e.g. an F1 of 0.90 at 1000x coverage, but an F1 of 0.5 at higher coverage).

Therefore, I wanted to kindly ask: could there be a reason for this behaviour?

Thank you very much in advance for any reply, and thank you for ClairS-TO. Cheers!

aquaskyline commented 1 month ago

The model wasn't trained with that much excessive coverage. We suggest downsampling to below 1000x.
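For readers who want to act on this suggestion, here is a minimal sketch of how the downsampling fraction could be worked out. It is not from the ClairS-TO authors. It assumes the BAM is downsampled with `samtools view -s`, where the integer part of the argument is the random seed and the decimal part is the fraction of reads to keep. The function names, file names, the seed, and the 900x target are hypothetical choices for illustration; the thread only says to stay below 1000x.

```python
def subsample_fraction(mean_depth: float, target_depth: float) -> float:
    """Return the fraction of reads to keep so the expected depth is about target_depth."""
    if mean_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    # Never ask for more than all reads when the data is already below the target.
    return min(1.0, target_depth / mean_depth)


def samtools_subsample_args(in_bam: str, out_bam: str, fraction: float, seed: int = 42) -> list:
    """Build the argument list for `samtools view -s SEED.FRACTION` (assumed tool)."""
    rounded = round(fraction, 4)
    if not 0 < rounded < 1:
        raise ValueError("fraction must round to a value strictly between 0 and 1")
    # samtools reads the integer part as the seed and the decimal part as the fraction.
    frac_digits = f"{rounded:.4f}".split(".")[1]
    return ["samtools", "view", "-b", "-s", f"{seed}.{frac_digits}", "-o", out_bam, in_bam]


# Hypothetical example: a 2000x sample brought down to roughly 900x.
fraction = subsample_fraction(mean_depth=2000, target_depth=900)
args = samtools_subsample_args("tumor.bam", "tumor.downsampled.bam", fraction)
print(" ".join(args))
```

The mean depth would need to be measured first (for example with a depth-summary tool of your choice), and the downsampled BAM must be indexed again before variant calling.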

sq101 commented 1 month ago

Hi @aquaskyline! Thank you very much for your speedy reply and support! We'll take that into account. Thank you very much once again :)