HKU-BAL / ClairS-TO

ClairS-TO - a deep-learning method for tumor-only somatic variant calling
BSD 3-Clause "New" or "Revised" License

RefCall variants #11

Closed sq101 closed 1 month ago

sq101 commented 1 month ago

Hi there! We are testing ClairS-TO with the --print_ref_calls option. My question is: what does the RefCall flag mean? Could you elaborate on it?

I am aware that in other software it can mean that the variant was proposed as a mutation but was later rejected by the variant caller; however, I would appreciate it if you could provide further information on what it means in ClairS-TO.

Thank you very much and thank you as well for your tool! Regards!

aquaskyline commented 1 month ago

default+print_ref_calls: variants that were proposed as non-reference mutations but later rejected by ClairS-TO are printed

genotyping_mode_vcf_fn+print_ref_calls: variants listed in the input VCF that ClairS-TO decides are reference calls are printed

hybrid_mode_vcf_fn+print_ref_calls: union of above two
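The three cases above can be restated as simple set logic. The following is only a sketch of that description, not ClairS-TO code: the function name, the mode labels ("default", "genotyping", "hybrid"), and the representation of a site as a (chrom, pos) tuple are all assumptions made for illustration.

```python
def expected_refcall_sites(mode, rejected_candidates, input_vcf_ref_sites):
    """Return the set of sites one would expect to see flagged RefCall.

    mode                 -- hypothetical label: "default", "genotyping" or "hybrid"
    rejected_candidates  -- sites proposed as non-reference, then rejected
    input_vcf_ref_sites  -- sites from the input VCF decided to be reference
    """
    rejected = set(rejected_candidates)
    from_input_vcf = set(input_vcf_ref_sites)
    if mode == "default":
        return rejected
    if mode == "genotyping":
        return from_input_vcf
    if mode == "hybrid":
        # "union of above two"
        return rejected | from_input_vcf
    raise ValueError(f"unknown mode: {mode!r}")


# Made-up sites, purely for illustration.
rejected = {("chr1", 200), ("chr1", 300)}
from_vcf = {("chr1", 300), ("chr2", 500)}
hybrid_sites = expected_refcall_sites("hybrid", rejected, from_vcf)
```

In this toy example the hybrid set holds three sites, because ("chr1", 300) appears in both inputs and is counted once.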

sq101 commented 1 month ago

@aquaskyline Thank you very much for the fast reply! I have the following follow-up questions in the context of using default+print_ref_calls:

a) Under which conditions are the variants with the RefCall flag rejected?

b) Are RefCall variants also LowQual variants? Why (or why not)? We normally observe only the RefCall flag by itself, unlike LowQual which is usually accompanied by other flags (";" separated).

c) How can we make the best use of the RefCall flags? For example, are variants with RefCall more likely to be true/false positives, are they an indication of sequencing artifacts, etc.

Thank you very much for your patience and support once again. Regards!

JasonCLEI commented 1 month ago

Hi @sq101 as for your questions:

a) ClairS-TO first extracts variant candidates through a simple heuristic approach, and then our neural networks classify whether these variant candidates are real variants or not. Those candidates rejected by our neural networks as not real variants are flagged as RefCall.

b) RefCalls are not LowQual variants. LowQual variants are those that our neural networks classified as real variants but that did not pass the subsequent quality control filters. RefCalls, by contrast, are candidates that our neural networks rejected and classified as reference calls.

c) RefCall flags allow us to obtain a full record of every variant candidate that ClairS-TO was asked to classify. RefCalls are more likely to be true negatives (true reference calls) and an indication of sequencing artifacts, as you said. In addition, they allow users to design different filtering strategies if they want to increase recall, at the cost of inevitably introducing a large number of false positives.
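As a rough illustration of points b) and c), here is a minimal Python sketch that tallies FILTER values in a VCF and pulls out RefCall records for review. It relies only on the standard VCF column layout, where FILTER is the seventh column and multiple values are ";" separated. The helper names, the `min_qual` threshold, the `OtherFilter` tag, and the example records are all hypothetical, and whether QUAL is a useful ranking score for RefCall records in ClairS-TO output is an assumption you would need to check on your own data.

```python
from collections import Counter


def parse_vcf_records(lines):
    """Yield one dict per VCF data line; lines starting with '#' are skipped."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt, qual, filt = line.split("\t")[:7]
        yield {
            "chrom": chrom,
            "pos": int(pos),
            "ref": ref,
            "alt": alt,
            "qual": None if qual == "." else float(qual),
            "filters": filt.split(";"),  # FILTER values are ';' separated
        }


def tally_filters(records):
    """Count how often each individual FILTER value occurs."""
    counts = Counter()
    for rec in records:
        counts.update(rec["filters"])
    return counts


def select_refcalls(records, min_qual=None):
    """Return RefCall records, optionally keeping only those with QUAL >= min_qual."""
    selected = []
    for rec in records:
        if "RefCall" not in rec["filters"]:
            continue
        if min_qual is not None and (rec["qual"] is None or rec["qual"] < min_qual):
            continue
        selected.append(rec)
    return selected


# Made-up records for illustration only; "OtherFilter" is a placeholder tag.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t25.0\tPASS\t.",
    "chr1\t200\t.\tC\tT\t3.0\tRefCall\t.",
    "chr1\t300\t.\tG\tA\t12.0\tRefCall\t.",
    "chr2\t400\t.\tT\tC\t4.0\tLowQual;OtherFilter\t.",
]
records = list(parse_vcf_records(example))
filter_counts = tally_filters(records)
refcalls_for_review = select_refcalls(records, min_qual=10.0)
```

Lowering `min_qual` (or setting it to `None`) pulls more RefCall records back in for review, which may raise recall but, as noted above, is expected to bring in many false positives.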

I hope these explanations give you some insight. Thanks a lot for your interest in ClairS-TO.

Lei

sq101 commented 1 month ago

@JasonCLEI Thank you very much for your time and all the answers! That was very helpful indeed!! 👍 All is clear from my side now, so I'll close the ticket. Thank you once again!