HKU-BAL / ClairS

ClairS - a deep-learning method for long-read somatic small variant calling
BSD 3-Clause "New" or "Revised" License

Modify run_clairs to take normal vcf as input and start there #7

Closed xingyaoc closed 1 year ago

xingyaoc commented 1 year ago

Hi, thanks for making this tool, it is really great!

Summary of change: Modify run_clairs to take a normal VCF as input so that germline calling of the normal sample can be skipped.

Why I want this feature: In my research, we often do not have a matched normal sample, so we default to using a technical control as the "normal" to identify and flag error-prone regions. When we call somatic variants on multiple tumor samples, we would therefore like to reuse the same germline VCF instead of re-calling it for every tumor sample, in order to cut down runtime.

I've modified the code entry point to add an optional argument for passing in a normal VCF: --normal_vcf_fn. When it is not null, normal germline calling is skipped.
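For readers following along, the control flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this pull request: only the `--normal_vcf_fn` option name comes from the description, while `build_parser`, `plan_steps`, and the step names are hypothetical stand-ins for whatever run_clairs does internally.

```python
import argparse


def build_parser():
    """Build a parser exposing only the option discussed in this PR."""
    parser = argparse.ArgumentParser(
        description="Sketch: optionally reuse an existing normal VCF"
    )
    # Option name taken from the PR description; default None means "not provided".
    parser.add_argument(
        "--normal_vcf_fn",
        default=None,
        help="Existing germline VCF of the normal sample (skips normal germline calling)",
    )
    return parser


def plan_steps(args):
    """Return the ordered list of (hypothetical) pipeline steps to run."""
    steps = []
    if args.normal_vcf_fn is None:
        # No VCF supplied: the normal sample's germline variants must be called.
        steps.append("normal_germline_calling")
    # Somatic calling always runs; with a supplied VCF it starts from that file.
    steps.append("somatic_calling")
    return steps
```

Under these assumptions, `plan_steps(build_parser().parse_args(["--normal_vcf_fn", "normal.vcf.gz"]))` yields only the somatic step, so the same normal VCF can be reused across many tumor samples without repeating the germline call.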

Testing done: I have attached the outputs of ont_quick_demo.sh from both branches for comparison: master.log and input_normal_vcf.log.

Note: I know this feature might not promote the best use of ClairS. If my use case is too niche, please feel free to ignore this pull request -- I can build my own docker image.

zhengzhenxian commented 1 year ago

@xingyaoc,

I think the --normal_vcf_fn option is a great solution when no control normal sample is provided. We will mark the option as "EXPERIMENTAL" (meaning it is intended for advanced users) until it has been extensively tested. Thanks!

Zhenxian