HKU-BAL / ClairS

ClairS - a deep-learning method for long-read somatic small variant calling

Very few somatic variants in output VCF #36

Closed: mdehankar3 closed this issue 1 month ago

mdehankar3 commented 1 month ago

Hello, running ClairS on PacBio data (hifi_revio platform) reports fewer than 5 somatic SNVs. We identified multiple high-confidence somatic variants from short-read sequencing on the same set of samples, so we expected more somatic hits from the corresponding PacBio sequencing.

While troubleshooting, I found that the log file 3-1_CPT.log (create pair tensor) shows zero tensors generated for all chunks across all chromosomes, and 3-2_PREDICT.log similarly shows zero processed positions across all chunks. The logs 1_EC.log, 2-1_CPT.log and 2-2_PREDICT.log show that positions are processed through step 2 but are filtered out in step 3. The log file 5_MV.log shows more than 100k full-alignment variants filtered by pileup.

I'm wondering what might be causing most variants to be filtered, and whether specific parameter values need tweaking for this set of samples. Any hints or resources would be helpful, thank you.

mdehankar3 commented 1 month ago

Closing this issue: found repeated read names in input BAMs causing downstream issues, unrelated to ClairS.
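
For anyone hitting the same symptom, here is a minimal sketch of one way to check for repeated read names. It is not the exact check used above, and the function name `find_repeated_read_names` is hypothetical. It assumes single-end long reads, where each read name should have exactly one primary record, and it takes SAM-formatted text lines (for example the output of `samtools view`). Secondary (0x100) and supplementary (0x800) records are skipped, because those legitimately reuse the read name.

```python
from collections import Counter

# SAM FLAG bits for records that legitimately repeat a read name.
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def find_repeated_read_names(sam_lines):
    """Return {read_name: count} for names with more than one primary record.

    Assumes single-end reads; paired-end data would need the mate flags too.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (QNAME can never start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        # Ignore secondary and supplementary alignments.
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        counts[qname] += 1
    return {name: n for name, n in counts.items() if n > 1}


# Illustrative records (made up): read1 has one primary plus one
# supplementary record, read2 has two primary records.
example_records = [
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read1\t2048\tchr2\t500\t60\t5M5S\t*\t0\t0\tACGTACGTAC\t*",
    "read2\t0\tchr1\t200\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read2\t16\tchr1\t300\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
]
print(find_repeated_read_names(example_records))
```

On a real BAM, the same function can be fed the lines of `samtools view` output for the tumor and normal files. Any name it returns is worth inspecting before rerunning the caller.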