HKU-BAL / Clair3

Clair3 - Symphonizing pileup and full-alignment for high-performance long-read variant calling

Not detecting variants below 10% and incorrect allele frequencies #335

Closed daysmcgrath closed 1 month ago

daysmcgrath commented 3 months ago

Hello,

I have tested dozens of samples and keep running into the same issues. First, despite --snp_min_af=0.0, Clair3 has yet to call any variant below 10%. Second, for the variants it does call, the allele frequency does not equal sample.dp / mapped reads. For instance, in my Clair3 output, sample.dp is 9825 and total mapped reads is 9833, yet the allele frequency is 0.7318 instead of 9825/9833 = 0.999186. Additionally, when working with the dorado hac and fast basecalling models, it generates multiple variants that do not exist at all when I check the BAM files (this is more common with indels than SNPs); this issue is resolved by switching to the sup model. I've also run the same data through other variant callers, which report additional mutations that do exist in the BAMs, with allele frequencies as high as 25% and as low as 4%.
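The comparison described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Clair3's own calculation: the helper names are hypothetical, the FORMAT layout `GT:GQ:DP:AD:AF` is an assumption about the output VCF, and the AD counts in the sample column are made up for illustration (only 9825, 9833, and 0.7318 come from the report).

```python
def depth_fraction(sample_dp: int, mapped_reads: int) -> float:
    """Fraction of mapped reads that are counted in the sample depth (DP)."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return sample_dp / mapped_reads


def parse_sample_fields(format_col: str, sample_col: str) -> dict:
    """Pair colon-separated VCF FORMAT keys with the sample's values."""
    keys = format_col.split(":")
    values = sample_col.split(":")
    return dict(zip(keys, values))


# Numbers quoted in the report: DP, total mapped reads, reported AF.
expected = depth_fraction(9825, 9833)
reported_af = 0.7318
print(f"DP / mapped reads = {expected:.6f}, reported AF = {reported_af}")

# Hypothetical FORMAT and sample columns (assumed layout; AD values invented).
fields = parse_sample_fields("GT:GQ:DP:AD:AF", "1/1:20:9825:2635,7190:0.7318")
ref_count, alt_count = (int(x) for x in fields["AD"].split(","))

# Alternate-allele read fraction, for comparison against the AF field.
alt_fraction = alt_count / int(fields["DP"])
print(f"alt AD / DP = {alt_fraction:.4f}, AF field = {fields['AF']}")
```

Printing both ratios side by side for a real record shows which quantity, if either, the reported AF tracks.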

Here is the general command I have been running when these issues occur:

run_clair3.sh --bam_fn=$BAM --ref_fn=$REF --threads=4 --qual=0 --platform="ont" --model_path=/path_to_models/r941_prom_sup_g5014 --output=./clairout/ --bed_fn=$BED --snp_min_af=0.0 --haploid_sensitive --chunk_size=5000

I have tried guppy and dorado (fast, hac, and sup models were all tested) for basecalling to determine whether there were issues with the data. With all of these basecalling methods, only variants that are found almost unanimously across all reads are consistently called by Clair3. I am running Clair3 from a conda install.

Thanks, Daisy