HKU-BAL / Clair3

Clair3 - Symphonizing pileup and full-alignment for high-performance long-read variant calling

Problem with minimum indel AF filter #253

Closed: keremozdel closed this issue 11 months ago

keremozdel commented 11 months ago

Hello,

I'm having trouble filtering indels by minimum allele frequency. Even after setting the --indel_min_af parameter to 0.3, I still observe indels with lower AF. I'm working with amplicon data, and this is my command:

run_clair3.sh --gvcf --platform="ont" --indel_min_af=0.3 --snp_min_af=0.08 --model_path=${PATH}/r941_prom_sup_g5014 --bam_fn=${BAM} --bed_fn=${BED} --ref_fn=${REF} --var_pct_full=1 --ref_pct_full=1 --var_pct_phasing=1 --output=${OUTPUT}

The problem is that my results contain some questionable indels with AF below the specified threshold. Here are some examples:

CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | SAMPLE
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
chr17 | 43123064 | . | G | GA | 2.44 | PASS | AD=226,61;DP=719 | GT:GQ:DP:AD:AF:PL | 1/1:2:719:226,61:0.0848:17,5,0
chr17 | 43095603 | . | C | CA | 4.44 | PASS | AD=148,58;DP=408 | GT:GQ:DP:AD:AF:PL | 0/1:4:408:148,58:0.1422:16,0,9
chr17 | 43071847 | . | C | CA | 9.09 | PASS | AD=164,95;DP=654 | GT:GQ:DP:AD:AF:PL | 0/1:9:654:164,95:0.1453:13,0,36
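The AF values in these three rows appear consistent with the ALT read count from AD divided by DP, not by the sum of the AD values (for the first row, 61/719 ≈ 0.0848, whereas 61/(226 + 61) ≈ 0.21). That formula is an assumption inferred from these rows, not taken from Clair3's documentation; the sketch recomputes the values to check it.

```python
# Records copied from the table: (POS, ALT depth from AD, DP, reported AF)
records = [
    (43123064, 61, 719, 0.0848),
    (43095603, 58, 408, 0.1422),
    (43071847, 95, 654, 0.1453),
]


def af_from_depths(alt_depth: int, total_depth: int) -> float:
    """Allele frequency as ALT read count divided by total depth (DP)."""
    return alt_depth / total_depth


for pos, alt_depth, dp, reported in records:
    computed = af_from_depths(alt_depth, dp)
    # Compare the recomputed value, rounded to 4 decimals, with the AF field
    # and with the 0.3 indel threshold used in the command.
    print(pos, round(computed, 4), reported,
          round(computed, 4) == reported, computed < 0.3)
```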

I suspect that most of the reads are filtered out due to low MQ and BQ, which produces these problematic calls, and that is why I want to filter them. I was wondering why Clair3 doesn't filter them as expected.

Thanks in advance!

aquaskyline commented 11 months ago

The AF in snp_min_af and indel_min_af refers to the sum of the AFs of all alternative alleles at a site, whereas the AF in each VCF record is the AF of a single alternative allele at that site.
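A simplified model of this distinction is sketched below. It is not Clair3's code: the site, the allele counts, and both function names are hypothetical. The point is that a site can pass a 0.3 threshold on the summed alternative AF even though each allele, reported on its own, has an AF below 0.3.

```python
from typing import Dict


def site_alt_af(alt_counts: Dict[str, int], depth: int) -> float:
    """Sum of the allele frequencies of all alternative alleles at one site."""
    return sum(alt_counts.values()) / depth


def per_allele_af(alt_counts: Dict[str, int], depth: int) -> Dict[str, float]:
    """AF of each alternative allele on its own."""
    return {allele: count / depth for allele, count in alt_counts.items()}


# Hypothetical indel site: 100 reads carrying two different insertion alleles.
depth = 100
alt_counts = {"CA": 20, "CAA": 12}
min_af = 0.3

print(site_alt_af(alt_counts, depth) >= min_af)  # True: 0.32 passes the site-level threshold
print(all(af < min_af for af in per_allele_af(alt_counts, depth).values()))  # True: each allele alone is below 0.3
```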

We suggest adding a post-processing step to remove variants with unexpected AF.
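One possible form of such a post-processing step is a small standard-library Python filter that applies the thresholds to each record's own FORMAT/AF value. It assumes a single-sample, tab-delimited VCF with AF in the FORMAT column, as in the records shown in the issue. The script name, the argument order, and the rule of keeping a multi-allelic record when any allele passes are illustrative choices, not Clair3 features.

```python
import gzip
import sys
from typing import Iterable, Iterator


def is_indel(ref: str, alt: str) -> bool:
    """Treat any REF/ALT length difference as an indel."""
    return len(ref) != len(alt)


def keep_record(line: str, indel_min_af: float, snp_min_af: float) -> bool:
    """Decide whether one VCF data line survives the per-record AF filter."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return True  # no sample column, nothing to filter on
    ref, alts = fields[3], fields[4].split(",")
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    af_value = sample.get("AF", ".")
    if af_value == ".":
        return True  # no per-record AF available
    afs = [float(x) for x in af_value.split(",")]
    evaluated = False
    for alt, af in zip(alts, afs):
        if alt.startswith("<") or alt in (".", "*"):
            continue  # skip symbolic or missing alleles
        evaluated = True
        threshold = indel_min_af if is_indel(ref, alt) else snp_min_af
        if af >= threshold:
            return True  # keep the record if any concrete allele passes
    return not evaluated


def filter_vcf(lines: Iterable[str], indel_min_af: float,
               snp_min_af: float = 0.0) -> Iterator[str]:
    """Yield header lines unchanged and data lines that pass keep_record."""
    for line in lines:
        if line.startswith("#") or keep_record(line, indel_min_af, snp_min_af):
            yield line


if __name__ == "__main__":
    # Usage (hypothetical script name): python filter_af.py input.vcf.gz 0.3 [0.08] > filtered.vcf
    path = sys.argv[1]
    indel_min_af = float(sys.argv[2])
    snp_min_af = float(sys.argv[3]) if len(sys.argv) > 3 else 0.0
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        sys.stdout.writelines(filter_vcf(handle, indel_min_af, snp_min_af))
```

Run with an indel threshold of 0.3, this filter would drop records such as the three in the issue, because each record's own AF is below 0.3 even though the site passed Clair3's summed-AF check.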

keremozdel commented 11 months ago

Thank you for your response.

If I set snp_min_af to 0.3 and observe 100 bases at a specific position where the reference allele is G, with 20 Ts and 5 As at that position, Clair3 would not call any variant, because the combined alternative allele frequency, (20 + 5)/100 = 0.25, is below the 0.3 threshold. Is my understanding correct?
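Under the maintainer's definition of snp_min_af, the site-level check in this example works out as follows, where n_T and n_A (notation introduced here for illustration) are the T and A read counts and DP is the 100 observed bases:

```latex
\mathrm{AF}_{\text{site}} = \frac{n_T + n_A}{\mathrm{DP}} = \frac{20 + 5}{100} = 0.25 < 0.3 = \texttt{snp\_min\_af}
```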

aquaskyline commented 11 months ago

Yes.