Closed keremozdel closed 11 months ago
The AF in snp_min_af and indel_min_af means the sum of the AFs of all alternative alleles at a site. The AF in each VCF record is the AF of just one alternative allele at that site, so it can be lower than the threshold.
A post-processing step to remove variants with unexpected AF is suggested.
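A minimal sketch of such a post-processing step, assuming an uncompressed, single-sample VCF whose FORMAT column carries a per-record AF field (as in the records quoted in this thread). The function names, the "keep the record if any listed AF reaches the threshold" policy, and the choice to keep records without an AF value are assumptions for illustration, not part of Clair3.

```python
def sample_af(fields):
    """Return the AF values listed in the first sample column (may be empty)."""
    keys = fields[8].split(":")
    values = fields[9].split(":")
    raw = dict(zip(keys, values)).get("AF", ".")
    return [float(x) for x in raw.split(",") if x not in (".", "")]


def is_indel(ref, alt):
    """True if any non-symbolic ALT allele differs in length from REF."""
    alleles = [a for a in alt.split(",") if not a.startswith("<") and a != "*"]
    return any(len(a) != len(ref) for a in alleles)


def filter_by_af(lines, snp_min_af, indel_min_af):
    """Yield header lines and the records whose per-record AF meets the threshold."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            # No sample column to inspect; keep the record untouched.
            yield line
            continue
        afs = sample_af(fields)
        if not afs:
            # Assumption: records without an AF value are kept.
            yield line
            continue
        threshold = indel_min_af if is_indel(fields[3], fields[4]) else snp_min_af
        if max(afs) >= threshold:
            yield line
```

For example, `filter_by_af(open("calls.vcf"), snp_min_af=0.08, indel_min_af=0.3)` (with `calls.vcf` as a placeholder file name) yields the lines to write to a filtered VCF.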
Thank you for your response.
If I set the snp_min_af filter to 0.3 and observe 100 bases at a specific position, with the reference allele being G, and I find 20 T's and 5 A's at that position, Clair3 would not call any variant because the combined alternative allele frequency is 0.25 (25 of 100 bases), which is below the specified filter threshold of 0.3. Is my understanding correct?
Yes.
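The arithmetic in that example can be sketched as follows. This only illustrates the "sum of all alternative alleles" semantics described in this thread; it is not Clair3's code, and the function name and the use of `>=` at the boundary are assumptions.

```python
def passes_min_af(ref_base, base_counts, min_af):
    """Compare the summed frequency of all non-reference bases with min_af."""
    depth = sum(base_counts.values())
    if depth == 0:
        return False
    alt_count = sum(n for base, n in base_counts.items() if base != ref_base)
    return alt_count / depth >= min_af


# 100 bases at the site: reference G, plus 20 T's and 5 A's.
counts = {"G": 75, "T": 20, "A": 5}
print(passes_min_af("G", counts, 0.3))   # summed alt AF is 25/100 = 0.25
print(passes_min_af("G", counts, 0.08))
```

With a threshold of 0.3 the site is not a candidate; with 0.08 it is, even though the A allele on its own has an AF of only 0.05.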
Hello,
I'm facing an issue while trying to filter indels based on minimum allele frequency. After I set the --indel_min_af parameter to 0.3, I can still observe indels with lower AF. I'm working with amplicon data and here is my command:
run_clair3.sh --gvcf --platform="ont" --indel_min_af=0.3 --snp_min_af=0.08 --model_path=${PATH}/r941_prom_sup_g5014 --bam_fn=${BAM} --bed_fn=${BED} --ref_fn=${REF} --var_pct_full=1 --ref_pct_full=1 --var_pct_phasing=1 --output=${OUTPUT}
The problem is that my results contain some questionable indels with an AF lower than the specified threshold. Here are some examples:
CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | SAMPLE
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
chr17 | 43123064 | . | G | GA | 2.44 | PASS | AD=226,61;DP=719 | GT:GQ:DP:AD:AF:PL | 1/1:2:719:226,61:0.0848:17,5,0
chr17 | 43095603 | . | C | CA | 4.44 | PASS | AD=148,58;DP=408 | GT:GQ:DP:AD:AF:PL | 0/1:4:408:148,58:0.1422:16,0,9
chr17 | 43071847 | . | C | CA | 9.09 | PASS | AD=164,95;DP=654 | GT:GQ:DP:AD:AF:PL | 0/1:9:654:164,95:0.1453:13,0,36
I suspect most of the reads are being filtered out due to low MQ and BQ, which would explain these problematic results, and that's why I want to filter these calls. I was wondering why Clair3 doesn't filter them as expected.
Thanks in advance!