Closed SebastianMeyer1989 closed 2 years ago
Hi,
When scanning the BAM alignment, Clair3 discarded secondary alignments, supplementary alignments and the reads with mapping quality < 5. And for high-coverage(>144 for pileup and >89 for full-alignment) candidates, we apply samtools subsampling on them, the output DP and AF tags were also calculated after filtering. Might you try to mpileup the position with following command to check whether any read is filtered:
samtools mpileup --max-depth 144 --excl-flags 2316 --min-MQ 5 --min-BQ 0 -r EFAU004_01163:102-103 ${BAM}
Thank you for your fast reply. I definitely have a lot of secondary/supplementary alignments in my setup, so this could really be it. I will check with your provided command. Is it possible, to tell Clair3 to not discard those secondary/supplementary alignments?
Edit: For the mentioned reference I have 11 "normal" alignments and 278 supplementary alignments. Since I am mapping large Nanopore reads against single genes (instead of a whole genome) and a lot of those genes are located on the same reads, there are many supplementary alignments. Which I want to keep and use for the variant calling.
We excluded secondary/supplementary alignments using samtools flag. We have not exposed this option for users now, you might need to modify SAMTOOLS_VIEW_FILTER_FLAG
from 2316 to 12 in two lines (https://github.com/HKU-BAL/Clair3/blob/main/shared/param_p.py#L39 and https://github.com/HKU-BAL/Clair3/blob/main/shared/param_f.py#L33) to include secondary/supplementary alignments.
Pls noted that the changes might affect the performance significantly so please check the outputs carefully.
Cool. This could be a good solution for my issue. And of course I will carefully check the output for strange behaviours.
I have a question to this changing of FLAGs: 2316 means unmapped reads, unmapped mates, supplementary alignments and secondary alignments are filtered out. 12 means only unmapped reads and unmapped mates are filtered out. If I wanted to keep the supplementary alignments, but not the secondary alignments, I should probably change it into 268 (instead of 12), if I calculated this correctly. Or would that interfere with anything else, regarding the programming of Clair3 (in comparison to changing the FLAG to 12)?
Thanks again for your fast and helpfull reply. :)
Changing the FLAG won't mess up with Clair3, but it might slow down Clair3 a bit if there are plenty of supplementary alignments.
Hello,
I am using Clair3 to call variants in microbial data. I have a position in my reference (the base T), where the bam-file (generated with minimap2) tells me, that I have a total read count of 268 reads. In this position, 6 of the reads have the alternative base C. In the VCE file, generated with Clair3, this position is now called as heterozygous with an allele frequency of 0.2222.
EFAU004_01163 102 . T C 6.77 PASS F GT:GQ:DP:AF 0/1:6:9:0.2222
Can you tell me, what the reason could be for this high allele frequency? By my calculation 6 of 268 reads should be around 0.02 (and with that below the 8% threshold for calling a variant). Maybe I missed some parameter, which I need to modify.