HKU-BAL / Clair3

Clair3 - Symphonizing pileup and full-alignment for high-performance long-read variant calling

Variant is not detected in amplicon sequencing data #274

Closed: keremozdel closed this issue 8 months ago

keremozdel commented 9 months ago

Hello,

I am using Clair3 for variant calling on amplicon sequencing data. The reads were basecalled with Dorado (dna_r10.4.1_e8.2_400bps_sup@v4.3.0 model). For Clair3, I am using the r1041_e82_400bps_hac_v420 model (from Rerio), with the parameters --var_pct_full=1 --ref_pct_full=1 --var_pct_phasing=1. However, Clair3 does not detect a variant at a specific position, even though the variant is visible in IGV. I also tried running Clair3 with the r1041_e82_400bps_hac_v430 and r1041_e82_400bps_sup_v430 models, but the result was the same. Furthermore, I observed a decrease in recall with the v430 models. I have attached an IGV screenshot of the variant in question (image: igv_cpt1).

Do you have any idea what might be the reason?

Thank you for your assistance.

aquaskyline commented 9 months ago

There are twice as many positive-strand reads as negative-strand reads. Is that expected?
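
For anyone wanting to quantify the strand imbalance rather than read it off IGV, here is a minimal sketch that counts forward and reverse reads from SAM text records using the FLAG field (bit 0x10 marks a reverse-strand alignment). The function names and the toy records are made up for illustration; in practice the lines could come from something like `samtools view -h` restricted to the amplicon region.

```python
from collections import Counter

REVERSE_FLAG = 0x10               # SAM FLAG bit: read aligned to the reverse strand
SKIP_FLAGS = 0x4 | 0x100 | 0x800  # unmapped, secondary, supplementary


def strand_counts(sam_lines):
    """Count primary mapped reads per strand from SAM text lines."""
    counts = Counter({"forward": 0, "reverse": 0})
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the FLAG
        if flag & SKIP_FLAGS:
            continue
        counts["reverse" if flag & REVERSE_FLAG else "forward"] += 1
    return counts


def strand_ratio(counts):
    """Forward-to-reverse ratio, or None when there are no reverse reads."""
    if counts["reverse"] == 0:
        return None
    return counts["forward"] / counts["reverse"]


# Toy records (hypothetical, not taken from the data in this issue).
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\t*",
    "r2\t0\tchr1\t101\t60\t4M\t*\t0\t0\tACGT\t*",
    "r3\t16\tchr1\t102\t60\t4M\t*\t0\t0\tACGT\t*",
    "r4\t256\tchr1\t103\t0\t4M\t*\t0\t0\tACGT\t*",  # secondary, skipped
]
counts = strand_counts(example)
print(counts["forward"], counts["reverse"], strand_ratio(counts))
```

A ratio near 1 would indicate balanced strands; the case described in this thread would come out at roughly 2.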

keremozdel commented 9 months ago

It is not expected. I suspect that efficiency differs between the primers. Could this affect the accuracy of variant calling, even though there are sufficient reads on both strands?

aquaskyline commented 9 months ago

All Clair3 models available to date require a balance between the two strands, so my guess is that Clair3 rejected your case primarily because of strand bias.
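
As a rough way to check whether the alternate allele at a site is itself unevenly distributed across strands, one generic option is Fisher's exact test on a 2x2 table of allele by strand. This is not how Clair3 evaluates candidates (nothing in this thread describes its internals); it is only a common, independent diagnostic, and the function name and counts below are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows are alleles (ref, alt), columns are strands
    (forward, reverse).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts: ref reads 40 forward / 20 reverse, alt reads 38 forward / 2 reverse.
p_value = fisher_exact_two_sided(40, 20, 38, 2)
print(p_value)
```

A small p-value suggests the alternate allele is concentrated on one strand beyond what the overall coverage imbalance would explain; a large one suggests the imbalance affects both alleles alike.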

keremozdel commented 9 months ago

This makes sense. I will investigate the reasons behind that strand bias. Thank you very much for your help.