Open haraldgrove opened 10 months ago
Clair3 concludes genotype based on AF and many other factors. Reads supporting the reference allele could be mistakes due to sequencing or alignment errors. If Clair3's model thinks many of the reads supporting the reference allele are more likely to be mistaken, it might conclude the genotype to be 1/1 instead of 0/1. Clair3 might or might not be correct about your examples, but please pay more attention to them because there exists evidence that discredits the reads supporting the reference allele.
Thank you. I discovered that there was an uncalled deletion covering the same position. So the reference allele was not present at all. It's still a confusing variant, but maybe not much to do about it.
A follow up to the issue I had with understanding the AF output from Clair3.
I found another location where Clair3 is calling a homozygous insertion, but where an inspection in IGV seems to strongly suggest it to be a heterozygous insertion instead. It also seems as if Clair3 is not discarding any reads as unreliable since the total number of reads is the same in both instances.
The variant:
Ssc13 147177599 . T TTAA 21.09 PASS P GT:GQ:DP:AD:AF:PL 1/1:21:43:2,40:0.9302:63,40,0
I'm wondering if there are any way for me to know what Clair3 is basing its decision on?
Hi, @haraldgrove,
Could you please provide us with the pileup result of the flanking 10bp windows? This will help us pinpoint the genotype issue more accurately. To obtain the mpileup result using samtools, you can use the following command:
samtools mpileup --min-MQ 5 --min-BQ 0 --ff 2316 -r Ssc13:147177594-147177604 ${BAM}
Thanks @haraldgrove, your case shows an insertion following a deletion immediately, and it looks like a bug in Clair3 to me. Besides the mpileup results, could you please also send us a minibam covering the case. Thanks!
Thank you @aquaskyline and @zhengzhenxian for following up on this.
The mpileup output looks like this:
Ssc13 147177594 G 44 .,.,..,.,.,,....,,,,,,..,...,,,,.,,.,..,..., C&:6:9<7A5;:<4?<263;A!?;%A15=6<8@!<!/;;?38<=
Ssc13 147177595 T 44 .,.,..,.,.,,....,,,,,,..c...,,,,.,,.,..,..., @&9+<)=7A5=??.>=)5/:F!A>%A14?8>9A!<!0>:A4<=?
Ssc13 147177596 T 44 .,.,..,.,.,,....,,,,,,..,...,,,,.,,.,..,..., A):(>&A7A2@>?)?=&6.9{!B?'@14@9>;@#<!1A:A;<=>
Ssc13 147177597 A 44 .,.,..,.,.+2GT,,....,,,,,,..,..-2AT.,,,,.,,.,..,..., A*8)A%B8@+A@@(=>%6.;E!@@(@,5@6><A#=!1@;B5<6?
Ssc13 147177598 A 44 .-1T,-1t.,..,.-1T,.,-1t,-1t.-1T.-1T.-1T.,,-1t,,-1t,-1t,.-1T.,.-1T*.,,,,.-1T,,-1t.+2AT,-1t.-1TG,-1t..-1T.,-1t @*8(B#A:<!==B);>#5)9D!A>(@*6@5<?@!8!0A2I3=+<
Ssc13 147177599 T 44 **.,.+3TAA.+3TAA,+3taa*,+3taa.*****.+3TAA,+3taa*,**,+3taa*.,**.,+3taa,+3taa,+3taa,+3taa*,*.+2AA**.+1G*.+3TAA*.+3TAA* !!7'@"?!?!!!!!!?"!(!!!!='!*6@4=?!!!!!!2!3!+!
Ssc13 147177600 T 44 .,.,..,.,.,,....,,,,,,..,...,,,,.,+2aa,.,.G,..., !!8&$!!!$!!!!!!$!!*!!!!?+!*7$$$$!!!!!!!!$!$!
Ssc13 147177601 T 44 .,.,..,.,.,,....,,,+1c,,,..,...,,,,.,,.,..,..., !$6(C!5$;!$!$$$?!$)$!!$>)$*6=064$!!!$$!!4$+$
Ssc13 147177602 T 44 .,C,..+1C,.,.-1C,,....,+1c,,,,,.C,...,,,,.,,.,.A,..., !'7)@!;';!'''''B!')''!'>+'+18114'!'!''#'5'('
Ssc13 147177603 C 44 .+2AG,.g..,.,*,,....,,g,,,..,...,,,,.,+1t,.,..,..., !,**B+:@>)<8{=D@-5)9@!{).C(47219>!4!1D%>3@+<
Ssc13 147177604 A 44 .,.t..,.,.,,....,,c,,,..,...,,,,.t,.,..,..., /5)+6(;4>)?8C4<?.9)9?!:(.?&/73:;*!8!5,&?-6,<
The BAM file for the region is attached. ssc13_region.bam.gz
Hi, @haraldgrove,
Thanks for providing the BAM, it seems the Clair3 pileup model made an incorrect zygosity prediction due to the insertion variant was followed by a deletion immediately.
Could you try to add --var_pct_full=1.0
option to feed all pileup results to a more reliable full-alignment model? We consider the full-alignment network would perform better in such cases.
@haraldgrove please kindly let us know if --var_pct_full=1.0
gives the correct answer of your variants. If yes, we will force these type of variants to go into the full-alignment model to solve the problem in a long-run.
Hi.
Thank you for the suggestion. The full-alignment mode fixed the problem.
I've been running Clair3 (v1.01) on some pig samples with ONT reads. I noticed that Clair3 has been assigning the "1/1" genotype to SNVs that have an allele frequency where I would expect an "0/1" genotype.
Example:
Do you have any idea why this might be happening?
-Best regards Harald