Open peterk87 opened 10 months ago
Hello,
I figured I might as well reply to this issue, since it is semi-related.
I have been reading through the Clair3
documentation, and noticed a discussion regarding amplicon sequencing and the modification of a few parameters including changing --var_pct_full
, -ref_pct_full
and --var_pct_phasing
, to 1
as well as enabling the --no_phasing_for_fa
option. I was wondering if we should consider implementing as default for the pipeline.
I've also encountered a situation where a SNV, position 2169, is classified as RefCall
in full alignment mode
and PASS
in pileup mode
, but omitted from the merge_output.vcf
.
Full Alignment Mode:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
OQ993067.1 1470 . T C 43.68 PASS F GT:GQ:DP:AD:AF 1/1:43:1243:25,1177:0.9469
OQ993067.1 2169 . G . 3.31 RefCall F GT:GQ:DP:AD:AF 0/0:3:1380:75:0.0543
Pileup Mode:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
OQ993067.1 18 . T C 16.85 PASS P GT:GQ:DP:AD:AF 1/1:16:1182:28,1110:0.9391
OQ993067.1 344 . G . 24.42 RefCall P GT:GQ:DP:AD:AF 0/0:24:1167:989:0.8475
OQ993067.1 345 . G A 25.54 PASS P GT:GQ:DP:AD:AF 1/1:25:1167:38,1107:0.9486
OQ993067.1 407 . C . 17.09 RefCall P GT:GQ:DP:AD:AF 0/0:17:1168:886:0.7586
OQ993067.1 880 . G . 19.16 RefCall P GT:GQ:DP:AD:AF 0/0:19:1187:1027:0.8652
OQ993067.1 986 . G . 26.00 RefCall P GT:GQ:DP:AD:AF 0/0:26:1189:1049:0.8823
OQ993067.1 1149 . G A 16.00 PASS P GT:GQ:DP:AD:AF 1/1:16:1214:19,1159:0.9547
OQ993067.1 1273 . C . 19.26 RefCall P GT:GQ:DP:AD:AF 0/0:19:1231:913:0.7417
OQ993067.1 1461 . C . 18.22 RefCall P GT:GQ:DP:AD:AF 0/0:18:1243:1085:0.8729
OQ993067.1 1470 . T C 14.68 PASS P GT:GQ:DP:AD:AF 1/1:14:1243:25,1177:0.9469
OQ993067.1 2169 . G A 11.04 PASS P GT:GQ:DP:AD:AF 1/1:11:1380:75,1269:0.9196
merge_output.vcf
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
OQ993067.1 18 . T C 16.85 PASS P GT:GQ:DP:AD:AF 1:16:1182:28,1110:0.9391
OQ993067.1 345 . G A 25.54 PASS P GT:GQ:DP:AD:AF 1:25:1167:38,1107:0.9486
OQ993067.1 1149 . G A 16.00 PASS P GT:GQ:DP:AD:AF 1:16:1214:19,1159:0.9547
OQ993067.1 1470 . T C 43.68 PASS F GT:GQ:DP:AD:AF 1:43:1243:25,1177:0.9469
I take this to mean that the Clair3
model has decided, by some metric, that there is insufficient evidence to suggest it is a variant. However, based on manual interrogation of the BAM
file, it looks pretty convincing.
As you might recall, amongst the polymerase genes, you often see super high coverage of the 5' and 3' ends of the segment relative the middle. I believe this is linked the presence of defective viral genomes, which have been known to affect the sequencing of these segments. This particular SNV is right on the interface where we move from "normal" coverage to "super high" coverage, so this might be why Clair3 is tossing it.
I tested the --pileup_only
flag, and it did recover the SNV and incorporate it into the consensus sequence. However, I prefer your idea of keeping the more stringent Clair3 settings (i.e., using both full alignment
and pileup
modes) and supplementing Clair3
with BCFTools mpileup
to perhaps catch situations like this, as well as those SNVs located within the first and and last 16bp of the reference.
Thanks for your time, Cody
Hi @Codes1985 that's really interesting the discrepancy between full alignment and pileup results from Clair3. Honestly, I'm not sure what's going on and maybe the Clair3 devs would be able to enlighten us both. I don't see how the numbers in the AD field make any sense. It looks like a number is missing... off by one error? Supplementing Clair3 variant calling results with Bcftools and maybe even Medaka might be necessary just to ensure that all possible variants are captured without certain regions being ignored or obvious variants being dropped.
@Codes1985 maybe updating the versions of Clair3 and other tools in the pipeline might help (#69)
@peterk87 I was thinking it might be worthwhile to bump up to the latest versions of the tools.
Clair3 does not call 16 bp at each end of a reference sequence (https://github.com/HKU-BAL/Clair3/issues/257)
Supplement Clair3 calls with Bcftools mpileup/call?