HKU-BAL / Clair3

Clair3 - Symphonizing pileup and full-alignment for high-performance long-read variant calling

Why are these SNPs called as RefCall? #266

Closed: weishwu closed this issue 7 months ago

weishwu commented 7 months ago

These SNPs look true on IGV. Why are they labelled as RefCall by Clair3? The AF values in the output VCF are around 0.5.

[Image: screenshot, 2024-02-12 9:55:01 PM]

Another question: Clair3 is designed to call germline variants. However, my data is amplicon sequencing and may contain mosaic variants whose frequencies span a wide range. Can Clair3 identify these variants? I don't have tumor-normal pairs, so I can't use ClairS.

aquaskyline commented 7 months ago

Are you using these options? https://github.com/HKU-BAL/Clair3?tab=readme-ov-file#dealing-with-amplicon-data

weishwu commented 7 months ago

@aquaskyline Thanks! I added those options and they rescued 2 of the 4 SNPs. However, the remaining 2 SNPs appear true in IGV but are still labelled as RefCall (they are reported only in pileup.vcf.gz and full_alignment.vcf.gz, not in merge_output.vcf.gz):

C412_SP 2296    .   G   .   16.36   RefCall P   GT:GQ:DP:AD:AF  0/0:16:4515:3614:0.8004
C412_SP 2299    .   C   .   15.97   RefCall P   GT:GQ:DP:AD:AF  0/0:15:4552:3724:0.8181

[Image: screenshot, 2024-02-13 10:02:15 AM]

The variant AF values shown in IGV are higher than in the VCF, which I guess is because some of the reads didn't pass the quality threshold.
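A quick way to see how the AF column relates to the other fields in the two pileup records above is to recompute it from AD and DP. The sketch below is only an illustration: the helper names `parse_sample` and `ad_over_dp` are hypothetical, and which allele the single AD value counts in a RefCall record is not stated in this thread, so treat that reading as an assumption.

```python
# Two pileup records copied from this comment (whitespace-separated).
RECORDS = """\
C412_SP 2296 . G . 16.36 RefCall P GT:GQ:DP:AD:AF 0/0:16:4515:3614:0.8004
C412_SP 2299 . C . 15.97 RefCall P GT:GQ:DP:AD:AF 0/0:15:4552:3724:0.8181
"""


def parse_sample(line):
    """Map the FORMAT keys of one VCF data line to the sample's values."""
    fields = line.split()
    keys = fields[8].split(":")
    values = fields[9].split(":")
    return dict(zip(keys, values))


def ad_over_dp(sample):
    """Ratio of the first AD value to DP (hypothetical helper)."""
    ad = int(sample["AD"].split(",")[0])
    dp = int(sample["DP"])
    return ad / dp


for line in RECORDS.splitlines():
    sample = parse_sample(line)
    # Print position, recomputed AD/DP, and the AF reported in the record.
    print(line.split()[1], round(ad_over_dp(sample), 4), sample["AF"])
```

For both records the recomputed ratio agrees with the reported AF to four decimal places, so in these rows AF appears to be simply AD divided by DP.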

My command-line:

# clair3 version: 1.0.5
  run_clair3.sh \
  --bam_fn={input} \
  --ref_fn={params.genome_fasta} \
  --include_all_ctgs \
  --ref_pct_full=1.0 \
  --var_pct_full=1.0 \
  --no_phasing_for_fa \
  --output={params.outdir} \
  --threads={threads} \
  --platform=ont \
  --model_path=r1041_e82_400bps_hac_v410

aquaskyline commented 7 months ago

Could you please show the records for C412_SP:2296 and C412_SP:2299 in the full_alignment.vcf.gz file?
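For anyone wanting to pull such records out of a compressed VCF without extra tools, here is a minimal sketch using only the Python standard library. The function names `records_at` and `records_from_file` are hypothetical, and the file path in the usage note is an assumption about where the file sits in the output directory.

```python
import gzip


def records_at(lines, contig, positions):
    """Yield VCF data lines on `contig` whose POS is in `positions`."""
    wanted = set(positions)
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1 and fields[0] == contig and int(fields[1]) in wanted:
            yield line.rstrip("\n")


def records_from_file(path, contig, positions):
    """Read a gzip-compressed VCF and return the matching data lines."""
    with gzip.open(path, "rt") as handle:
        return list(records_at(handle, contig, positions))
```

Calling `records_from_file("full_alignment.vcf.gz", "C412_SP", [2296, 2299])` would return the two lines of interest, assuming the file is in the current directory.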

weishwu commented 7 months ago

C412_SP 2296    .   G   .   30.17   RefCall F   GT:GQ:DP:AD:AF  0/0:30:11094:7072:0.6375
C412_SP 2299    .   C   .   31.41   RefCall F   GT:GQ:DP:AD:AF  0/0:31:11131:7248:0.6512

Thanks.

aquaskyline commented 7 months ago

Clair3's model has decided with good genotype quality (GQ) that 2296 and 2299 are not variants, and that the reads supporting an alternative allele are more likely to be sequencing or alignment errors.
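To put the GQ values in this thread on a probability scale: the VCF specification defines GQ as a phred-scaled genotype quality, so GQ = -10 * log10(P(genotype call is wrong)). The sketch below applies that conversion to the GQ values quoted above. It assumes Clair3's GQ follows the specification's definition exactly, which is not confirmed in this thread, and the function name `gq_to_error_prob` is hypothetical.

```python
def gq_to_error_prob(gq):
    """Convert a phred-scaled GQ to the implied probability of a wrong call."""
    return 10 ** (-gq / 10)


# (position, source file, GQ) as quoted in the records in this thread.
CALLS = [
    (2296, "pileup", 16),
    (2299, "pileup", 15),
    (2296, "full_alignment", 30),
    (2299, "full_alignment", 31),
]

for pos, source, gq in CALLS:
    print(f"{pos} {source}: GQ={gq} -> P(error) ~ {gq_to_error_prob(gq):.4f}")
```

Under that assumption, the full-alignment GQ of 30 corresponds to roughly a 1 in 1000 chance that the 0/0 genotype is wrong, compared with about 1 in 40 for the pileup GQ of 16.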

weishwu commented 7 months ago

@aquaskyline OK. Thanks!