Closed gtollefson closed 8 months ago
Yes it is expected. In the second example, you enabled --gvcf, which incurs more calculations at the reference calls, thus giving you more details.
@aquaskyline Thanks for your response, but I don't think that answers my question since I enabled --gvcf
for both examples - not just the second example. --vcf_fn
is the only difference between the two runs.
Also the output examples I pasted above are both VCF file output (not gvcf files) from runs with --gvcf enabled which are automatically written along with the gvcf files. The gvcf output from the same run with --gvcf and --vcf_fn enabled does show 0/0 calls. I've pasted a screenshot from this run showing that the gvcf output shows 0/0 calls on the same sites that are reported as ./. in the vcf output.
Ref genotype missing at: chr4 748134 in VCF
Ref genotype present at: chr4 748134 in GVCF from same run:
This suggests that 0/0 genotypes and associated DP, GQ, AD values aren't reported in the VCF output (but are reported in the gvcf output) when --vcf_fn
is specified.
We want to use the VCF output in our pipelines, but also need to run clair3 with '--vcf_fn' while outputting 0/0 reference genotypes and associated fields (DP, GQ, AD etc). Is this possible?
@aquaskyline Just following up here. I'm hoping to see if Clair3 can support our use case before we move on to trying out other tools - Clair3 our favorite choice currently but we need to be able to calculate and report DP, AD, etc. for reference calls in non-gvcf output when --vcf_fn
is specified.
Hi, @gtollefson,
The './.' variants indicate low-AF sites that were not processed by Clair3 and thus not subsequently included in the output. To output these variants as RefCall
, you can try setting --snp_min_af=0.0
and --indel_min_af=0.0
in addition to --vcf_fn
.
In the next version, we will set --snp_min_af=0.0
and --indel_min_af=0.0
as the default behavior when '--vcf_fn' is applied.
@gtollefson may we collect your feedback on whether the two settings --snp_min_af=0.0 and --indel_min_af=0.0 solve your problem?
Hi @aquaskyline @zhengzhenxian,
Thank you very much for your help. Setting --snp_min_af=0.0
and --indel_min_af=0.0
along with --vcf_fn
solved my problem. I am able to generate vcfs which report DP, AD, etc. for reference calls in non-gvcf output when --vcf_fn
is specified.
However I just want to point out something interesting. The example site that was missing DP, AD, etc before shows an AF of 0.9094.
The './.' variants indicate low-AF sites that were not processed by Clair3 and thus not subsequently included in the output.
This site had high AF in the AF field. Instead could the reason they were excluded be that the sites had no alternate count in the AD field since they are reference calls?
Screenshot of new vcf output showing this site has high AF but no alternate count in AD field (as is expected for a reference call):
--snp_min_af and --indel_min_af actually refer to variant AF, a site with reference AF 0.9094 is with variant AF <0.1
Ok I see! Thank you. We appreciate your fast response and clean solution. Best, George
Fixed in v1.0.6.
Hi,
When I run the following clair3 command with --vcf_fn and --print_ref_calls provided, reference records report the genotype as ./. and don't report the other format fields (DP, etc.)
VCF output example showing missing ref fields:
When I run a the same clair3 command but without --vcf_fn, the reference genotypes are reported as 0/0 and other format fields are reported.
VCF output example showing present ref fields:
Is this expected behavior? I'd like to be able to run clair3 to report full records for reference calls with GT, GQ, and DP values when --vcf_fn is specified.