HKU-BAL / Clair3

Clair3 - Symphonizing pileup and full-alignment for high-performance long-read variant calling

Wrong FORMAT specification in VCF file #346

Closed: simondrue closed this issue 1 week ago

simondrue commented 1 week ago

Hi,

I've encountered a small bug in the formatting of the output VCF files.

In the header of the VCF it says:

##FORMAT=<ID=AF,Number=1,Type=Float,Description="Observed allele frequency in reads, for each ALT allele, in the same order as listed, or the REF allele for a RefCall">

Here, Number=1 is specified. As I understand it, this is not correct for multi-allelic sites such as the following:

chr1 180847 . C CCCCT,CCT 10.68 PASS F GT:GQ:DP:AD:AF 1/2:10:102:14,11,24:0.1078,0.2353

where AF has the value 0.1078,0.2353, which is a list of multiple floats. I think the correct specification would be Number=G, so that it corresponds to the number of genotypes.

Thanks for a great tool!

aquaskyline commented 1 week ago

Thank you, Simon, for the report.

We checked again, and the latest version of Clair3 uses ID=AF,Number=A for its VCF header, as shown in https://github.com/HKU-BAL/Clair3/blob/9b601b23464699d59a93b0f0bce40444b0dd0cf3/shared/utils.py#L285.

Would you mind letting us know which version of Clair3 you were using?
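
For readers comparing the candidate Number values discussed in this thread, here is a minimal sketch of a consistency check against the record from the report. It is not part of Clair3; the helper names (`format_number`, `expected_count`, `count_values`) are hypothetical, the sample is assumed to be diploid, and the columns are assumed to be whitespace-separated as pasted above (real VCF files use tabs).

```python
import re
from math import comb

# Header line and record copied from the report above.
HEADER = ('##FORMAT=<ID=AF,Number=1,Type=Float,Description="Observed allele '
          'frequency in reads, for each ALT allele, in the same order as '
          'listed, or the REF allele for a RefCall">')
RECORD = ("chr1 180847 . C CCCCT,CCT 10.68 PASS F GT:GQ:DP:AD:AF "
          "1/2:10:102:14,11,24:0.1078,0.2353")


def format_number(header_line):
    """Extract the Number= value from a ##FORMAT header line."""
    match = re.search(r"Number=([^,>]+)", header_line)
    if match is None:
        raise ValueError("no Number= attribute found")
    return match.group(1)


def expected_count(number, n_alt, ploidy=2):
    """Values implied by a Number code; None means unbounded ('.')."""
    if number == ".":
        return None
    if number == "A":  # one value per ALT allele
        return n_alt
    if number == "R":  # one value per allele, REF included
        return n_alt + 1
    if number == "G":  # one value per possible genotype
        return comb(n_alt + ploidy, ploidy)
    return int(number)  # fixed integer count


def count_values(record, key):
    """Return (number of ALT alleles, number of values for `key`)."""
    cols = record.split()
    n_alt = len(cols[4].split(","))
    keys = cols[8].split(":")
    values = cols[9].split(":")
    return n_alt, len(values[keys.index(key)].split(","))


n_alt, n_af = count_values(RECORD, "AF")
for code in (format_number(HEADER), "A", "R", "G"):
    verdict = "matches" if expected_count(code, n_alt) == n_af else "does not match"
    print(f"Number={code}: {verdict} the {n_af} AF values in the record")
```

With two ALT alleles and two AF values, only Number=A is consistent in this sketch: Number=1 implies a single value, Number=R implies three, and Number=G implies six for a diploid sample.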

simondrue commented 1 week ago

I was using 1.0.8. I've updated to 1.0.10, and the issue has been fixed.

Sorry for the inconvenience! And thanks again for a great tool 🙏