broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

0 or 1 based coordinates. #9029

Open vappiah opened 3 weeks ago

vappiah commented 3 weeks ago

Dear Developers, I recently used UCSC hg38 as the reference for GATK Mutect2 variant calling. I plan to annotate the variants with VEP, but according to this thread, VEP uses a 1-based coordinate system. My question: is the VCF generated by Mutect2 0-based or 1-based?

Thanks

lbergelson commented 3 weeks ago

The coordinates of input and output files depend on the file type. In this case the coordinates are 1-based, because VCF is always 1-based.

From the VCF Spec

  1. POS - position: The reference position, with the 1st base having position 1.

Some other formats (e.g. BED) use 0-based positions. GATK reads and writes each file in the coordinate system of its own format.

Internally, GATK converts them all to a uniform representation for processing. That internal representation is 1-based and matches VCF.
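
To illustrate the difference between the two conventions, here is a minimal Python sketch. It is not GATK code, and the function names are hypothetical. It assumes BED intervals are 0-based and half-open (start inclusive, end exclusive) and that VCF-style intervals are 1-based and closed, and it does not handle zero-length BED intervals.

```python
from typing import Tuple


def bed_to_one_based(chrom_start: int, chrom_end: int) -> Tuple[int, int]:
    """Convert a 0-based, half-open BED interval to a 1-based, closed interval."""
    if chrom_start < 0 or chrom_end <= chrom_start:
        raise ValueError("expected 0 <= chrom_start < chrom_end")
    # Only the start shifts: the exclusive 0-based end equals the inclusive 1-based end.
    return chrom_start + 1, chrom_end


def one_based_to_bed(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based, closed interval to a 0-based, half-open BED interval."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end


def vcf_pos_to_bed(pos: int, ref_allele: str) -> Tuple[int, int]:
    """Return the BED interval covering the REF allele of a VCF record at 1-based POS."""
    if not ref_allele:
        raise ValueError("REF allele must not be empty")
    return one_based_to_bed(pos, pos + len(ref_allele) - 1)
```

For example, under these assumptions a single-base variant at VCF POS 100 corresponds to the BED interval `(99, 100)`, so no shift is needed when passing a Mutect2 VCF to a tool that also expects 1-based VCF input.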