genome-in-a-bottle / giab_data_indexes

This repository contains data indexes from NIST's Genome in a Bottle project.

fileDate header in recent benchmark .vcf files is from 2016 #21

Closed: Samvkes closed this issue 1 year ago

Samvkes commented 1 year ago

Hi! I'm trying to use the HG002 benchmark .vcf as a truth set for my variant calling work (after seeing it mentioned in https://doi.org/10.1038/s41587-020-0538-8). When I downloaded the latest version from https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/NISTv4.2.1/GRCh38/ I noticed the fileDate in the header was set to '20160824', even though I had read about versions from 2020/2021. Am I missing something obvious here, or is the header incorrect?
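For anyone who wants to check this on their own download: the VCF format stores the date as a `##fileDate=` meta-information line before the `#CHROM` column header. Below is a minimal Python sketch (standard library only) that reads that value. The function name `read_vcf_file_date` is hypothetical, and the sketch assumes the file is either plain text or gzip/bgzip-compressed (bgzip output is readable by Python's `gzip` module).

```python
import gzip
from typing import Optional


def read_vcf_file_date(path: str) -> Optional[str]:
    """Return the value of the ##fileDate meta-information line, or None if absent."""
    # Benchmark VCFs are typically bgzip-compressed (.vcf.gz); bgzip writes
    # multi-member gzip, which the standard gzip module can read.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##fileDate="):
                # Keep everything after the first '=' and drop the newline.
                return line.rstrip("\n").split("=", 1)[1]
            if not line.startswith("##"):
                # The #CHROM header line (or a data line) ends the meta-information section.
                break
    return None
```

Calling `read_vcf_file_date(path)` with `path` pointing at the downloaded benchmark `.vcf.gz` should return the raw date string; per the report above, for the v4.2.1 HG002 file that is `'20160824'`.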

jzook commented 1 year ago

You are correct that the header is incorrect, and v4.2.1 is our latest small variant whole genome benchmark from 2020/2021. Thanks for reporting this!

Samvkes commented 1 year ago

Ah, thx for the quick reply! One more thing: I noticed the same error in the most recent HG005 benchmark set, so this might be a broader issue.