genome-in-a-bottle / giab_data_indexes

This repository contains data indexes from NIST's Genome in a Bottle project.

MD5 checksum value error #28

Closed: jberghout closed this issue 9 months ago

jberghout commented 9 months ago

I just downloaded the Illumina WES data from the Ashkenazi trio: https://github.com/genome-in-a-bottle/giab_data_indexes/blob/master/AshkenazimTrio/alignment.index.AJtrio_OsloUniversityHospital_IlluminaExome_bwamem_GRCh37_11252015

The index linked above has some errors in the MD5 checksums listed in the GIAB GitHub table. They are minor, but I thought them worth drawing to your attention in case you want to verify (and correct) them.

On the GIAB github page:

  1. The .bam and .bai files for HG002 currently both show the same MD5 hash (= c80f0cab24bfaa504393457b8f7191fa).
    • In my download, that hash matches the .bam file, but the .bai file comes up as d4fea426c3e2e9a71bb92e6526b4df6f.
  2. The .bai file for HG004 shows MD5 hash = 8914bfb6fa6bd304192f2c9e13e903f.
    • In my download, the hash comes up as 8914bfb6fa6bd304192f2c9e13e903f4. That is close enough to suggest the listed value was inadvertently truncated by one digit, rather than being an actual mismatch.
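
For anyone wanting to repeat this kind of check, here is a minimal sketch in Python using only the standard library. The helper names `md5_of_file` and `compare_md5` are made up for illustration and are not part of any GIAB tooling. The sketch hashes a downloaded file in chunks and then reports whether the listed checksum matches, looks like a truncated copy of the computed one, or genuinely differs.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large BAM files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_md5(listed, computed):
    """Classify a listed checksum against a locally computed one.

    Returns "match", "truncated" (the listed value is shorter than
    32 hex digits but is a prefix or suffix of the computed value),
    "malformed" (wrong length and unrelated), or "mismatch".
    """
    listed = listed.strip().lower()
    computed = computed.strip().lower()
    if listed == computed:
        return "match"
    if len(listed) != 32:
        if listed and (computed.startswith(listed) or computed.endswith(listed)):
            return "truncated"
        return "malformed"
    return "mismatch"


if __name__ == "__main__":
    # Values quoted in this issue: listed hash first, downloaded hash second.
    print(compare_md5("8914bfb6fa6bd304192f2c9e13e903f",
                      "8914bfb6fa6bd304192f2c9e13e903f4"))
    print(compare_md5("c80f0cab24bfaa504393457b8f7191fa",
                      "d4fea426c3e2e9a71bb92e6526b4df6f"))
```

With the values above, the HG004 .bai entry is classified as a truncation and the HG002 .bai entry as a real mismatch, which is consistent with the two cases described in the list. To check a download, pass its local path to `md5_of_file` and compare the result with the value in the index table.
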

chunlinxiao commented 9 months ago

HG002/4 bai md5 fixed - Thanks @jberghout