Nealelab / UK_Biobank_GWAS

Overview of the data QC, code, and GWAS summary output from the 2017 UK Biobank data release

Inconsistency in MD5 Checksums for imputed v3 GWAS sumstats #54

Closed xinyixinyijiang closed 7 months ago

xinyixinyijiang commented 8 months ago

Hi!

I recently downloaded GWAS summary statistics for almost all UKBB phenotypes (both sexes only) via the AWS link found in the provided Google document. To ensure data integrity, I ran an MD5 checksum verification (on a Linux system). However, 22 specific phenotypes failed the MD5 checksum validation, and they failed it in both of two separate tests.

For example, for phenotype ID 1220, accessed through the link https://broad-ukb-sumstats-us-east-1.s3.amazonaws.com/round2/additive-tsvs/1220.gwas.imputed_v3.both_sexes.tsv.bgz, the MD5 checksum I computed was 5ee1df62d2a6608c942ec85fd712bf9a. This differs from the expected checksum provided, which is 80d2c21a425aee154d585cd20ffa1e8c.
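
For anyone who wants to reproduce this kind of check, here is a minimal sketch of how a digest can be computed and compared in Python. It is not the exact command used for the numbers above; the function names `md5_of_file` and `matches_expected` are made up for illustration, and the file is assumed to be already downloaded to a local path.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that large .tsv.bgz files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Compare the computed digest with a published one (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Calling `matches_expected("1220.gwas.imputed_v3.both_sexes.tsv.bgz", "80d2c21a425aee154d585cd20ffa1e8c")` on the downloaded file would return `False` if the mismatch described here is reproduced.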

I have listed all affected phenotypes below for your reference. Could you please investigate this discrepancy?

Thank you for your attention to this matter!

135 Number of self-reported non-cancer illnesses
136 Number of operations, self-reported
137 Number of treatments/medications taken
398 Number of correct matches in round
403 Number of times snap-button pressed
709 Number in household
924 Usual walking pace
1160 Sleep duration
1190 Nap during day
1200 Sleeplessness / insomnia
1220 Daytime dozing / sleeping (narcolepsy)
1239 Current tobacco smoking
1289 Cooked vegetable intake
1518 Hot drink temperature
1548 Variation in diet
1687 Comparative body size at age 10
1697 Comparative height size at age 10
1873 Number of full brothers
1883 Number of full sisters
2237 Plays computer games
2296 Falls in the last year
2306 Weight change compared with 1 year ago

howrigan commented 8 months ago

Thanks for bringing this to our attention. We have identified that many of the additive files were updated prior to the migration to AWS, but the MD5s were not updated to match. We are working on generating the MD5s and should have this completed by the end of the week.

howrigan commented 7 months ago

Thanks for your patience here. We've updated the file list and generated a complete list of MD5 checksums for all files in this repository. It can be found in 2018_gwas_imputed_md5.20240212.txt. Sorry for any confusion this may have caused.
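
For rechecking a batch of downloads against a checksum list, something along these lines may help. It is only a sketch: the layout of 2018_gwas_imputed_md5.20240212.txt is not described in this thread, so the code assumes `md5sum`-style lines (`<md5>  <filename>`), and the helper names `read_manifest` and `find_mismatches` are hypothetical. Adjust the parsing if the real file is laid out differently.

```python
import hashlib
from pathlib import Path


def read_manifest(manifest_path):
    """Parse lines of the ASSUMED form '<md5>  <filename>' into {filename: md5}."""
    expected = {}
    for line in Path(manifest_path).read_text().splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        checksum, name = parts[0], parts[-1]
        # md5sum marks binary-mode entries with a leading '*'
        expected[name.lstrip("*")] = checksum.lower()
    return expected


def find_mismatches(manifest_path, download_dir):
    """Return the names of downloaded files whose MD5 differs from the manifest."""
    mismatches = []
    for name, checksum in read_manifest(manifest_path).items():
        path = Path(download_dir) / name
        if not path.is_file():
            continue  # file was not downloaded, nothing to compare
        digest = hashlib.md5()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != checksum:
            mismatches.append(name)
    return mismatches
```

An empty list from `find_mismatches` means every downloaded file that appears in the list matched its checksum; files missing from the download directory are skipped rather than reported.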

Details about these GWAS sumstats in question are here