To cite this repo in its current form, along with the summary statistics it generated, please use this Zenodo release: https://zenodo.org/record/8011558
With the re-release of UK Biobank genotype imputation (which we term imputed-v3), we have generated an updated set of GWAS summary statistics for the genetics community.
Information and scripts from the previous round of GWAS are available in the imputed-v2-gwas subdirectory.
Finally, the 0.1 and 0.2 script repositories refer to the version of Hail used to run the GWAS.
Updates to the Rapid GWAS summary statistics or the download manifest will be recorded here:
30600 - Albumin (g/L)
30610 - Alkaline phosphatase (U/L)
30620 - Alanine aminotransferase (U/L)
30630 - Apolipoprotein A (g/L)
30640 - Apolipoprotein B (g/L)
30650 - Aspartate aminotransferase (U/L)
30660 - Direct bilirubin (umol/L)
30670 - Urea (mmol/L)
30680 - Calcium (mmol/L)
30690 - Cholesterol (mmol/L)
30700 - Creatinine (umol/L)
30710 - C-reactive protein (mg/L)
30720 - Cystatin C (mg/L)
30730 - Gamma glutamyltransferase (U/L)
30740 - Glucose (mmol/L)
30750 - Glycated haemoglobin (mmol/mol)
30760 - HDL cholesterol (mmol/L)
30770 - IGF-1 (nmol/L)
30780 - LDL direct (mmol/L)
30790 - Lipoprotein A (nmol/L)
30800 - Oestradiol (pmol/L)
30810 - Phosphate (mmol/L)
30820 - Rheumatoid factor (IU/ml)
30830 - SHBG (nmol/L)
30840 - Total bilirubin (umol/L)
30850 - Testosterone (nmol/L)
30860 - Total protein (g/L)
30870 - Triglycerides (mmol/L)
30880 - Urate (umol/L)
30890 - Vitamin D (nmol/L)
30897 - Estimated sample dilution factor (factor)
These phenotypes were added to the list for round 2 late in the process, and were GWASed using Hail 0.2 instead of the Hail 0.1 pipeline used for the rest of the round 2 GWAS. The code change from this switch in Hail versions is responsible for the different sort order of the output files. Specifically, the TSV export scripts key/sort on "variant" in both pipelines, but the Hail 0.2 version sorts on variant as a constructed string (sorting alphabetically), while the Hail 0.1 version used the variant type, which sorts by genomic location. (I'm pretty sure that at the time Hail 0.2 hadn't implemented its analogous locus type associated with a genome build, though I haven't gone back to confirm.) The script creating variants.tsv.bgz was part of the primary Hail 0.1 pipeline and so matches the location-based sort used by all of the GWAS outside of the biomarkers.
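A minimal sketch of why the two sort orders diverge, in plain Python rather than Hail and assuming variant IDs are strings of the form `chrom:pos:ref:alt` (the IDs below are made up). Sorting the strings compares them character by character, so chromosome 10 lands before chromosome 1 and position 100 before position 20, whereas a locus-style key orders by chromosome and then numeric position. The `locus_key` helper and its ranking of X, Y and MT after the autosomes are assumptions for illustration, not Hail's implementation.

```python
# Hypothetical variant IDs in an assumed "chrom:pos:ref:alt" string format.
variants = ["2:50:A:C", "1:100:A:G", "10:5:G:A", "1:20:C:T", "X:7:T:C"]

# String sort (the behaviour described for the Hail 0.2 export): characters are
# compared one at a time, and "0" sorts before ":", so "10:..." precedes "1:...".
alphabetical = sorted(variants)


def locus_key(variant: str) -> tuple:
    """Hypothetical sort key approximating a genomic-location sort."""
    chrom, pos, ref, alt = variant.split(":")
    # Assumed contig order: autosomes 1-22, then X, Y, MT.
    non_autosomes = {"X": 23, "Y": 24, "MT": 25}
    rank = non_autosomes[chrom] if chrom in non_autosomes else int(chrom)
    return (rank, int(pos), ref, alt)


# Location sort (the behaviour of the Hail 0.1 variant type and variants.tsv.bgz).
by_location = sorted(variants, key=locus_key)

print(alphabetical)
# ['10:5:G:A', '1:100:A:G', '1:20:C:T', '2:50:A:C', 'X:7:T:C']
print(by_location)
# ['1:20:C:T', '1:100:A:G', '2:50:A:C', '10:5:G:A', 'X:7:T:C']
```

Both lists contain the same variants; only the row order differs, which is why the problem surfaces only when files are matched by position.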
As for other releases based on the biomarkers, I haven't found any related issues. Specifically:
Results files for the alternative version of the biomarker GWAS that added dilution fraction as a covariate would probably also be affected by this issue, but those weren't part of the public release (they were just discussed in a blog post) and as far as I know aren't getting any other use.
The ldsc-formatted results files (linked from the h2 results site) are not affected by this issue. They were exported directly from the Hail GWAS results, and so did not involve re-matching with variants.tsv.bgz in a way that could introduce issues from sort order. The ldsc format isn't sensitive to sort order, and the export scripts end up sorting on rsid in the same way for both the biomarkers and everything else anyway.
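A minimal sketch of the difference between row-position matching and key-based matching, using hypothetical miniature tables. The column names `variant`, `rsid` and `beta`, and the placeholder rsids, are assumed for illustration only. Pasting an alphabetically sorted results table next to a location-sorted annotation table misassigns rsids, while looking each row up by its variant string gives the same answer regardless of sort order.

```python
# Hypothetical location-sorted annotation rows (standing in for variants.tsv.bgz).
variants_tsv = [
    {"variant": "1:20:C:T", "rsid": "rs_a"},
    {"variant": "1:100:A:G", "rsid": "rs_b"},
    {"variant": "10:5:G:A", "rsid": "rs_c"},
]
# Hypothetical alphabetically sorted association rows (standing in for a biomarker file).
biomarker_results = [
    {"variant": "10:5:G:A", "beta": 0.3},
    {"variant": "1:100:A:G", "beta": 0.1},
    {"variant": "1:20:C:T", "beta": -0.2},
]

# Index the annotations by their variant key.
rsid_by_variant = {row["variant"]: row["rsid"] for row in variants_tsv}

# Fragile: pairing rows by position assumes both tables share one sort order.
pasted = [(a["rsid"], r["variant"]) for a, r in zip(variants_tsv, biomarker_results)]
mismatched = [v for rsid, v in pasted if rsid_by_variant[v] != rsid]

# Robust: join on the variant key, so row order in either table is irrelevant.
joined = [{**r, "rsid": rsid_by_variant[r["variant"]]} for r in biomarker_results]

print(mismatched)  # ['10:5:G:A', '1:20:C:T']
print([j["rsid"] for j in joined])  # ['rs_c', 'rs_b', 'rs_a']
```

Only the middle variant happens to line up under positional pairing; the key-based join assigns every rsid correctly.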
Round 3 / pan-UKB analyses won't be affected by this issue since they're a separate pipeline unrelated to variants.tsv.bgz and without the separate handling of biomarkers.
As far as I can find, the Twitter bot hasn't tweeted any of the round 2 biomarker results. The manifest of Manhattan plots created from the UKB results that originally fed the bot doesn't include any of the biomarker phenotypes, so I'm guessing they were never in the bot's rotation.
The results displayed in CTG-View appear unaffected. For example, the results for albumin have the expected variant locations as top hits, and they are correctly matched to rsids (i.e. matching both the correctly aligned variants.tsv.bgz and the canonical dbSNP entry for the locus).
The IEU Open GWAS Project similarly seems to have correct rsids, etc., in its PheWAS lookups, dataset summary, and reformatted VCF.
Other uses the GWAS results have gotten, or are getting, are harder to confirm, and of course people should be made aware of the issue, but (fingers crossed) it looks like this hasn't caused widespread problems.
January 2021
Oct 17, 2019
Oct 9, 2019
Sept 16, 2019
Auto-curated phenotypes using PHESANT:
ICD10 codes (all non-coded individuals treated as controls)
Curated phenotypes in collaboration with the FinnGen consortium
Phenotypes in both sexes
Phenotypes in females
Phenotypes in males
Unique PHESANT phenotypes: 3011, of which 274 are continuous
4203 total unique phenotypes: 3011 PHESANT + 559 FinnGen + 633 ICD10
Summary files: