This repository contains QC code, analyses, and tutorials for the combined HGDP (Human Genome Diversity Project) + 1kGP (1000 Genomes Project) data.
Metadata are available on Google Cloud: gs://gcp-public-data--gnomad/release/3.1/secondary_analyses/hgdp_1kg_v2/metadata_and_qc/gnomad_meta_updated.tsv
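Once the metadata file has been downloaded, it can be read as an ordinary tab-separated table. The sketch below uses only the Python standard library; the column names in the in-memory example (`sample_id`, `population`) are hypothetical placeholders and are not taken from the real file.

```python
import csv
import io


def read_metadata(handle):
    """Parse a tab-separated table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Tiny in-memory stand-in for the real file; the columns are hypothetical.
example = io.StringIO("sample_id\tpopulation\nS1\tPopA\nS2\tPopB\n")
rows = read_metadata(example)
print(len(rows), "rows; columns:", list(rows[0].keys()))
```

For a local copy, the same function can be called as `read_metadata(open("gnomad_meta_updated.tsv", newline=""))`, assuming the file was saved under that name.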
All data are freely available and described in more detail here.
The gnomAD HGDP+1kGP callset (pre-QC mt) can be found here.
`.bgz` files can be viewed using `zcat` on the command line.
Datasets used in the tutorials are located here.
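As an alternative to `zcat`, a `.bgz` file can usually be read from Python, because block-gzipped (BGZF) files are a series of concatenated gzip members, which the standard `gzip` module reads transparently. This is a minimal sketch; the helper name `head_bgz` and the file name in the usage note are assumptions for illustration.

```python
import gzip
import itertools


def head_bgz(path, n=5):
    """Return the first n lines of a gzip-compatible (e.g. .bgz) text file."""
    with gzip.open(path, "rt") as handle:
        # islice stops after n lines, so the whole file is never decompressed.
        return [line.rstrip("\n") for line in itertools.islice(handle, n)]
```

For example, `head_bgz("some_table.tsv.bgz", n=3)` would return the header and first two rows of a hypothetical local file.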
Phased haplotypes are available as BCFs on Google Cloud: gs://gcp-public-data--gnomad/resources/hgdp_1kg/phased_haplotypes_v2/
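For tools that cannot read `gs://` paths directly, objects in a public Google Cloud Storage bucket are generally also reachable over HTTPS at `https://storage.googleapis.com/<bucket>/<object>`. The sketch below converts one form to the other; the helper name `gs_to_https` is an assumption, and the conversion applies to individual objects rather than to a directory-like prefix ending in `/`.

```python
from urllib.parse import quote


def gs_to_https(gs_uri):
    """Convert a gs://bucket/object URI to its storage.googleapis.com URL."""
    prefix = "gs://"
    if not gs_uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {gs_uri!r}")
    # Split "bucket/path/to/object" into the bucket and the object name.
    bucket, _, obj = gs_uri[len(prefix):].partition("/")
    # Percent-encode the object name, keeping "/" as a path separator.
    return f"https://storage.googleapis.com/{bucket}/{quote(obj)}"


print(gs_to_https(
    "gs://gcp-public-data--gnomad/release/3.1/secondary_analyses/"
    "hgdp_1kg_v2/metadata_and_qc/gnomad_meta_updated.tsv"
))
```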
Datasets found on the Downloads page of the gnomAD browser are released on Google Cloud Platform, Amazon Web Services, and Microsoft Azure. Instructions on how to download them can be found here.
PCA plotting and projection scripts are available here: https://github.com/atgu/pca_projection/blob/master/hgdp_tgp_reference/hgdp_tgp_pca_intersection.py. They were used by the COVID-19 Host Genetics Initiative, the Global Biobank Meta-analysis Initiative, and related projects to align external cohorts to this resource.