MRCIEU / opengwas-requests

A place to request or contribute data to the IEU GWAS database

Differences between OpenGWAS VCF and original sumstats #31

Open matteofloris opened 2 years ago

matteofloris commented 2 years ago

I have noticed a problem with the ebi-a-GCST90001585 dataset from the OpenGWAS database. After downloading the file from OpenGWAS (wget https://gwas.mrcieu.ac.uk/files/ebi-a-GCST90001585/ebi-a-GCST90001585.vcf.gz), I found that some of its variants do not appear in the corresponding GWAS Catalog summary statistics (wget http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST90001001-GCST90002000/GCST90001585/GCST90001585_buildGRCh37.tsv.gz):

For example, the first lines of the original summary statistics downloaded from the GWAS Catalog are:

> zless GCST90001585_buildGRCh37.tsv.gz | head -n 6 | cut -f1,2,3,4,5
chromosome base_pair_location effect_allele other_allele N
1 54421 G A 3629
1 54736 T C 3629
1 55326 C T 3629
1 57033 C T 3629
1 57064 A G 3629
1 60249 T C C 3629

whereas the first lines of the corresponding VCF file downloaded from the OpenGWAS database are:

> zless ebi-a-GCST90001585.vcf.gz | head -n 200 | grep -v "#" | head -n 5
1 10472 rs1307088996 G C
1 10473 rs1408062762 G A
1 10711 rs1434325972 A G
1 12673 rs1476353024 G A
1 13118 rs62028691 G
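For anyone who wants to repeat this inspection programmatically, the two shell pipelines above can be approximated in Python. This is a minimal sketch using only the standard library. The function name `first_records` is hypothetical, and the sketch assumes both files are gzip-compressed, tab-separated text, that the GWAS Catalog TSV has exactly one header line, and that VCF header lines start with `#`. Unlike `grep -v "#"`, which drops any line containing `#` anywhere, it only skips lines that begin with `#`.

```python
import gzip
from itertools import islice


def first_records(path, n=5, columns=None, skip_header=False, comment="#"):
    """Return the first n data rows of a gzipped, tab-separated file.

    Roughly mirrors `zless FILE | grep -v "#" | head | cut -f...`:
    - lines starting with `comment` are skipped (VCF meta/header lines),
    - one header line is skipped if `skip_header` is True (GWAS Catalog TSV),
    - `columns` holds 0-based field indices to keep (cut -f is 1-based).
    """
    with gzip.open(path, "rt") as handle:
        lines = (
            line.rstrip("\n")
            for line in handle
            if not (comment and line.startswith(comment))
        )
        if skip_header:
            next(lines, None)  # discard the column-name line
        rows = []
        for line in islice(lines, n):
            fields = line.split("\t")
            if columns is not None:
                fields = [fields[i] for i in columns]
            rows.append(fields)
        return rows
```

For example, `first_records("GCST90001585_buildGRCh37.tsv.gz", n=5, columns=range(5), skip_header=True)` should return the same data rows as the first command (fields 1 to 5), and `first_records("ebi-a-GCST90001585.vcf.gz", n=5, columns=range(5))` the CHROM, POS, ID, REF and ALT fields of the first five VCF records.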

As you can see, some variants in this OpenGWAS dataset are absent from the original data (for example, position 1:10472 does not appear in the GWAS Catalog sumstats!). How is this possible? Thank you for your co-operation.
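One way to measure how widespread the discrepancy is would be to collect chromosome:position keys from both files and take the set difference. The sketch below is illustrative only. The function names are hypothetical, and it assumes the layouts shown in the outputs above: the TSV has `chromosome` and `base_pair_location` header columns, and the VCF has CHROM and POS as its first two fields. It also assumes both files use the same build (GRCh37) and the same chromosome naming (for example `1`, not `chr1`). Matching is on position only, so allele swaps and multi-allelic sites are not distinguished.

```python
import gzip


def tsv_positions(path, chrom_col="chromosome", pos_col="base_pair_location"):
    """Collect (chromosome, position) keys from a gzipped GWAS Catalog-style TSV."""
    with gzip.open(path, "rt") as handle:
        header = handle.readline().rstrip("\n").split("\t")
        ci, pi = header.index(chrom_col), header.index(pos_col)
        keys = set()
        for line in handle:
            if not line.strip():
                continue  # ignore blank lines
            fields = line.rstrip("\n").split("\t")
            keys.add((fields[ci], int(fields[pi])))
        return keys


def vcf_positions(path):
    """Collect (CHROM, POS) keys from a gzipped VCF, skipping '#' header lines."""
    keys = set()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue
            chrom, pos = line.split("\t", 2)[:2]
            keys.add((chrom, int(pos)))
    return keys


def missing_from_tsv(vcf_path, tsv_path):
    """Return VCF positions that have no matching position in the TSV, sorted."""
    return sorted(vcf_positions(vcf_path) - tsv_positions(tsv_path))
```

Run on the two downloaded files, `missing_from_tsv("ebi-a-GCST90001585.vcf.gz", "GCST90001585_buildGRCh37.tsv.gz")` should include `("1", 10472)`, the example position mentioned above. Reversing the set difference gives the opposite check (TSV positions absent from the VCF). Building both key sets in memory is simple but may need a few GB of RAM for genome-wide files.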