MRCIEU / TwoSampleMR

R package for performing 2-sample MR using MR-Base database
https://mrcieu.github.io/TwoSampleMR

Inconsistency with GWAS catalog data and data obtained using extract_instruments() #395

Closed RashSar closed 1 year ago

RashSar commented 1 year ago

Hi, I'm using triglycerides data from multiple studies individually, and obtaining IVs using the extract_instruments() function. I noticed that for one particular study (ebi-a-GCST000758), the data returned by this function differs from the original paper. If the same study is queried through MRInstruments gwas_catalog, the data matches the original paper.

Here's an example. Data from extract_instruments():

beta | p | se | position | n | chr | id | rsid | ea | nea
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
-0.0665 | 8.83E-43 | 0.005 | 63025942 | 96598 | 1 | ebi-a-GCST000758 | rs2131925 | T | G

Data from gwas_catalog:

SNP | chr | bp_ens_GRCh38 | Region | gene | Gene_ens | effect_allele | other_allele | beta | se | pval | units
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
rs2131925 | 1 | 62560271 | 1p31.3 | DOCK7 |   | G | T | -4.94 | 0.397959 | 9.00E-43 | mg/dL decreas

The effect sizes are very different for the same study, and not what one would expect from a simple flip of the alt/ref alleles. Here's the same data from the original paper (which is consistent with gwas_catalog): (image)

I compared gwas_catalog output with extract_instruments() output for a couple of other studies, and they have no issues (other than alt/ref allele switch).

Can anyone identify what might be going wrong here, and how to fix it?

mightyphil2000 commented 1 year ago

Hi RashSar

Could you also lookup rs2131925 in the dataset here? http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST000001-GCST001000/GCST000758/

ebi-a-GCST000758 in Open GWAS was downloaded from the above ftp site. Can you check if the effect size is also different in that dataset? I suspect that the problem may lie there. If the OpenGWAS dataset is the same as the dataset at the above FTP site, then we need to contact the GWAS catalog to understand what is going on.

In addition to the very different betas, it also looks like the effect is in opposite directions.

RashSar commented 1 year ago

Thanks for your response!

http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCST000001-GCST001000/GCST000758/TG_with_Effect.zip

This file has the same effect size as what is fetched from the IEU GWAS data using the extract_instruments() function. Only the direction is opposite. Allele1 is T, Allele2 is G. I'm assuming the effect size is for Allele2?

mightyphil2000 commented 1 year ago

It's unclear whether allele1 or allele2 is the effect allele. Is there any metadata at the ftp link that clarifies this? It could be that the modelled effect alleles are different (e.g. T in IEU GWAS and G in the ftp site), in which case there is no conflict between the ftp site and the IEU GWAS.

What's more worrying is that the effect size is very different between the IEU GWAS/FTP site on the one hand and the manually curated top hits (i.e. in your table labelled "Data from gwas_catalog" above).

Could you contact the GWAS catalog about this issue? The ftp site is managed by them and that's ultimately where the discrepancy is coming from. Please could you let us know what they say? Happy to be Cc'd. philip.haycock@bristol.ac.uk

thanks!

RashSar commented 1 year ago

Yes, you're right. The alleles are flipped in the ftp data, which explains the difference in direction.

I will contact GWAS catalog regarding the issue and loop you in.

Thank you!

Edit: My bad, the effect allele is T in both.

RashSar commented 1 year ago

After clarification from GWAS Catalog and Philip, here's a summary of the issue:

The data available via GWAS Catalog is for top associations that are curated directly from the publication. Data at the ftp site is obtained directly from the authors.

In this case, the difference in effect sizes between these two datasets is due to a difference in units. Data from the ftp site has effect size estimates in standard deviation units, while the effect size in the paper (same as GWAS catalog) is in mg/dL.
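As a rough sanity check on the units explanation, the two records for rs2131925 quoted in the tables earlier in this thread can be compared on a unit-free scale. This is only a sketch built from the posted numbers: the variable names are illustrative, and the "implied SD" is simply the ratio of the two betas, not a value reported by the study or by GWAS Catalog.

```r
# Values for rs2131925 copied from the two tables earlier in this thread.
sd_units   <- list(beta = -0.0665, se = 0.005)     # extract_instruments() output
mgdl_units <- list(beta = -4.94,   se = 0.397959)  # gwas_catalog / paper

# z-scores do not depend on the trait's units, so their magnitudes should be
# similar if the two records differ only by a rescaling of the trait.
# abs() is used because direction is a separate question (allele coding).
z_sd   <- abs(sd_units$beta / sd_units$se)
z_mgdl <- abs(mgdl_units$beta / mgdl_units$se)

# Ratio of the betas: the trait SD (in mg/dL) that such a rescaling would imply.
# This is an inferred quantity, not one stated in the thread.
implied_sd <- abs(mgdl_units$beta / sd_units$beta)

print(round(c(z_sd = z_sd, z_mgdl = z_mgdl, implied_sd = implied_sd), 2))
```

The two z-scores come out close to each other (roughly 13.3 and 12.4), which is what a change of units would produce; a genuinely different estimate would not be expected to preserve them.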

The direction of effect is also consistent between GWAS catalog and the ftp site: the paper reports the effect with respect to the minor allele G, hence it's a negative effect, whereas the effect allele at the ftp site is T, which has a positive effect.

The MRC-IEU data also lists T as the effect allele, but with a negative effect. This flip in direction might be due to an effect allele coding error in the MRC-IEU database.
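The direction argument in this summary can be made concrete by expressing all three records for rs2131925 relative to the same effect allele. The sketch below is in base R and uses only values quoted in this thread; `align_to_allele()` is a hypothetical helper written for illustration (it is not part of TwoSampleMR or ieugwasr), and the ftp beta of +0.0665 is inferred from the comments saying the ftp file has the same effect size with the opposite sign.

```r
# The three records for rs2131925 as described in this thread.
records <- data.frame(
  source = c("gwas_catalog", "ftp", "opengwas"),
  ea     = c("G", "T", "T"),            # effect allele
  nea    = c("T", "G", "G"),            # other allele
  beta   = c(-4.94, 0.0665, -0.0665),   # mg/dL, SD units, SD units
  stringsAsFactors = FALSE
)

# Hypothetical helper: express every beta relative to `ref_allele`,
# flipping the sign when a record is coded on the other allele.
align_to_allele <- function(dat, ref_allele) {
  flip <- dat$ea != ref_allele
  # A record can only be flipped if its other allele is the reference allele.
  stopifnot(all(dat$nea[flip] == ref_allele))
  dat$beta[flip] <- -dat$beta[flip]
  old_ea <- dat$ea[flip]
  dat$ea[flip]  <- dat$nea[flip]
  dat$nea[flip] <- old_ea
  dat
}

aligned <- align_to_allele(records, ref_allele = "T")
aligned$direction <- ifelse(aligned$beta > 0, "+", "-")
print(aligned[, c("source", "ea", "beta", "direction")])
```

Once everything is aligned to T, the GWAS catalog and ftp records agree in sign and only the OpenGWAS record points the other way, which matches the suspected allele coding error.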

mightyphil2000 commented 1 year ago

Thanks RashSar

I checked the dataset and it does indeed look like the effect allele is wrong in ebi-a-GCST000758 in OpenGWAS. We will remove this dataset. I used the CheckSumStats package to confirm: 96% of known GWAS hits for triglycerides had effect sizes in the opposite direction to that expected.

```r
library(CheckSumStats)

# Look up the trait name for the OpenGWAS dataset
ao <- ieugwasr::gwasinfo()
id <- "ebi-a-GCST000758"
trait <- ao$trait[ao$id == id]

# Known GWAS Catalog hits for the trait
gwas_catalog <- gwas_catalog_hits(efo = NULL, efo_id = NULL, trait = trait, map_association_to_study = FALSE)

# NOTE: `snplist` (the rsIDs to look up) is not defined in the snippet as posted
out <- ieugwasr::associations(id = id, variants = snplist, proxies = 0)
out <- format_data(dat = out, trait = trait, rsid = "rsid", effect_allele = "ea", other_allele = "nea", beta = "beta", se = "se", eaf = "eaf", p = "p")

# NOTE: `efo` is also not defined in the snippet as posted
gc_dat <- compare_effect_to_gwascatalog2(dat = out, efo_id = efo, trait = trait, map_association_to_study = FALSE, gwas_catalog = gwas_catalog)
gc_dat[gc_dat$rsid == "rs2131925", ]

# Summarise conflicts with the GWAS Catalog effect sizes
gc_conflicts <- flag_gc_conflicts2(gc_dat = gc_dat)
gc_conflicts
```

```
$effect_size_conflicts
$effect_size_conflicts$`high conflict`
[1] 457

$effect_size_conflicts$`moderate conflict`
[1] 252

$effect_size_conflicts$`no conflict`
[1] 28

$effect_size_conflicts$n_snps
[1] 737
```
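For readers wondering how the 96% figure relates to this output, the short base R sketch below recomputes the shares from the counts printed above. Treating "high conflict" and "moderate conflict" together as the conflicting hits is an assumption on my part; the thread does not spell out how the percentage was derived.

```r
# Counts copied from the flag_gc_conflicts2() output above.
conflicts <- c(high = 457, moderate = 252, none = 28)
n_snps <- 737

# The three categories should account for every SNP compared.
stopifnot(sum(conflicts) == n_snps)

# Share of SNPs in each category, in percent.
pct <- round(100 * conflicts / n_snps, 1)

# Assumption: "high" and "moderate" together count as conflicting hits.
pct_any_conflict <- round(100 * (conflicts[["high"]] + conflicts[["moderate"]]) / n_snps)

print(pct)
print(pct_any_conflict)
```

Under that assumption, the combined share rounds to 96, in line with the figure quoted in the comment.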