MRCIEU / TwoSampleMR

R package for performing 2-sample MR using MR-Base database
https://mrcieu.github.io/TwoSampleMR
Other
417 stars 175 forks source link

Inconsistent unit increase/decrease with beta sign #360

Open cemalley opened 2 years ago

cemalley commented 2 years ago

Hi MRCIEU maintainers, this is Claire Weber with the NEI (NIH) and Dr. Emily Chew. I have been using twosamplemr for about a year and only just encountered this issue recently when using outcome data outside of the opengwas database. I obtained summary stats from FinnGen directly. I can reproduce the problem but there is no error. After running mr, the unit increase/decrease text in the exposure variable is not consistent with the SIGN of the beta value for the test. Here is a sampling of output.... Isn't it correct the sign of the beta should match the direction of the unit change of exposure? Thanks in advance for any help.

<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">

id.exposure | id.outcome | outcome | exposure | method | nsnp | b | se | pval -- | -- | -- | -- | -- | -- | -- | -- | -- 8xMIjp | BGMBwb | AMD | Fasting blood insulin (BMI interaction) (unit decrease) | Simple mode | 7 | 103.8445 | 86.60262 | 0.275695 mAk4Iy | BGMBwb | AMD | Parental longevity (father's age at death) (unit increase) | MR Egger | 3 | 45.37465 | 8.929067 | 0.123697 gpN5HV | BGMBwb | AMD | Glomerular filtration rate in non diabetics (creatinine) (unit increase) | MR Egger | 4 | 30.89913 | 22.26884 | 0.299654 iCjK1B | BGMBwb | AMD | Regular attendance at a gym or sports club (unit decrease) | MR Egger | 3 | -41.5127 | 45.01285 | 0.525739 qYQIpP | BGMBwb | AMD | Regular attendance at a religious group (unit increase) | MR Egger | 5 | -31.9494 | 39.01626 | 0.472853 aqdfcR | BGMBwb | AMD | Fasting blood glucose (BMI interaction) (unit increase) | Simple mode | 13 | -24.3252 | 19.23268 | 0.22996 8tl8bC | BGMBwb | AMD | Regular attendance at a religious group (unit decrease) | MR Egger | 9 | -23.0741 | 19.49309 | 0.275164 aqdfcR | BGMBwb | AMD | Fasting blood glucose (BMI interaction) (unit increase) | Weighted median | 13 | -20.5364 | 17.74627 | 0.247181

explodecomputer commented 2 years ago

Hi there. I'm not quite sure what the origin of the exposure data is here (e.g. Fasting blood insulin (BMI interaction) (unit decrease)) but it resembles the way in which EBI GWAS catalog codes genetic effects - sometimes they want all the betas to be positive and so they change the units to reflect that, which can pose a problem if multiple instruments are available for the same trait but have a mixture of units. The MR estimate is in effect beta.outcome / beta.exposure, so the reason that the exposure unit and the MR sign doesn't match is because the beta.outcome might be negative or positive. In the MR estimate output here you don't see the beta.exposure to which the units are referring.

cemalley commented 2 years ago

Hi Dr. Hemani, all the exposure data is from EBI GWAS, correct. So, can I ignore the units text in the exposure phenotype column and trust that the sign of the MR test result b's are correct? Beta.exposure signs are both negative and positive across instrument SNPs (I think that's normal). The MR results have only positive betas for the fasting blood insulin exposure, but in other exposures and tests, the signs change across the test types. Thank you...I can share the full harmonised dataset and results if you think there is a bug.

explodecomputer commented 2 years ago

Hey if you have multiple instruments for the exposure, and you trust that they are all in terms of the same units (i.e. all increasing exposure or all decreasing exposure), then this seems safe to use for the MR, and the effect estimate will be in terms of the stated outcome units change per stated exposure units.

But if the instruments for a single exposure is a mixture of increasing and decreasing units then I would make sure to harmonise those to all be in terms of the same units before proceeding with the MR.

cemalley commented 2 years ago

Hi Dr. Hemani, how exactly should I change the beta sign or units direction of exposure instruments without corrupting it? Many exposures have both negative and positive effects. Should I separate all negative SNPs from positive SNPs, then have two separate exposures? This is my code and I found I should run clump_data() on the harmonised data to avoid an error, instead of clump exposure/outcome each first. Yes I am running the entire EBI GWAS catalog as exposures. :) And here is the harmonised, clumped data: Allexposures_Finnwet_clumped.csv

library(data.table);library(TwoSampleMR);library(MRPRESSO);library(MRInstruments)
data(gwas_catalog)
exposures <- gwas_catalog
exposures <- format_data(exposures)
exposures <- as.data.table(exposures)
exposures <- exposures[!is.na(beta.exposure) & !is.na(se.exposure) & !is.na(eaf.exposure)&
                         !is.na(effect_allele.exposure) & !is.na(other_allele.exposure) & mr_keep.exposure==TRUE,]

setwd('~/MR/July13MR/')
finnwet <- fread('~/MR/Finn_wet_amd.csv')
finnwet <- finnwet[!is.na(beta.outcome) & !is.na(se.outcome) & !is.na(eaf.outcome)&
                     !is.na(effect_allele.outcome) & !is.na(other_allele.outcome) & mr_keep.outcome==TRUE,]

harmonised <- harmonise_data(exposures, finnwet)
harmonised <- as.data.table(harmonised)
harmonised <- harmonised[mr_keep=='TRUE',]

harmonised_clumped <- clump_data(harmonised)

harmonised_clumped[,exposure:= paste0(exposure, ' (',id.exposure,')')] # I have to do this for the mr_report() to write correctly because some exposures may be the same, but actually from different id.exposure studies. I also removed all "unit increase/decrease" in the exposure column. so it is only "unit" in the reports. 

fwrite(harmonised_clumped, 'Allexposures_Finnwet_clumped.csv')

mr_results <- mr(harmonised_clumped)
harmonised_clumped <- as.data.table(harmonised_clumped)
mr_report(harmonised_clumped)

mr_results <- as.data.table(mr_results)

fwrite(mr_results, 'MR_results_FinnWetAMD.csv')
mocksu commented 1 year ago

Not sure if I understand the issue here correctly, but my question is that the outcome definition is very often suspicious. If the outcome is defined in the opposite direction (e.g. if the outcome is "lung cancer" and you would expect the case to be 1 and control to be 0, but actually the case is by mistake defined as 0 and the control as 1 instead), then the sign of the two sample MR beta would be opposite to what you would expect. Unfortunately, I can't find how the outcome is defined for a particular study.

cemalley commented 1 year ago

@mocksu Yes, I have the same issue. The sign is ambiguous or just wrong considering background knowledge of my disease. This is a serious issue affecting whether my group can trust this package for analysis.

mightyphil2000 commented 1 year ago

Hi @cemalley and @mocksu could you give an example of where the sign is in the wrong direction?

cemalley commented 1 year ago

@mightyphil2000 @mocksu @explodecomputer one example is parental longevity with a negative beta sign for age-related macular degeneration. That does not make sense...results make more sense over all traits if I filter as follows: only MR Egger or IVW tests, test p< 0.05, nsnp > 1, and ignore the unit direction in parentheses in the phenotype description. So I just go by the beta value sign and inspect each SNP, I guess, and ignore the obviously wrong ones biologically.

mightyphil2000 commented 1 year ago

Regarding the questions about the unit descriptions, with units increase and units decrease, this is just an artefact of how the GWAS catalog describes the phenotypes. You should ignore this information. The direction of the exposure betas in the dataset should be correct, unless there is a reporting error in the GWAS catalog. Basically, when we prepare the GWAS catalog dataset for the MRInstruments package, we look at the reported direction from the GWAS catalog and then transform the beta to reflect that. You should interpret your MR result as reflecting a unit increase in the exposure.

I've also run my own analysis using this code to see if I can reproduce what you're finding.

library(TwoSampleMR)
library(MRInstruments)
data(gwas_catalog)

#first define the exposure
exposures <- gwas_catalog
exposures <- format_data(exposures)
exp_dat<-exposures[grep("Parental longevity",exposures$exposure),]
exp_dat2<-exp_dat[exp_dat$mr_keep.exposure,]

#define the outcome using MR-Base
ao<-available_outcomes()
ao1<-ao[grep("macular degeneration",ao$trait),c("id","trait")]
out_dat <- extract_outcome_data(
    snps = unique(exp_dat$SNP),
    outcomes = ao1$id
)
dat <- harmonise_data(
    exposure_dat = exp_dat2, 
    outcome_dat = out_dat
)
res<-mr(dat, method_list = c("mr_ivw"))

Results attached below results.txt

There were 8 exposures in the GWAS with the words parental longevity in the trait description [1] "Parental longevity (both parents in top 10%) (unit increase)"
[2] "Parental longevity (father's age at death) (unit increase)"
[3] "Parental longevity (both parents in top 10%) (unit decrease)"
[4] "Parental longevity (father's attained age) (unit decrease)"
[5] "Parental longevity (combined parental attained age, Martingale residuals) (unit increase)" [6] "Parental longevity (mother's age at death) (unit increase)"
[7] "Parental longevity (combined parental age at death) (unit increase)"
[8] "Parental longevity (combined parental attained age, Martingale residuals) (unit decrease)"

There were 4 outcomes MR-Base with age-related macular degeneration in the trait description (3 are from FinnGen): [1] "Early age-related macular degeneration || id:ebi-a-GCST010723"
[2] "Dry age-related macular degeneration (includes geographic atrophy) || id:finn-b-DRY_AMD" [3] "Age-related macular degeneration (whether dry or wet) || id:finn-b-H7_AMD"
[4] "Wet age-related macular degeneration || id:finn-b-WET_AMD"

Of the 32 possible exposure-outcome combinations in the MR analysis, 31 exposure-outcome analyses were performed. Below are the estimated effects of the exposures on the outcomes with 95% confidence intervals.

res[,c("b","lci","uci")]
           b        lci        uci
1   1.4339749 -1.1609557  4.0289056
2   3.1606026  0.4677737  5.8534315
3   2.9833384 -0.7707297  6.7374065
4   3.7351881 -2.2160612  9.6864373
5   0.2275827 -3.4929881  3.9481534
6   2.5067880 -2.6977465  7.7113225
7   2.0493019 -4.4181535  8.5167574
8   2.3092899 -7.3004246 11.9190044
9   3.0223336 -1.8162879  7.8609551
10  2.3447278 -1.6691903  6.3586459
11  0.1249768 -5.1169189  5.3668725
12 -0.3957120 -1.8828986  1.0914745
13  0.7521391 -1.7378261  3.2421044
14  0.6601985 -1.9717777  3.2921747
15  1.6341283 -1.1253067  4.3935633
16  0.3920942 -0.5595882  1.3437766
17  0.3562984 -1.7052611  2.4178579
18  0.4530038 -1.7175575  2.6235651
19  0.9635278 -1.9004953  3.8275509
20  1.8988225 -0.8563920  4.6540370
21  4.0805320  1.6749859  6.4860782
22  4.3330509  1.7704343  6.8956676
23  5.6101448  0.9709591 10.2493304
24  0.5521051 -1.9263940  3.0306042
25  1.9887816 -1.1886150  5.1661782
26  2.0205115 -1.9433848  5.9844079
27  2.7007920 -3.3315589  8.7331428
28  0.4938893 -0.3424299  1.3302085
29 -0.7219658 -1.8645362  0.4206046
30 -0.8518395 -2.0412527  0.3375737
31 -1.2166547 -3.0454306  0.6121212

The confidence intervals for almost all the results overlap with 0 and seem quite wide, suggesting not much evidence for associations and lots of uncertainty in the direction of the effect.

Some thoughts on why you might get results in the unexpected direction:

  1. The exposure does not have a causal effect on the outcome and what you are observing is a chance result.
  2. Mis-interpretation of the MR results. Like I said above, you should interpret your MR result as reflecting a unit increase in the exposure
  3. Effect allele coding error. Have you perhaps mis-coded or mis-specified the effect allele column in your outcome dataset?
  4. Defining an instrument for a single exposure using multiple studies. Each instrument for each exposure should be defined using a single study. The GWAS catalog might however have results for an exposure that have come from multiple different studies. This could introduce heterogeneity into the results. I don't think this would cause directions of effect to flip into the wrong direction but worth checking anyway.
mightyphil2000 commented 1 year ago

Not sure if I understand the issue here correctly, but my question is that the outcome definition is very often suspicious. If the outcome is defined in the opposite direction (e.g. if the outcome is "lung cancer" and you would expect the case to be 1 and control to be 0, but actually the case is by mistake defined as 0 and the control as 1 instead), then the sign of the two sample MR beta would be opposite to what you would expect. Unfortunately, I can't find how the outcome is defined for a particular study.

Typically the disease will be defined as you expect, with cases coded as 1 and controls as 0. Which study are you struggling to find information for?

mightyphil2000 commented 1 year ago

.... After running mr, the unit increase/decrease text in the exposure variable is not consistent with the SIGN of the beta value for the test. Here is a sampling of output.... Isn't it correct the sign of the beta should match the direction of the unit change of exposure?

No, we do not expect the sign of the beta ( i.e. the b column) in the MR output to match the direction of the unit change of the exposure. The exposure units you see reflect the direction of the exposure effect as reported in the GWAS catalog. However, in the MR analysis, the b column will reflect a unit increase in the exposure (as long as there is no reporting error in the GWAS catalog and the effect allele columns have been specified correctly in the exposure and outcome datasets).

cemalley commented 1 year ago

@mightyphil2000 Thanks very much, your comments and demo example are very helpful. I might have been running an older version of the package also, so I am updating now (confidence intervals in results seems new). And previously, the finngen exposures were not working from the MRInstruments package - I was manually downloading from their cloud storage to use them. I'll try your code exactly.