MRCIEU / TwoSampleMR

R package for performing 2-sample MR using MR-Base database
https://mrcieu.github.io/TwoSampleMR

Multivariable MR - P value threshold #78

Open MaVdb opened 5 years ago

MaVdb commented 5 years ago

I want to fit one exposure at a time against the residuals of the outcome that has been adjusted for the other exposures.

When the p-value threshold is changed in mv_residual to include more SNPs, the results do not change.

My exposure dataset lists each SNP twice: once with its effect on exposure X and once with its effect on exposure Y (the PHENOTYPE column indicates which exposure the effect refers to).

Exposure <- read_exposure_data(
  filename = 'MultivariableMR.txt',
  sep = '\t',
  snp_col = 'SNP',
  beta_col = 'Effect',
  se_col = 'SE',
  effect_allele_col = 'effect_allele',
  other_allele_col = 'other_allele',
  phenotype_col = 'PHENOTYPE',
  units_col = 'units',
  pval_col = 'P'
)

mvdat <- mv_harmonise_data(Exposure, Outcome)

res1 <- mv_residual(mvdat)
Warning message:
In mv_residual(mvdat) :
  Up to 0.4.9 there was a problem with the p-value calculation, this has now been fixed

res1
$result
  exposure nsnp           b         se       pval
1        X    1  0.08711269 0.19380794 0.65308582
2        Y  340 -0.17168225 0.08800509 0.05107824

res2 <- mv_residual(mvdat, pval_threshold = 0.05)
Warning message:
In mv_residual(mvdat, pval_threshold = 0.05) :
  Up to 0.4.9 there was a problem with the p-value calculation, this has now been fixed

res2
$result
  exposure nsnp           b         se       pval
1        X   47  0.08711269 0.19380794 0.65308582
2        Y  340 -0.17168225 0.08800509 0.05107824

MaVdb commented 5 years ago

When I use the mv_ivw function, the p-values change when I change pval_threshold. With mv_multiple or mv_residual, this is not the case.

What is the difference between mv_ivw & mv_multiple?

ValeriiaH commented 5 years ago

For mv_residual and mv_multiple, pval_threshold matters only if instrument_specific = TRUE (it is FALSE by default). For mv_ivw, the instruments are always filtered based on pval_threshold.

Filtering on pval_threshold ensures that only instruments strongly associated with the exposure being fitted enter the model. The other option is more lenient and allows more instruments to be used, although each instrument should still be strongly associated with at least one exposure.
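To make the two options concrete, here is a minimal base-R sketch on simulated summary statistics. It is not the package's code: `fit_mv` and all of the data are hypothetical, and the assumption is that the model is a weighted regression of SNP-outcome effects on SNP-exposure effects, with instrument-specific filtering simply subsetting the SNPs before each exposure's fit.

```r
# Simulated summary statistics; every value here is made up for illustration.
set.seed(1)
n_snp <- 100
beta_exposure <- cbind(X = rnorm(n_snp, sd = 0.1), Y = rnorm(n_snp, sd = 0.1))

# SNPs 1-40 pass a genome-wide threshold for X, SNPs 31-100 pass it for Y.
pval_exposure <- cbind(
  X = ifelse(seq_len(n_snp) <= 40, 1e-10, 0.5),
  Y = ifelse(seq_len(n_snp) > 30, 1e-10, 0.5)
)

se_outcome <- rep(0.05, n_snp)
beta_outcome <- 0.3 * beta_exposure[, "X"] - 0.2 * beta_exposure[, "Y"] +
  rnorm(n_snp, sd = se_outcome)
w <- 1 / se_outcome^2  # inverse-variance weights

# Hypothetical helper: one estimate per exposure, adjusted for the other.
fit_mv <- function(instrument_specific, pval_threshold) {
  sapply(colnames(beta_exposure), function(exposure) {
    keep <- if (instrument_specific) {
      # Only SNPs passing the threshold for THIS exposure enter the fit.
      pval_exposure[, exposure] < pval_threshold
    } else {
      # The threshold is ignored: every SNP enters the fit.
      rep(TRUE, n_snp)
    }
    fit <- lm(beta_outcome[keep] ~ 0 + beta_exposure[keep, , drop = FALSE],
              weights = w[keep])
    unname(coef(fit)[match(exposure, colnames(beta_exposure))])
  })
}

est_default_strict  <- fit_mv(instrument_specific = FALSE, pval_threshold = 5e-8)
est_default_lenient <- fit_mv(instrument_specific = FALSE, pval_threshold = 0.05)
est_specific        <- fit_mv(instrument_specific = TRUE,  pval_threshold = 5e-8)
est_specific_all    <- fit_mv(instrument_specific = TRUE,  pval_threshold = 1)
```

In this sketch `est_default_strict` and `est_default_lenient` are identical because the threshold is never consulted, while `est_specific` differs because each exposure is fitted on its own subset of SNPs. Setting the threshold to 1 with filtering switched on keeps every SNP, so `est_specific_all` matches the unfiltered estimates.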

MaVdb commented 5 years ago

Thanks for your answer!

Thus when using instrument_specific = TRUE, there is no difference anymore between mv_multiple & mv_ivw, if I understand it correctly?

2 additional questions:

1) Regarding the interpretation of the output:

Are the effect estimates for exposure X those from the multivariable MR controlling for Y, and likewise the estimates for exposure Y controlled for exposure X?

mv_multiple(mvdat, instrument_specific = TRUE, pval_threshold = 1)
$result
  exposure nsnp           b        se      pval
1        X   57  0.60546382 0.4695827 0.1972709
2        Y   57 -0.09483286 0.2022915 0.6392175

2) Before performing the multivariable MR analysis, I clumped the exposure SNPs. Because the association p-values for the same SNP differ between exposure X and exposure Y, the SNPs retained after clumping are not exactly the same for the two exposures. Is it recommended to clump to the same SNP set for both exposures?

Thanks!

MaVdb commented 3 years ago

Dear,

I would like to follow up on my questions regarding mv_multiple vs mv_ivw.

Could the number of SNPs shown in the output of mv_multiple with default settings (instrument_specific = FALSE) be wrong? I would expect it to be the total number of SNPs that overlap between exposures. Instead, it is the number of SNPs reaching the p-value threshold of 5e-8 for each exposure.

Thank you! Best, Marijne

mightyphil2000 commented 3 years ago

If you're using 5e-8 to define your instruments, then the number of SNPs should be the total number of SNPs with p<5e-8 for any exposure. This means the SNP does not need to have P<5e-8 for each exposure. So from your description it sounds like you are getting the correct number.
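As a small illustration of the distinction between "any exposure" and "each exposure", the counts can be computed from a matrix of p-values. The p-values and object names below are hypothetical and only serve to show the three different counts.

```r
# Hypothetical p-values for 6 SNPs (rows) and two exposures (columns).
pval_exposure <- cbind(
  A = c(1e-10, 3e-9, 2e-12, 0.20, 0.03, 1e-9),
  B = c(0.40,  1e-8, 0.10,  4e-9, 0.50, 0.70)
)
threshold <- 5e-8
passes <- pval_exposure < threshold

# SNPs with p < threshold for at least one exposure.
n_any <- sum(rowSums(passes) > 0)

# SNPs with p < threshold for every exposure.
n_all <- sum(rowSums(passes) == ncol(passes))

# SNPs with p < threshold, counted separately per exposure.
n_per_exposure <- colSums(passes)
```

Here `n_any` is 5, `n_all` is 1, and `n_per_exposure` is 4 for A and 2 for B, so the three ways of counting can give quite different numbers for the same data.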

MaVdb commented 3 years ago

Thank you for the quick reply! I have some example outputs to illustrate my last question:

mv_multiple(mvdat)

$result
  id.exposure exposure id.outcome outcome nsnp         b         se        pval
1      JXsbZV        A     rvZgZ1       C  587 0.1472363 0.07380856 0.046060366
2      x88S1t        B     rvZgZ1       C   10 0.3109541 0.10489912 0.003033566

The same results are obtained with mv_ivw(mvdat, pval_threshold = 1); however, here the output shows the total number of SNPs:

$result
  id.exposure exposure id.outcome outcome nsnp         b         se        pval
1      JXsbZV        A     rvZgZ1       C  590 0.1472363 0.07380856 0.046060366
2      x88S1t        B     rvZgZ1       C  590 0.3109541 0.10489912 0.003033566

Therefore, I think that the number of SNPs in the mv_multiple(mvdat) output should also be the total number of SNPs, since the default is instrument_specific = FALSE.

mvab commented 3 weeks ago

Hi, this thread has been useful for unpicking the issue I have with my analysis, where one of my exposures in MVMR has instruments with pval > 5e-08.

Here is a summary of what I think happens:

1) When both exposure traits only have instruments with pval < 5e-08, then instrument_specific=F and instrument_specific=T, pval_threshold = 1 produce the same results, and the SNP counts in the nsnp column of the results table are the same. In both options all provided SNPs are used in the analysis (all are < 5e-08).

2) When one (or both) exposures have instruments with pval > 5e-08, then instrument_specific=F and instrument_specific=T, pval_threshold = 1 would also produce the same results, as all SNPs will still be used. However, with instrument_specific=F the values in the nsnp column will be smaller and equal to the number of SNPs with pval < 5e-08 in each trait. But still, all SNPs are used in the analysis (including those with pval > 5e-08). So the nsnp column values are misleading in this analysis and could be misinterpreted.

3) So, when using instrument_specific=T, pval_threshold = 5e-08 to subset the SNPs to a certain threshold (in both exposures), the nsnp column values would be the same as in (2), but actually true in this case, as only those SNPs will be used in the analysis.