The MRAPSS package implements the MR-APSS approach to infer the causal relationship between an exposure and an outcome.
MR-APSS is a unified approach to Mendelian Randomization accounting for Pleiotropy and Sample Structure using genome-wide summary statistics. Specifically, MR-APSS uses a foreground-background model to decompose the observed SNP effect sizes, where the background model accounts for confounding factors hidden in GWAS summary statistics, including correlated pleiotropy and sample structure, and the foreground model performs causal inference while accounting for uncorrelated pleiotropy.
Figure: The MR-APSS approach. To infer the causal effect $\beta$ of exposure X on outcome Y, MR-APSS uses a foreground-background model to characterize the estimated effects of SNPs $G_j$ on X and Y ($\hat\gamma_j$ and $\hat\Gamma_j$) with standard errors ($\hat s_{X,j}$ and $\hat s_{Y,j}$), where the background model accounts for polygenicity, correlated pleiotropy (B) and sample structure (C), and the foreground model (A) aims to identify informative instruments and account for uncorrelated pleiotropy to perform causal inference. (D) We consider inferring the causal relationship between BMI and T2D as an illustrative example of MR-APSS. The estimated causal effect is indicated by a red line, with its 95% confidence interval shown as the transparent red shaded area. Triangles indicate the observed SNP effect sizes ($\hat\gamma_j$ and $\hat\Gamma_j$). The color of each triangle indicates the posterior probability of a valid IV, i.e., the posterior probability that an IV carries the foreground signal ($Z_j=1$, dark blue) or not ($Z_j=0$, light blue).
#install.packages("devtools")
devtools::install_github("YangLabHKUST/MR-APSS")
We illustrate how to analyze GWAS summary-level data with the MR-APSS software using a real example: BMI (UKB) as the exposure and T2D as the outcome. The MR-APSS analysis comprises the following steps:
Step 1: Prepare data and estimate nuisance parameters
Step 2: Fit MR-APSS for causal inference
The tutorial "A real example for performing GWAS summary-level data based MR analysis with MRAPSS package" provides details for each step.
For a quick look at MR-APSS, you can skip Step 1 and jump directly to Step 2 to fit MR-APSS using the outputs we have prepared.
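library(MRAPSS)

# Trait labels used in the results and the plot
exposure = "BMI"
outcome = "T2D"
Threshold = 5e-05 # The default p-value threshold for IV selection

# Load the prepared Step 1 outputs: background parameters (C, Omega)
# and the MR data set (MRdat)
data(C)
data(Omega)
data(MRdat)

# Step 2: fit MR-APSS
MRres = MRAPSS(MRdat,
               exposure = exposure,
               outcome = outcome,
               C = C,
               Omega = Omega,
               Cor.SelectionBias = TRUE)

# Plot the fitted result
MRplot(MRres, exposure = exposure, outcome = outcome)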
The "BMI~T2D" example with 1227 IVs takes about 1 minute when tested on macOS 10.14.6 with a 1.4 GHz Intel Core i5, 16 GB of 2133 MHz LPDDR3 memory, and R version 3.6.1.
Note: In cases where GWAS estimates for the exposure or outcome are severely inflated due to confounding effects, the estimate of C1 or C2 (the LDSC intercepts of the exposure or outcome GWAS) may be very large. MR-APSS, which accounts for the confounding effects that inflate the GWAS estimates, may generate causal effect estimates with larger estimation errors compared to other methods like IVW and RAPS in such cases.
We applied MR-APSS and nine existing summary-level MR methods to (1) test the causal effects of 26 traits on five negative control outcomes (Tanning; Hair color: black; Hair color: blonde; Hair color: dark brown; Hair color: light brown), giving 130 trait pairs; and (2) infer causal relationships among the 26 complex traits (650 trait pairs). In total, 780 trait pairs were analyzed. We provide source code for replicating the real data analysis results in the MR-APSS paper.
Data download:
Table of GWAS sources; data (size: 10 GB).
Format data:
code; the formatted data (size: 711 MB).
Estimate background parameters and plink clumping for the 780 trait pairs:
code for the 130 pairs; code for the 650 pairs; the estimated background parameters (size: 385 KB); data for LD clumped sets of IVs (size: 15.1 MB).
Real data analysis with negative control outcomes:
code for MR-APSS; results of MR-APSS;
code for eight compared methods; results of compared methods;
code for CAUSE; results of CAUSE;
Visualization of results.
Inferring causal relationships among complex traits:
code for MR-APSS; results of MR-APSS;
code for eight compared methods; results of compared methods;
code for CAUSE; results of CAUSE.
Visualization of results
Q: What are the quality control criteria for GWAS summary statistics in MR-APSS?
A: MR-APSS uses the following quality control criteria to ensure the quality of data:
(1). extract SNPs in HapMap 3 list,
(2). remove SNPs with minor allele frequency < 0.05 (if freq_col column is available),
(3). remove SNPs with alleles not in (G, C, T, A),
(4). remove SNPs with ambiguous alleles (G/C or A/T) or other false alleles (A/A, T/T, G/G or C/C),
(5). remove SNPs with INFO < 0.9 (if info_col column is available),
(6). exclude SNPs in the complex Major Histocompatibility Region (Chromosome 6, 26Mb-34Mb),
(7). remove SNPs with $\chi^2 > \chi^2_{max}$. The default value for $\chi^2_{max}$ is max(N/1000, 80).
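The seven criteria above can be sketched as a single filter in base R. This is only an illustration, not the package's own formatting routine: the function name `qc_sumstats`, the column names (`SNP`, `chr`, `pos`, `A1`, `A2`, `freq`, `info`, `Z`, `N`), the `hm3_snps` vector, and the use of the largest per-SNP `N` in the $\chi^2_{max}$ default are all assumptions made for the example.

```r
# Hypothetical helper (not part of MRAPSS). `dat` needs the columns
# SNP, chr, pos, A1, A2, Z, N; `freq` and `info` are optional, mirroring
# criteria (2) and (5). `hm3_snps` is a character vector of HapMap 3 SNP IDs.
qc_sumstats <- function(dat, hm3_snps, chi2_max = NULL) {
  a1 <- toupper(dat$A1)
  a2 <- toupper(dat$A2)

  keep <- dat$SNP %in% hm3_snps                           # (1) HapMap 3 SNPs
  if ("freq" %in% names(dat)) {
    maf <- pmin(dat$freq, 1 - dat$freq)
    keep <- keep & maf >= 0.05                            # (2) MAF filter
  }
  valid <- c("A", "C", "G", "T")
  keep <- keep & a1 %in% valid & a2 %in% valid            # (3) valid alleles
  pair <- paste0(a1, a2)
  ambiguous <- pair %in% c("AT", "TA", "CG", "GC")
  keep <- keep & !ambiguous & a1 != a2                    # (4) ambiguous or identical
  if ("info" %in% names(dat)) {
    keep <- keep & dat$info >= 0.9                        # (5) INFO filter
  }
  in_mhc <- dat$chr == 6 & dat$pos >= 26e6 & dat$pos <= 34e6
  keep <- keep & !in_mhc                                  # (6) MHC region
  if (is.null(chi2_max)) {
    chi2_max <- max(max(dat$N) / 1000, 80)                # assumed reading of max(N/1000, 80)
  }
  keep <- keep & dat$Z^2 <= chi2_max                      # (7) chi-square cap

  # which() also drops rows where a checked column is missing (NA)
  dat[which(keep), , drop = FALSE]
}

# Toy data: only rs1 passes every criterion.
toy <- data.frame(
  SNP  = c("rs1", "rs2", "rs3", "rs4", "rs5", "rs6"),
  chr  = c(1, 1, 6, 2, 3, 4),
  pos  = c(1e6, 2e6, 30e6, 5e6, 7e6, 9e6),
  A1   = c("A", "A", "A", "A", "A", "A"),
  A2   = c("G", "T", "G", "G", "G", "G"),   # rs2 is an ambiguous A/T SNP
  freq = c(0.3, 0.3, 0.3, 0.01, 0.3, 0.3),  # rs4 has MAF < 0.05
  Z    = c(2, 2, 2, 2, 10, 2),              # rs5 has chi-square 100 > 80
  N    = 50000,
  stringsAsFactors = FALSE
)
hm3 <- c("rs1", "rs2", "rs3", "rs4", "rs5") # rs6 is not in the list
clean <- qc_sumstats(toy, hm3)
print(clean$SNP)
```

In the toy data, rs3 falls in the MHC region and the remaining failures are annotated inline, so each criterion that can be exercised here removes exactly one SNP.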
Q: Is allele frequency required for MR-APSS to format GWAS summary statistics? What if I have columns of allele frequency in the case group ("freq_cases") and allele frequency in the control group ("freq_controls")?
A: Allele frequency ("freq_col") and imputation information ("info_col") are not required columns in GWAS summary statistics for MR-APSS to obtain the columns in the formatted datasets. To ensure the quality of SNPs, MR-APSS tries to exclude SNPs with MAF < 0.05. If allele frequencies are available in the summary statistics, MR-APSS will remove SNPs with sample MAF < 0.05. Because sample MAF or imputation information may be missing from the GWAS summary statistics, MR-APSS, like LDSC, restricts the analysis to a set of common and well-imputed SNPs in the HapMap 3 reference panel. If both "freq_cases" and "freq_controls" are available, one can estimate MAF using the sample MAF of the control group. In MR-APSS, one can specify freq_col = "freq_controls", or ignore "freq_col" when formatting data.
Q: What sample size do I need for MR-APSS?
A: In MR-APSS, the background model relies on the assumptions of LDSC (LD Score regression), and the parameters in the background model are estimated using LDSC. Following the practice of LDSC, we recommend MR-APSS for analyses of GWASs with more than ~5k samples to avoid inaccurate estimation results.
Q: How does MR-APSS perform LD clumping in real data analysis?
A: In real data analysis, PLINK LD clumping is used to obtain a subset of nearly independent SNPs as IVs. The default p-value threshold for IV selection in MR-APSS is 5e-05, and the squared correlation threshold for clumping ($r^2$) is 0.001.
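As a rough sketch of what such a clumping call looks like, the R snippet below assembles the arguments for a PLINK 1.9 `--clump` run with the two thresholds stated above. It does not reproduce how the MRAPSS package invokes PLINK internally: the helper name `build_clump_args`, all file names, and the clumping window (`kb = 1000`) are assumptions for illustration. PLINK's `--clump` expects the summary statistics file to contain `SNP` and `P` columns by default.

```r
# Hypothetical helper (not part of MRAPSS): build, but do not run, the
# arguments for a PLINK 1.9 clumping call.
# `kb` (the clumping window) is an assumed value, not taken from the text.
build_clump_args <- function(bfile, sumstats_file, out_prefix,
                             p_threshold = 5e-05, r2 = 0.001, kb = 1000) {
  c("--bfile", bfile,                  # reference panel in PLINK binary format
    "--clump", sumstats_file,          # file with SNP and P columns
    "--clump-p1", format(p_threshold, scientific = TRUE),  # index SNP p-value threshold
    "--clump-r2", format(r2),          # LD (r^2) threshold
    "--clump-kb", format(kb),          # physical distance window in kb
    "--out", out_prefix)
}

# Placeholder file names, for illustration only
args <- build_clump_args("ref_panel", "BMI_sumstats.txt", "BMI_clumped")
cat("plink", args, "\n")
```

To execute the command, the same vector could be passed to `system2("plink", args)`, assuming a `plink` binary is available on the PATH.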
Q: What exactly does the number "The NO.of valid IVs with foreground signal" reported by MR-APSS mean?
A: The number “NO.of valid IVs with foreground signal” is closely related to the foreground-background model proposed by MR-APSS. Under the foreground-background model, only a proportion of SNPs with foreground signal (the proportion is denoted by $\pi_t$ ) will be used for causal inference. We thus calculated $\hat\pi_t$ * Total NO. of IVs as “NO.of valid IVs with foreground signal”. This number is also known as the effective number of IVs or the estimated number of valid IVs.
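The arithmetic behind this number is a single multiplication, shown below in R. The total of 1227 IVs comes from the BMI~T2D example above; the value of `pi_hat` is a made-up placeholder, not an estimate produced by the package.

```r
n_iv   <- 1227   # total number of IVs in the BMI~T2D example
pi_hat <- 0.25   # hypothetical estimate of pi_t, for illustration only

# Estimated (effective) number of valid IVs with foreground signal
n_valid <- pi_hat * n_iv
print(n_valid)
```

Because the result is a product of an estimated proportion and a count, it is generally not an integer.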
Hu Xianghong, Zhao Jia, Lin Zhixiang, Wang Yang, Peng Heng, Zhao Hongyu, Wan Xiang, and Yang Can. Mendelian randomization for causal inference accounting for pleiotropy and sample structure using genome-wide summary statistics. Proceedings of the National Academy of Sciences, 119(28), July 5, 2022. doi: https://www.pnas.org/doi/10.1073/pnas.2106858119.
Please feel free to contact Xianghong Hu (maxhu@ust.hk) or Prof. Can Yang (macyang@ust.hk) if you have any questions.