JonSulc / PolyMR

Polynomial Mendelian randomization for the inference of non-linear causal effects
GNU General Public License v3.0

PolyMR

Polynomial Mendelian randomization

This package implements the PolyMR method (Sulc et al. 2022, https://doi.org/10.1016/j.xhgg.2022.100124) for estimating non-linear causal effects of an exposure on an outcome. In short, this Mendelian randomization-based method uses genotypes as instrumental variables (IVs) to model the effect of an exposure on an outcome, correcting for confounding using a control function. The outcome is modeled as a sum of two polynomial functions, one for the causal effect and one for the control function.
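The control-function idea described above can be sketched in base R. This is a generic two-stage illustration on simulated data, not the PolyMR implementation: the data-generating model, the polynomial degree of 2, and all variable names are assumptions chosen for the example.

```r
# Illustrative control-function sketch in base R (NOT the PolyMR code).
set.seed(1)
n <- 5000   # number of individuals (assumed)
m <- 20     # number of instruments (assumed)

# Simulated genotypes (0/1/2 allele counts) and an unobserved confounder
genotypes  <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n, ncol = m)
confounder <- rnorm(n)

# The exposure depends on the genotypes and on the confounder
exposure <- as.vector(genotypes %*% rep(0.2, m)) + confounder + rnorm(n)

# The outcome has a quadratic causal effect of the exposure, plus confounding
outcome <- 0.2 * exposure + 0.1 * exposure^2 + 0.5 * confounder + rnorm(n)

# Stage 1: regress the exposure on the instruments.
# The residuals serve as the control function (the part of the exposure
# not explained by the genotypes, which carries the confounding).
stage1  <- lm(exposure ~ genotypes)
control <- residuals(stage1)

# Stage 2: model the outcome as a polynomial in the exposure
# plus a polynomial in the control function.
degree <- 2
stage2 <- lm(outcome ~ poly(exposure, degree, raw = TRUE) +
                       poly(control,  degree, raw = TRUE))

# Coefficients of the exposure polynomial (the causal-effect terms)
causal_coefs <- coef(stage2)[2:(degree + 1)]
print(causal_coefs)
```

The exposure coefficients from the second stage are the estimates of interest; the control-function terms only absorb confounding.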

Installation

You can install PolyMR using the devtools package:

devtools::install_github("JonSulc/PolyMR")

Example

The main function provided by PolyMR is polymr(). For a dataset with n individuals and m independent IVs, the function can be called directly by supplying n-length vectors for both exposure and outcome and an n-by-m matrix of genotypes.

library(PolyMR)

polymr(exposure, outcome, genotypes)
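To make the expected input shapes concrete, here is a minimal sketch that builds toy inputs in base R. The simulated effect sizes and allele frequency are arbitrary assumptions for illustration; only the shapes (two n-length vectors and an n-by-m matrix) come from the description above. The call to polymr() is guarded so the snippet also runs where the package is not installed.

```r
set.seed(42)
n <- 1000   # individuals (assumed)
m <- 10     # independent IVs (assumed)

# n-by-m genotype matrix of allele counts (0, 1 or 2)
genotypes <- matrix(rbinom(n * m, size = 2, prob = 0.25), nrow = n, ncol = m)

# n-length exposure and outcome vectors (toy data, arbitrary effects)
exposure <- as.vector(genotypes %*% rep(0.2, m)) + rnorm(n)
outcome  <- 0.3 * exposure + rnorm(n)

# Run PolyMR only if the package is available
if (requireNamespace("PolyMR", quietly = TRUE)) {
  result <- PolyMR::polymr(exposure, outcome, genotypes)
}
```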

By default, the function uses a fixed 10th-degree polynomial for both the exposure and the control function, with no additional filtering of IVs. The settings used in the article differ in two ways: they filter out IVs that have a stronger association with the outcome than with the exposure, and they include a model selection process in which non-significant exposure terms are iteratively eliminated until all remaining exposure coefficients are significant.

We recommend the IV filtering process, as any eliminated IVs would violate MR assumptions, but recognize this is not necessarily a standard procedure in the field. The model selection process may result in a better fit with fewer terms; however, post-selection inference bias will result in confidence intervals that are too narrow (or, equivalently, overestimated significance). Incorporating either of these procedures should be the result of careful consideration. The settings used in the article can be obtained by passing the following options in the function call:

polymr(exposure, outcome, genotypes, reverse_t_thr = 0, p_thr_drop = NULL)

A value of 0 for reverse_t_thr indicates a simple comparison between effect sizes: if the variance that the IV explains in the outcome is greater than the variance it explains in the exposure, the IV is discarded. An alternative would be to filter out only IVs that explain significantly more variance in the outcome than in the exposure, which could be achieved by passing reverse_t_thr = qnorm(0.95) (for nominal significance).
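The simple comparison described for reverse_t_thr = 0 can be illustrated in base R. This is a sketch of the idea only, not the package's internal code: it uses squared correlations as the measure of variance explained, and the simulated data (including a last IV that acts directly on the outcome) are assumptions made for the example.

```r
set.seed(2)
n <- 5000
m <- 5

genotypes <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n, ncol = m)
exposure  <- as.vector(genotypes %*% rep(0.3, m)) + rnorm(n)

# The last IV also affects the outcome directly (an invalid instrument)
outcome <- 0.2 * exposure + 1.0 * genotypes[, m] + rnorm(n)

# Variance explained by each IV in the exposure and in the outcome
r2_exposure <- as.vector(cor(genotypes, exposure))^2
r2_outcome  <- as.vector(cor(genotypes, outcome))^2

# Keep an IV only if it explains no more variance in the outcome
# than in the exposure
keep <- r2_outcome <= r2_exposure
filtered_genotypes <- genotypes[, keep, drop = FALSE]

print(data.frame(r2_exposure, r2_outcome, keep))
```

In this toy setting the last IV explains far more variance in the outcome than in the exposure, so it is the one removed.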

The NULL value passed here for p_thr_drop results in a Bonferroni-corrected threshold, i.e. 0.05 / number of exposure coefficients. A fixed numeric value can be passed instead.
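As a rough illustration of iterative elimination with a Bonferroni-corrected threshold, the sketch below runs a backward selection over powers of a single predictor in base R. It is not the PolyMR implementation: the starting degree of 4, the rule of dropping the term with the largest p-value, and recomputing the threshold over the remaining terms at each step are all assumptions made for the example, and the control function is omitted for brevity.

```r
set.seed(3)
n <- 3000
x <- rnorm(n)
y <- 0.5 * x + 0.3 * x^2 + rnorm(n)   # true model has terms of degree 1 and 2

powers <- 1:4   # candidate powers of x (hypothetical starting degree)

repeat {
  # Design matrix with one column per remaining power of x
  design <- outer(x, powers, "^")
  colnames(design) <- paste0("x", powers)

  fit      <- lm(y ~ design)
  p_values <- summary(fit)$coefficients[-1, "Pr(>|t|)"]

  # Bonferroni-corrected threshold over the remaining terms (assumption)
  threshold <- 0.05 / length(powers)

  # Stop when every remaining term is significant, or only one term is left
  if (all(p_values < threshold) || length(powers) == 1) break

  # Otherwise drop the least significant term and refit
  powers <- powers[-which.max(p_values)]
}

print(powers)
```

Because the retained terms were chosen using the same data, the p-values and confidence intervals of the final fit are subject to the post-selection bias discussed above.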

Simulated data

Data simulations can be performed using the (non-exported) new_PolyMRDataSim() function. See ?new_PolyMRDataSim.

For example, a setting with an exponential causal function (with coefficient 0.05) can be simulated with:

PolyMR:::new_PolyMRDataSim(sample_size = 10000, causal_function = function(x) 0.05*exp(x))