MRCIEU / TwoSampleMR

R package for performing 2-sample MR using MR-Base database
https://mrcieu.github.io/TwoSampleMR
Other
418 stars 176 forks source link

[BUG]: ld_matrix run locally does not align alleles to 1000G phase 3 reference data #420

Open JonnyBaseball opened 1 year ago

JonnyBaseball commented 1 year ago

Please make sure that this is a bug! If you have questions about how to use TwoSampleMR please use the Discussions function instead.

Describe the bug (required)

Alleles in ld matrix run locally do not all align with 1000G phase 3 reference data (I am using EUR population); alleles in ld matrix run through your server do align with reference data; e.g. out of 500, 7 discrepancies Provide a clear and concise description of what the bug is

Describe the current behaviour you observe (required)

No error message; I aligned alleles with 1000G phase 3 reference data (EUR pop), flipped betas (where alleles realigned); I ran both local ld matrix function and ld matrix through your server with realigned data, with "alleles=TRUE". I checked the order of the alleles from the realigned data against the ld matrices. While the order of alleles in the ld matrix run through your server were the same, there were 7 (of 499) SNPs from the local ld matrix misaligned (did not match order in 1000G phase 3 reference)

Include code blocks with any error messages No error messages (other then recommend using local functionality). dat is aligned data with flipped betas.

 cor_matrix <- ld_matrix(snps = dat$SNP, with_alleles = TRUE, pop = "EUR")
 ##  compare to local
  cor.matrix.local <- ieugwasr::ld_matrix(
    variants = dat$SNP,
    plink_bin = genetics.binaRies::get_plink_binary(),
    bfile = "/Users/XXXXX/Desktop/LDREF/EUR", with_alleles = TRUE)

Describe the behaviour you expect (required)

I expected the local function to generate ld matrix with alleles aligned to 1000G phase 3 reference, as the function run through the mrcieu server seems to do.

R code to reproduce the issue (required)

Please provide a minimal code snippet that will reproduce this issue

cor_matrix <- ld_matrix(snps = dat$SNP, with_alleles = TRUE, pop = "EUR")
 ## local
  cor.matrix.local <- ieugwasr::ld_matrix(
    variants = dat$SNP,
    plink_bin = genetics.binaRies::get_plink_binary(),
    bfile = "/Users/XXXXX/Desktop/LDREF/EUR", with_alleles = TRUE)

I can send "dat" to try this.

Contribute a solution (optional)

Please submit a pull request and/or briefly describe your proposed solution

System information

Please provide details of your operating system and R version macOS Catalina v 10.15.7 (intel), R v 4.2.0

Additional context

I was using ld matrix to carry forward to SuSIE colocalization and possibly correlated MR.

Add any other context about the problem here

sskimb commented 1 year ago

I had the same issue when I try to convert TwoSampleMR data file to a MendelianRandomization object using dat_to_MRInput, which internally calls ld_matrix (which in turn calls ieugwasr::ld_matrix) and checks the result by calling harmonise_ld_dat. I have inserted a few lines to reorder columns and rows of the ld matrix to match the SNP order of exposure-outcome data.frame. I have named it as dat2MRInput to distinguish it from the distributed dat_to_MRInput. You also need to provide the specific "bfile" and "plink_bin" to run locally (this feature is not available in the distributed code). Here is the code:


dat2MRInput <- function(dat, get_correlations=FALSE, pop="EUR", bfile=NULL, plink_bin=NULL)
{
        out <- plyr::dlply(dat, c("exposure", "outcome"), function(x)
        {
                x <- plyr::mutate(x)
                message("Converting:")
                message(" - exposure: ", x$exposure[1])
                message(" - outcome: ", x$outcome[1])
if(get_correlations)
                {
                        message(" - obtaining LD matrix")
                        ld <- ieugwasr::ld_matrix(variants=unique(x$SNP), with_alleles=TRUE, pop=pop, bfile=bfile,
plink_bin=plink_bin)
                        # plink reorders the SNPs
                        snpnames <- do.call(rbind, strsplit(rownames(ld), split="_"))[,1]
MD <- match( x$SNP, snpnames )
                        ld <- ld[ MD, MD ]
                        out <- harmonise_ld_dat(x, ld)
                        if(is.null(out))
                        {
                                return(NULL)
                        }
                        x <- out$x
                        ld <- out$ld

                        MendelianRandomization::mr_input(
                                bx = x$beta.exposure,
                                bxse = x$se.exposure,
... omitted

Sincerely,

Sangsoo Kim
laleoarrow commented 5 months ago

Im calculating the ld matrix to carry forward to SuSIE and I have been stuck here for a while. Is there a final answer yet?