MRCIEU / TwoSampleMR

R package for performing 2-sample MR using MR-Base database
https://mrcieu.github.io/TwoSampleMR

[BUG]: Does clump_data work for chromosome X? #388

Open mocksu opened 2 years ago

mocksu commented 2 years ago

Please make sure that this is a bug! If you have questions about how to use TwoSampleMR please use the Discussions function instead.

Describe the bug (required)

I have about 100 SNPs for 28 genes to clump. After running clump_data, one of the 28 genes, SERPINA7, is removed entirely. Because the output contains "Removing 2 of 2 variants due to LD with other variants or absence from LD reference panel", I simplified the case by creating a file that contains only the 2 SNPs for SERPINA7:


SNP beta    se  effect_allele   other_allele    Phenotype   chr pval    NOTE    id
rs1804495   -0.112  0.0157727796563968  T   G   SERPINA7    X   1.24e-12    use beta as beta directly from the local file   Mydataset.SERPINA7
rs5916968   -0.112  0.0158231494269504  A   G   SERPINA7    X   1.46e-12    use beta as beta directly from the local file   Mydataset.SERPINA7

I then ran format_data, followed by clump_data.

The weird thing is that it always generates the following errors:

Clumping Mydataset.SERPINA7, 2 variants, using EUR population reference
Server code: 503; Server is possibly experiencing traffic, trying again...
Server code: 503; Server is possibly experiencing traffic, trying again...
Server code: 503; Server is possibly experiencing traffic, trying again...
Server code: 503; Server is possibly experiencing traffic, trying again...
Server code: 503; Server is possibly experiencing traffic, trying again...
Server code: 503; Server is possibly experiencing traffic, trying again...
Server error: 503
Failed to retrieve results from server. See error status message in the returned object and contact the developers if the problem persists.
Removing 2 of 2 variants due to LD with other variants or absence from LD reference panel

Please note the server error. I don't think it is caused by the server being down, because 1) the same thing happened a week ago; 2) it only happens with the 2 SERPINA7 SNPs, while other SNPs and genes are fine (i.e. no "Server error", and at least 1 SNP is kept for every gene); 3) I tried several times today with SERPINA7 and other genes (separately or together): all other genes are always OK, and SERPINA7 is not.
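The per-gene comparison described above can be made systematic. The following is a minimal sketch, not part of TwoSampleMR: `clump_per_id` and its `clump_fun` argument are hypothetical names, and the default `id_col = "id.exposure"` assumes the column naming produced by format_data for exposure data. It clumps each id separately and reports how many SNPs go in and come out, so an id that always loses every SNP stands out.

```r
# Sketch: clump each exposure id separately and record how many SNPs survive.
# `clump_fun` is a hypothetical argument: any function that takes a data frame
# and returns the clumped data frame (for example a wrapper around clump_data).
clump_per_id <- function(dat, clump_fun, id_col = "id.exposure") {
  ids <- unique(dat[[id_col]])
  rows <- lapply(ids, function(one_id) {
    subset_dat <- dat[dat[[id_col]] == one_id, , drop = FALSE]
    # Catch hard errors so that one failing id does not stop the loop
    clumped <- tryCatch(clump_fun(subset_dat), error = function(e) NULL)
    data.frame(
      id = one_id,
      n_before = nrow(subset_dat),
      n_after = if (is.null(clumped)) 0L else nrow(clumped),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Usage with the real function (needs TwoSampleMR and network access):
# tab <- clump_per_id(fdat, function(d) clump_data(d, clump_kb = 100,
#                                                  clump_r2 = 0.001, pop = "EUR"))
# tab[tab$n_after == 0, ]  # ids for which every SNP was removed
```

Passing the clumping step in as a function keeps the helper independent of the server, so it can also be tried with a stand-in function before running the real clumping.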

So my question is: why is SERPINA7 so special? The p-value is OK. The eaf is missing, but that should not be the problem (SNPs for other genes do not have eaf either, and filling in the eaf for the 2 SNPs didn't help).

The only explanation I can think of is that SERPINA7 is on chrX. In the data file above, I used "X" for the chr column. I also tried "23" for chrX and the result is the same: a server error, and both SNPs are removed after clumping.
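One way to test the chrX hypothesis is to separate the chrX SNPs from the autosomal ones before clumping and run each group on its own. The sketch below uses base R only; `is_chr_x` is a hypothetical helper, the accepted labels ("X", "chrX", "23") are an assumption, and the third rsID in the toy data is a made-up placeholder.

```r
# Sketch: flag rows on chromosome X, accepting the labels "X", "chrX" and "23"
# (this set of labels is an assumption, not something TwoSampleMR defines).
is_chr_x <- function(chr) {
  labels <- sub("^chr", "", as.character(chr), ignore.case = TRUE)
  toupper(labels) %in% c("X", "23")
}

# Toy input: the two SERPINA7 SNPs from the report plus one placeholder SNP
dat <- data.frame(
  SNP = c("rs1804495", "rs5916968", "rs0000001"),  # last rsID is a placeholder
  chr = c("X", "X", "1"),
  stringsAsFactors = FALSE
)

x_snps <- dat[is_chr_x(dat$chr), , drop = FALSE]
autosomal_snps <- dat[!is_chr_x(dat$chr), , drop = FALSE]
```

If the autosomal subset clumps normally while the chrX subset fails every time, that points toward the chromosome rather than a temporary server outage, although it does not by itself show why the request fails.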

Provide a clear and concise description of what the bug is: See above

Describe the current behaviour you observe (required)

See above

Include code blocks with any error messages: See above

Describe the behaviour you expect (required)

See above

R code to reproduce the issue (required)


    library(TwoSampleMR)

    # Read the space-delimited input; lines starting with '#' are comments
    dat = read.csv("serpina7.mr", sep=" ", header=TRUE, comment.char = '#')
    dat = dat[startsWith(dat$SNP, "rs"),] # keep rsIDs only
    dat = dat[!duplicated(dat[, c("SNP", "id")]),] # remove duplicated SNP/id pairs
    fdat = format_data(dat)
    dim(dat)
    dim(fdat)

    # Clump against the EUR reference panel
    cdat = clump_data(fdat, clump_kb = 100, clump_r2 = 0.001, clump_p1 = 1, clump_p2 = 1, pop = "EUR")

Please provide a minimal code snippet that will reproduce this issue: See the R code above

System information

Please provide details of your operating system and R version:
OS: Ubuntu, kernel 5.15.0-46-generic #49-Ubuntu SMP
R: R version 4.1.2 (2021-11-01)

llalluan commented 1 year ago

Hi, I am experiencing the same issue when using the clump_data function.