chr1swallace / coloc

Repo for the R package coloc

coloc.abf with eQTLgen and GWAS data errors #83

Closed lalagkaspn closed 2 years ago

lalagkaspn commented 2 years ago

Hi community,

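I am trying to run coloc.abf using eQTLgen data and GWAS summary statistics. For each data set I provide the following data:

eQTLgen: snp (rsIDs), snp position, type = "quant", N, MAF, z, pvalues
GWAS: beta, varbeta, snp, snp position, type = "quant", N, MAF, pvalues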

However, I encounter the following errors:

  1. When I try to run coloc.abf I get this error: "Error in check_dataset(d = dataset1, 1) : dataset 1: duplicated snps found". Data set 1 is the eQTLgen data. It contains duplicated SNPs because the effect of each SNP on expression has been measured for several genes. How can I overcome this? Should I subset the data by gene and make a separate coloc.abf run for each gene?

  2. Regarding the GWAS summary statistics, when I run check_dataset(data_coloc_abf$GWAS, warn.minp=1e-10), I get a warning: "In check_dataset(data_coloc_abf$GWAS, warn.minp = 1e-10) : minimum p value is: 0.99406". However, my min(data_coloc_abf[["GWAS"]][["pvalues"]]) = 5.935e-08. Why is this happening?

Thank you in advance for your time!

chr1swallace commented 2 years ago

On Wed, 2022-04-27 at 10:21 -0700, Panagiotis-Nikolaos Lalagkas wrote:

Hi community, I am trying to run coloc.abf using eQTLgen data and GWAS summary statistics. For each data set I provide the following data:

eQTLgen: snp (rsIDs), snp position, type = "quant", N, MAF, z, pvalues
GWAS: beta, varbeta, snp, snp position, type = "quant", N, MAF, pvalues

However, I encounter the following errors:

  1. When I try to run coloc.abf I get this error: "Error in check_dataset(d = dataset1, 1) : dataset 1: duplicated snps found". Data set 1 is the eQTLgen data. In this data set I have duplicated SNPs because their effect on expression level have been measured on different genes. How can I overcome it? Should I subset it for each gene and make separate coloc.abf runs?

Yes. Each gene might be under different genetic regulation, so it makes sense to test each gene separately.
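A minimal sketch of the per-gene split, assuming the eQTL summary statistics sit in a long-format data frame with one row per SNP-gene pair. The column names (`snp`, `gene`, `position`, `pvalues`, `MAF`), the helper `make_gene_dataset()` and all values are made up for illustration and are not part of coloc.

```r
# Toy eQTL table: rs1 and rs2 appear twice because they were tested against two genes.
eqtl <- data.frame(
  snp      = c("rs1", "rs2", "rs3", "rs1", "rs2"),
  gene     = c("GENE_A", "GENE_A", "GENE_A", "GENE_B", "GENE_B"),
  position = c(100, 200, 300, 100, 200),
  pvalues  = c(1e-6, 0.03, 0.40, 0.20, 5e-4),
  MAF      = c(0.20, 0.35, 0.10, 0.20, 0.35),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn the rows for one gene into a coloc-style list.
# It stops if a SNP is still duplicated within the gene.
make_gene_dataset <- function(df, N) {
  stopifnot(!anyDuplicated(df$snp))
  list(
    snp      = df$snp,
    position = df$position,
    pvalues  = df$pvalues,
    MAF      = df$MAF,
    N        = N,
    type     = "quant"
  )
}

# One list per gene; SNP IDs are unique within each list.
eqtl_by_gene <- lapply(split(eqtl, eqtl$gene), make_gene_dataset, N = 1000)
names(eqtl_by_gene)
```

Each element of `eqtl_by_gene` could then be passed as one data set to coloc.abf(), together with the GWAS list, in a loop over genes.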

  2. Regarding the GWAS summary statistics, when I run check_dataset(data_coloc_abf$GWAS, warn.minp=1e-10), I get a warning: "In check_dataset(data_coloc_abf$GWAS, warn.minp = 1e-10) : minimum p value is: 0.99406". However, my min(data_coloc_abf[["GWAS"]][["pvalues"]]) = 5.935e-08. Why is this happening?

This warning flags that the p values calculated from beta and varbeta are too large. Did you use se(beta)^2 for varbeta?


lalagkaspn commented 2 years ago

Thank you for your instant response! I calculated varbeta as varbeta = se^2 * N, because I have the per-SNP sample size available. However, p values are already available in the data set, and the minimum p value is 5.935e-08. Even if I provide the p values as an element of the list for coloc.abf, does it calculate them again using beta and varbeta?

lalagkaspn commented 2 years ago

Moreover, when I use a subset of my data with unique SNPs and try to plot my data set, I get this error:

> plot_dataset(data_susie$eQTLgen, main = "eQTLgen")
Error in sqrt(d$varbeta) : non-numeric argument to mathematical function

My eQTLgen data set contains the following information:

  1. snp (rsIDs)
  2. position (of snp)
  3. type ("quant")
  4. N (single number)
  5. MAF (named with rsIDs)
  6. z (named with rsIDs)
  7. pvalues (named with rsIDs)

chr1swallace commented 2 years ago

Is varbeta not there at all?


lalagkaspn commented 2 years ago

No, it's not. Following http://chr1swallace.github.io/coloc/articles/a02_data.html, I provided the p values, MAF and sample size instead of varbeta.

chr1swallace commented 2 years ago

OK, I need to update the docs. For now, plot_dataset() requires beta and varbeta.
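The error message reported above is what base R produces when `sqrt()` receives `NULL`, which is what `d$varbeta` evaluates to when the list has no `varbeta` element. A small guard, sketched below, could catch this before plotting. The helper `has_effect_estimates()` and the toy list are hypothetical and not part of coloc.

```r
# A data set shaped like the eQTLgen list in this thread: no beta, no varbeta.
d <- list(
  snp      = c("rs1", "rs2"),
  position = c(100, 200),
  type     = "quant",
  N        = 1000,
  MAF      = c(rs1 = 0.20, rs2 = 0.35),
  z        = c(rs1 = 4.1, rs2 = -0.3),
  pvalues  = c(rs1 = 4e-5, rs2 = 0.76)
)

# d$varbeta is NULL, and sqrt(NULL) raises the error seen in the thread.
msg <- tryCatch(sqrt(d$varbeta), error = function(e) conditionMessage(e))
print(msg)

# Hypothetical guard: only plot when both effect estimates and their variances exist.
has_effect_estimates <- function(d) {
  all(c("beta", "varbeta") %in% names(d)) &&
    is.numeric(d$beta) && is.numeric(d$varbeta)
}

if (!has_effect_estimates(d)) {
  message("beta and varbeta are missing; skipping the plot")
}
```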

lalagkaspn commented 2 years ago

Thanks for the clarification!

chr1swallace commented 1 year ago

Yes, the p values are calculated again from beta and varbeta when those are supplied, because beta and varbeta contain more information than the p value alone.

varbeta = se^2. No N needed.
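To see why multiplying by N matters, here is a small worked sketch with made-up numbers. It assumes p values are derived from beta and varbeta with the usual two-sided normal approximation; that formula is an assumption for illustration, not a statement about the internals of check_dataset(), and `p_from()` is a hypothetical helper.

```r
# Made-up effect sizes and standard errors for three SNPs.
beta <- c(0.12, -0.05, 0.30)
se   <- c(0.02,  0.04, 0.05)
N    <- 10000

# Two-sided p value from a z score of beta / sqrt(varbeta).
p_from <- function(beta, varbeta) 2 * pnorm(-abs(beta / sqrt(varbeta)))

p_correct <- p_from(beta, se^2)      # varbeta = se^2
p_wrong   <- p_from(beta, se^2 * N)  # varbeta inflated by N

# Inflating varbeta by N shrinks every z score by sqrt(N),
# so strong signals turn into p values close to 1.
print(signif(p_correct, 3))
print(signif(p_wrong, 3))
```

With these toy inputs, the smallest p value moves from well below 1e-8 to above 0.9, the same pattern as the "minimum p value is: 0.99406" warning reported in this thread.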
