Closed lalagkaspn closed 2 years ago
On Wed, 2022-04-27 at 10:21 -0700, Panagiotis-Nikolaos Lalagkas wrote:
Hi community,

I am trying to run coloc.abf using eQTLgen data and GWAS summary statistics. For each data set I provide the following data:

eQTLgen: snp (rsIDs), snp position, type = "quant", N, MAF, z, pvalues
GWAS: beta, varbeta, snp, snp position, type = "quant", N, MAF, pvalues

However, I encounter the following errors:

1. When I try to run coloc.abf I get this error: "Error in check_dataset(d = dataset1, 1) : dataset 1: duplicated snps found". Data set 1 is the eQTLgen data. It contains duplicated SNPs because each SNP's effect on expression level has been measured for several genes. How can I overcome this? Should I subset the data by gene and run coloc.abf separately for each gene?
Yes, each gene might be under different genetic regulation, so it makes sense to test each gene separately.
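For anyone following along, a minimal base-R sketch of the per-gene split might look like the following. The column names (`gene`, `snp`, `position`, `pvalues`, `MAF`), the sample size `n_eqtl`, and all values are made up for illustration and are not taken from eQTLgen. The coloc call itself is left as a comment because it needs the package and a GWAS dataset list (here called `gwas`, also hypothetical).

```r
# Toy eQTL table. Column names and values are made up for illustration.
eqtl <- data.frame(
  gene     = c("GENE_A", "GENE_A", "GENE_B", "GENE_B"),
  snp      = c("rs1", "rs2", "rs1", "rs3"),
  position = c(100, 200, 100, 300),
  pvalues  = c(0.20, 0.65, 0.002, 0.48),
  MAF      = c(0.10, 0.25, 0.10, 0.40),
  stringsAsFactors = FALSE
)
n_eqtl <- 30000  # hypothetical sample size

# rs1 is listed once per gene, so SNP IDs repeat across the full table.
has_dups_overall <- anyDuplicated(eqtl$snp) > 0

# Turn one gene's rows into a coloc-style dataset list.
make_dataset <- function(d) {
  list(snp = d$snp, position = d$position, pvalues = d$pvalues,
       MAF = d$MAF, N = n_eqtl, type = "quant")
}

# One dataset per gene; within a gene each SNP appears only once.
datasets <- lapply(split(eqtl, eqtl$gene), make_dataset)
has_dups_per_gene <- vapply(datasets, function(d) anyDuplicated(d$snp) > 0,
                            logical(1))

# With coloc installed and a GWAS dataset list `gwas`, each gene could then
# be tested in turn:
# results <- lapply(datasets, function(d)
#   coloc::coloc.abf(dataset1 = d, dataset2 = gwas))
```

Splitting first means the duplicate check is applied to one gene's SNPs at a time, which is the situation the error message is guarding against.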
2. Regarding the GWAS summary statistics, when I run check_dataset(data_coloc_abf$GWAS, warn.minp=1e-10), I get a warning: "In check_dataset(data_coloc_abf$GWAS, warn.minp = 1e-10) : minimum p value is: 0.99406". However, my min(data_coloc_abf[["GWAS"]][["pvalues"]]) = 5.935e-08. Why is this happening?
This warning flags that the p values calculated from beta and varbeta are too large. Did you use se(beta)^2 for varbeta?
Thank you in advance for your time!
Thank you for your instant response! I calculated varbeta as varbeta = se^2 * N, because I have the per-SNP sample size available. However, p values are already available in the dataset, and the minimum p value is 5.935e-08. Even if I provide the p values as an element of the list for coloc.abf, does it calculate them again using beta and varbeta?
Moreover, when I use a subset of my data with unique SNPs, and try to plot my dataset, I get this error:
> plot_dataset(data_susie$eQTLgen, main = "eQTLgen")
Error in sqrt(d$varbeta) : non-numeric argument to mathematical function
my eQTLgen contains the following information:
Is varbeta not there at all?
No, it's not. Following the documentation at http://chr1swallace.github.io/coloc/articles/a02_data.html, I provided the p values, MAF and sample size instead of varbeta.
OK, I need to update the docs. plot_dataset() requires beta and varbeta for now.
Thanks for the clarification!
Yes, the p values are calculated again, because beta and varbeta contain more information than the p value alone.
varbeta = se^2. No N needed.
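A small base-R sketch shows why multiplying by N produces the warning. It assumes the p value implied by beta and varbeta is the usual two-sided Wald p value, 2 * pnorm(-|beta / sqrt(varbeta)|). The numbers are invented for illustration and are not the GWAS values from this thread.

```r
# Invented example values for a single SNP.
beta <- 0.054
se   <- 0.01
N    <- 50000

# Two-sided Wald p value implied by an effect estimate and its variance.
wald_p <- function(beta, varbeta) {
  2 * pnorm(-abs(beta / sqrt(varbeta)))
}

# Correct: varbeta is the squared standard error.
p_correct <- wald_p(beta, se^2)

# Incorrect: multiplying by N inflates the variance, shrinks |z| towards 0
# and pushes the implied p value towards 1.
p_inflated <- wald_p(beta, se^2 * N)

print(c(correct = p_correct, inflated = p_inflated))
```

With the inflated variance, even the strongest signal in a region has an implied p value close to 1, which matches the kind of "minimum p value" warning reported above.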