chr1swallace / coloc

Repo for the R package coloc
144 stars, 44 forks

Preparing data for 'CC' and 'quant' traits, tried but getting error #81

Closed zillurbmb51 closed 2 years ago

zillurbmb51 commented 2 years ago

Hi,

I want to run a colocalization test on my traits, and this tool looks perfect for that. I have tried preparing my data in the following ways but always get an error. 'SCZ' is the 'CC' trait and 'BASOPHIL' is the 'quant' trait. Here is my code (a bit of a mess, sorry!) and the example data. Any suggestions?

scz1=fread('/home/ubuntu/immunesystem_md/sumstats/PGC_SCZ_0518_EUR.sumstats')
scz1=scz1[order(CHR,BP),]
scz1$type='cc'
check_dataset(scz1)
scz2=scz1
head(scz1)
baso1=fread('/home/ubuntu/immunesystem_md/from_nadine/sumstats/BCX2_BASOPHIL_UKB_zscore.csv')
head(baso1)
scz1$type='cc'
check_dataset(scz1)
names(scz1)
scz2=scz1
names(scz2)
names(scz2)=c('SNP','CHR','BP','pvalues','A1','A2','N','NCASE','NCONTROL','Z','OR','SE','INFO','VARIANT_ID','type')
check_dataset(scz2)
head(scz2)
str(scz2)
names(scz2)
names(scz2)[10]='beta'
names(scz2)[12]='varbeta'
check_dataset(scz2)
plot_dataset(scz2)
scz3=D1[c('beta','varbeta','SNP','CHR','BP','type')]
str(scz2)
scz3=as.list(scz2)
plot_dataset(scz3)
?plot_dataset
check_dataset(scz3,warn.minp = 1e-10)
plot_dataset(scz3)
scz3=data(scz2)
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso1,p12 = 1e-6)
str(scz3)
scz3=as.list(scz2)
str(scz3)
baso2=as.list(baso1)
str(baso2)
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso2,p12 = 1e-6)
baso2=baso1[complete.cases(baso1),]
baso2$type='quant'
baso3=as.list(baso2)
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso3,p12 = 1e-6)
head(baso3)
head(scz3)
str(baso3)
head(baso1)
head(baso2)
str(baso1)
names(baso1)
baso2=baso1[,c(1:6,8:10)]
head
head(baso2)
baso3=baso2[complete.cases(baso2),]
head(baso3)
dim(baso3)
baso3=as.list(baso3)
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso3,p12 = 1e-6)
names(scz3)
names(baso3)
names(baso3)[4]=pvalues
names(baso3)[4]='pvalues'
names(baso3)[7]='beta'
names(baso3)[8]='varbeta'
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso3,p12 = 1e-6)
baso3$type='quant'
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso3,p12 = 1e-6)
baso3$sdY=1
my_res=coloc.abf(dataset1 = scz3,dataset2 = baso3,p12 = 1e-6)
head(baso2)
baso2=baso2[complete.cases(baso2),]
head(baso2)
head(baso2,100)
write.table(head(baso2,200),file = '/home/ubuntu/immunesystem_md/from_nadine/baso_for_coloc.csv',row.names = F,sep = '\t',quote = F)
write.table(head(scz1,200),file = '/home/ubuntu/immunesystem_md/from_nadine/scz_for_coloc.csv',row.names = F,sep = '\t',quote = F)

The last error:

Error in process.dataset(d = dataset1, suffix = "df1") : 
  dataset df1: please give s, proportion of samples who are cases, if using p values
In addition: Warning messages:
1: In if (!(d$type %in% c("quant", "cc"))) stop("dataset ", suffix,  :
  the condition has length > 1 and only the first element will be used
2: In if (!(d$type %in% c("quant", "cc"))) stop("dataset ", suffix,  :
  the condition has length > 1 and only the first element will be used
3: In if (d$type == "cc" & "pvalues" %in% nd) { :
  the condition has length > 1 and only the first element will be used

baso_for_coloc.csv scz_for_coloc.csv

chr1swallace commented 2 years ago

The error says your dataset doesn't have the element "s". Does reading this help? http://chr1swallace.github.io/coloc/articles/a02_data.html
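As a rough illustration of the two things the messages above point at, here is a minimal sketch with toy values. The column names SNP, BP, NCASE and NCONTROL are taken from the code in this thread; the numbers, the object names `scz`, `bad` and `scz_meta`, and the choice of one overall value for `s` are assumptions for illustration only. It shows why calling `as.list()` on a table that has a `type` column gives a `type` of length greater than 1 (the source of the warnings), and one way to derive a single `s` from the case and control counts.

```r
# Toy case-control table; values are made up.
scz <- data.frame(
  SNP      = c("rs1", "rs2", "rs3"),
  BP       = c(1000L, 2000L, 3000L),
  NCASE    = c(400, 400, 400),
  NCONTROL = c(600, 600, 600)
)

# as.list() on a table keeps one value per SNP for every column,
# so a 'type' column ends up with length nrow(scz), not length 1.
bad <- as.list(transform(scz, type = "cc"))
length(bad$type)

# s: proportion of samples that are cases.
# Assumption: a single overall value is appropriate for this study.
s <- sum(scz$NCASE) / sum(scz$NCASE + scz$NCONTROL)

# Build the list element by element so that type and s are single values.
# The statistics themselves (beta/varbeta, or pvalues with MAF and N)
# still have to be added before the list is a complete dataset.
scz_meta <- list(
  snp      = scz$SNP,
  position = scz$BP,
  type     = "cc",
  s        = s
)
str(scz_meta)
```

Building the list by hand, rather than converting the whole table, also avoids carrying unused columns such as A1 or INFO into the dataset.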

https://chr1swallace.github.io


(Sent by email, in reply to zillurbmb51's message of Saturday, April 9, 2022, 3:41:00 AM.)


zillurbmb51 commented 2 years ago

Thank you.

The reading was helpful, though I am still a little confused about how to prepare my datasets (attached are the first few lines of the original sumstats). I added a column "s", the approximate proportion of cases in the dataset. Now I am getting the following error:

Error in process.dataset(d = dataset1, suffix = "df1") : 
  dataset df1: please give MAF if using p values
In addition: Warning messages:
1: In if (!(d$type %in% c("quant", "cc"))) stop("dataset ", suffix,  :
  the condition has length > 1 and only the first element will be used
2: In if (!(d$type %in% c("quant", "cc"))) stop("dataset ", suffix,  :
  the condition has length > 1 and only the first element will be used
3: In if (d$type == "cc" & "pvalues" %in% nd) { :
  the condition has length > 1 and only the first element will be used

I do not have "MAF" in my datasets, which is why I sent you the original sumstats. Do I need to rename the columns as in the vignette, for example "PVAL" to "pvalues", "SNP" to "snp", or "BP" to "position"?

Could you please just take a look at my dataset and give me some advice about how to prepare them for the coloc.abf input?

baso_for_coloc.csv scz_for_coloc.csv

chr1swallace commented 2 years ago

Please see the section "What if I don’t have beta and/or varbeta?" at the link above.



(Sent by email, in reply to zillurbmb51's message of Saturday, April 9, 2022, 12:20:54 PM.)


zillurbmb51 commented 2 years ago

Hi,

Thank you for the reply. Unfortunately, it did not help: coloc is asking for "MAF", which I do not have in my data. I have two datasets, and one of them has neither "beta" nor "MAF". What should I do?

Here is the quote from the vignette:

But if you don’t have them, coloc can estimate them, given p values, MAF, sample size and, if case-control data, the fraction of samples that are cases:

Sorry for bothering you too much. Could you please just take a look at my dataset's column names and give me some advice? two_datasets_for_coloc

chr1swallace commented 2 years ago

Well you have beta and SE. SE squared is varbeta. Use those.
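A minimal sketch of that step, with toy values. The column names BETA and SE, the object names `baso` and `baso_ds`, and `sdY = 1` are assumptions for illustration (`sdY = 1` mirrors what was tried earlier in this thread and is only reasonable if the trait was standardised). The point is that renaming the SE column to "varbeta" is not enough; the values have to be squared.

```r
# Toy quantitative-trait summary statistics; values are made up.
baso <- data.frame(
  SNP  = c("rs1", "rs2", "rs3"),
  BP   = c(1000L, 2000L, 3000L),
  BETA = c(0.020, -0.015, 0.040),
  SE   = c(0.010, 0.012, 0.011)
)

baso_ds <- list(
  beta     = baso$BETA,
  varbeta  = baso$SE^2,  # variance of beta is the squared standard error
  snp      = baso$SNP,
  position = baso$BP,
  type     = "quant",    # a single string, not one value per SNP
  sdY      = 1           # assumption: the trait was standardised
)
str(baso_ds)
```

If the coloc package is installed, `check_dataset(baso_ds)` can be used to validate a list like this before passing it to `coloc.abf`.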



(Sent by email, in reply to zillurbmb51's message of Monday, April 11, 2022, 12:02:54 AM.)


zillurbmb51 commented 2 years ago

Thanks a lot.

Yes, for the "baso2" I have beta. But for the "scz1" there is no beta.

Should I use "Z" as beta?

chr1swallace commented 2 years ago

Z=beta/se. So no
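To make the relationship concrete, here is a small sketch with toy values. It assumes that the SE column is the standard error of log(OR), which is worth confirming in the documentation of the summary statistics; the OR and SE column names come from this thread, while the numbers and the object names are hypothetical. Under that assumption, rearranging Z = beta/SE gives the same effect size as taking the log of the odds ratio.

```r
# Toy case-control statistics; values are made up.
# Assumption: SE is the standard error of log(OR).
scz <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  OR  = c(1.05, 0.95, 1.10),
  SE  = c(0.02, 0.02, 0.03)
)
scz$Z <- log(scz$OR) / scz$SE     # how the Z score relates to the effect size

beta_from_or <- log(scz$OR)       # effect size as a log odds ratio
beta_from_z  <- scz$Z * scz$SE    # rearranging Z = beta / SE
varbeta      <- scz$SE^2          # variance of beta

# Both routes give the same beta (up to floating-point error).
all.equal(beta_from_or, beta_from_z)
```

Using Z directly as beta would instead rescale every effect by 1/SE, which is why it cannot stand in for beta.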



(Sent by email, in reply to zillurbmb51's message of Monday, April 11, 2022, 10:19:51 AM.)


zillurbmb51 commented 2 years ago

Thanks a lot. It works now. I created a beta column: beta=Z*se

Also I am getting this error:


> coloc.detail()
Error in coloc.detail() : could not find function "coloc.detail"

I have restarted RStudio and reinstalled coloc, but the error persists. Any help?


chr1swallace commented 2 years ago

coloc.detail has been removed from the most recent version.
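One way to check this locally is to ask R whether the installed package still exports the function. The sketch below uses only base R; `is_exported` is a hypothetical helper name, and the `stats` calls are there just to show the helper working without coloc installed.

```r
# Hypothetical helper: does an installed package export a given function?
is_exported <- function(fun, pkg) {
  requireNamespace(pkg, quietly = TRUE) && fun %in% getNamespaceExports(pkg)
}

# Demonstration with a package that ships with R.
is_exported("median", "stats")
is_exported("coloc.detail", "stats")

# The same check against coloc, only if it is installed.
if (requireNamespace("coloc", quietly = TRUE)) {
  print(packageVersion("coloc"))
  print(is_exported("coloc.detail", "coloc"))
  print(is_exported("coloc.abf", "coloc"))
}
```

If the function is not exported by the installed version, restarting or reinstalling will not bring it back; an older script that calls it would need to be updated or run against an older release.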