Discordant output from coloc.abf with beta/varbeta input vs estimation from p-value input

ds763 commented 2 years ago

Hi, thanks for a great set of tools!

When running coloc.abf using beta/varbeta as input I'm getting wildly different outputs compared to when I run using p-values. Incidentally, it's the output I get from the p-value input that corresponds with outputs from hyprcoloc and moloc, for example, so it looks like the issue is arising from the beta/varbeta side. I'm running into the same issue for about 30 datasets, some are meta-analyses and others are single studies - there's nothing systematic there (that I can think of) that would cause the beta/varbetas to decouple from their corresponding p-values.

I'm running coloc v5.1.0.1 using the coloc.abf function with default parameters.

Here's a representative beta/varbeta input:

> str(CAD_data_betavar)
List of 7
 $ beta    : Named num [1:5470] 0.01 0.0452 -0.0101 0.09 -0.1327 ...
  ..- attr(*, "names")= chr [1:5470] "11:9637991_A_G" "11:9638010_A_C" "11:9638145_C_T" "11:9638243_C_T" ...
 $ varbeta : Named num [1:5470] 0.00608 0.01704 0.028 0.12921 0.10709 ...
  ..- attr(*, "names")= chr [1:5470] "11:9637991_A_G" "11:9638010_A_C" "11:9638145_C_T" "11:9638243_C_T" ...
 $ snp     : chr [1:5470] "11:9637991_A_G" "11:9638010_A_C" "11:9638145_C_T" "11:9638243_C_T" ...
 $ position: int [1:5470] 9637991 9638010 9638145 9638243 9638338 9638384 9638560 9638918 9639274 9640136 ...
 $ type    : chr "cc"
 $ N       : int 1162920
 $ MAF     : num [1:5470] 0.22 0.0235 0.0099 0.0016 0.002 ...

When I check the above dataset (see below), it gives me a warning. NOTE: the true min(pvalue) = 2.976e-14.

> check_dataset(CAD_data_betavar)
NULL
Warning message:
In check_dataset(CAD_data_betavar) :
  minimum p value is: 0.15513
If this is what you expected, this is not a problem.
If this is not as small as you expected, please check the 02_data vignette.
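For reference, the minimum p-value implied by a beta/varbeta pair can be recomputed by hand and compared with the p-values reported in the original summary statistics. The sketch below uses the usual normal approximation (z = beta / sqrt(varbeta), two-sided p); that this matches what check_dataset does internally is an assumption, not something confirmed in this thread. The three values are the first entries shown in the str() output above and stand in for the full vectors.

```r
# First three beta/varbeta values from the str() output above (illustration only;
# in practice use the full vectors, e.g. CAD_data_betavar$beta).
beta    <- c(0.0100, 0.0452, -0.0101)
varbeta <- c(0.00608, 0.01704, 0.02800)

# z-score under the normal approximation: estimate divided by its standard error,
# where the standard error is the square root of the supplied variance.
z <- beta / sqrt(varbeta)

# Two-sided p-value for each SNP.
p <- 2 * pnorm(-abs(z))

# Smallest implied p-value; compare this with the smallest reported p-value.
min(p)
```

If the value printed here is far larger than the smallest p-value in the source file, the beta and varbeta columns are not on the scale the p-values were computed from.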

Here's the coloc.abf output from the above beta/varbeta dataset and another of the same format:

> coloc.abf(CAD_data_betavar, RDW_data_betavar)
PP.H0.abf PP.H1.abf PP.H2.abf PP.H3.abf PP.H4.abf
  0.72600   0.25000   0.01670   0.00575   0.00128
[1] "PP abf for shared variant: 0.128%"
Coloc analysis of trait 1, trait 2

And here's the output from the same analysis but with p-value input:

> coloc.abf(CAD_data_p, RDW_data_p)
PP.H0.abf PP.H1.abf PP.H2.abf PP.H3.abf PP.H4.abf
 6.15e-25  2.23e-17  9.14e-10  3.22e-02  9.68e-01
[1] "PP abf for shared variant: 96.8%"
Coloc analysis of trait 1, trait 2

Please let me know if you need any additional info.

Many thanks, David

heinin commented 2 years ago

I noticed the same issue. Using p-values, I get results that make sense.

chr1swallace commented 2 years ago

That suggests something is wrong with the non-p-value data you supply. Do you also get the warning? From the 02_data vignette: "One common mistake is to use the standard error of beta in place of the variance of beta. If your dataset provides the standard error, simply square it to get the variance."

chr1swallace commented 1 year ago

Did you check the 02_data vignette, as the warning suggests? It includes the text: "The warning doesn’t mean you can’t run coloc ... but it may alert you to check something if you thought there was strong association there. One common mistake is to use the standard error of beta in place of the variance of beta. If your dataset provides the standard error, simply square it to get the variance."
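One way to test for the standard-error mix-up described in the vignette is to compute the minimum p-value twice, once treating the supplied column as a variance and once treating it as a standard error, and see which agrees with the reported p-values. This is a minimal sketch: `implied_min_p` is a hypothetical helper, not a coloc function, and the numbers are made up for illustration.

```r
# Hypothetical helper (not part of coloc): minimum two-sided p-value implied by
# `beta` when `spread` is read either as a variance or as a standard error.
implied_min_p <- function(beta, spread) {
  c(as_variance = min(2 * pnorm(-abs(beta / sqrt(spread)))),
    as_se       = min(2 * pnorm(-abs(beta / spread))))
}

# Made-up summary statistics in which `se` really is a standard error.
beta <- c(0.010, 0.045, -0.030)
se   <- c(0.006, 0.017, 0.004)

# If the "as_se" value matches the reported minimum p-value and "as_variance"
# does not, the column was a standard error all along.
res <- implied_min_p(beta, se)
res

# The fix quoted from the vignette: square the standard error to get the variance.
varbeta <- se^2
```

With standard errors below 1, reading them as variances shrinks every z-score, which would make real associations look much weaker than they are.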
