chr1swallace / coloc

Repo for the R package coloc
144 stars 44 forks source link

res$summary shows PP.H2.abf=1 AND PP.H4.abf=1 #105

Closed zainomarali closed 1 year ago

zainomarali commented 1 year ago

Some background on what I am doing - due to the nature of our study a custom approach was taken to construct credible sets (i.e. without using SusieR). The approach we took gives us quite narrow credible sets and we have Bayes factors for all the variants in each credible set.

We are studying 9 different traits and we expect them to have some overlapping loci. We now have a long list of partially overlapping credible sets from different trait pairs. I wanted to use coloc to determine if the partially overlapping sets are truly colocalising or not.

We have Bayes factors for the variants in our credible set so I take the following approach:

1) create vectors of Bayes factors for the 2 credible sets 2) subset to only those variants that are shared between the credible sets

This gives 2 equal length vectors of Bayes factors for the 2 loci, but I have had to throw away non-intersecting variants (I'm not sure if this is a problem?).

Now it is trivial to simply apply coloc.bf_bf to these 2 vectors. In many cases this gives answers that make sense upon manual inspection.

But for a large number of cases the results summary shows that:

(PP.H2.abf=1 OR PP.H1.abf=1) AND PP.H4.abf=1.

How is this possible? How can 2 mutually exclusive hypotheses be completely certain?

chr1swallace commented 1 year ago

well no, that shouldn't happen.

The way coloc works, is that you would need Bayes factors for all snps in a region, not just those in credible sets, and certainly not just those in both credible sets. So it doesn't sound like you are using it right. But it still shouldn't produce two hypotheses with posteriors of 1. I can't reproduce this behaviour. Can I ask what the vectors of Bayes factors look like?


From: zainomarali @.> Sent: 08 November 2022 17:10 To: chr1swallace/coloc @.> Cc: Subscribed @.***> Subject: [chr1swallace/coloc] res$summary shows PP.H2.abf=1 AND PP.H4.abf=1 (Issue #105)

Some background on what I am doing - due to the nature of our study a custom approach was taken to construct credible sets (i.e. without using SusieR). The approach we took gives us quite narrow credible sets and we have Bayes factors for all the variants in each credible set.

We are studying 9 different traits and we expect them to have some overlapping loci. We now have a long list of partially overlapping credible sets from different trait pairs. I wanted to use coloc to determine if the partially overlapping sets are truly colocalising or not.

We have Bayes factors for the variants in our credible set so I take the following approach:

  1. create vectors of Bayes factors for the 2 credible sets
  2. subset to only those variants that are shared between the credible sets

This gives 2 equal length vectors of Bayes factors for the 2 loci, but I have had to throw away non-intersecting variants (I'm not sure if this is a problem?).

Now it is trivial to simply apply coloc.bf_bf to these 2 vectors. In many cases this gives answers that make sense upon manual inspection.

But for a large number of cases the results summary shows that:

(PP.H2.abf=1 OR PP.H1.abf=1) AND PP.H4.abf=1.

How is this possible? How can 2 mutually exclusive hypotheses be completely certain?

— Reply to this email directly, view it on GitHubhttps://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fchr1swallace%2Fcoloc%2Fissues%2F105&data=05%7C01%7Ccew54%40universityofcambridgecloud.onmicrosoft.com%7Cac77ea2f5adc4b5dff9d08dac1ac3875%7C49a50445bdfa4b79ade3547b4f3986e9%7C1%7C0%7C638035242667013316%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=qU%2FgFpfEnf2joUxIQ0lhvo86AJj%2B4JXXLu6ndx9kYj4%3D&reserved=0, or unsubscribehttps://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAAQWR2BBO2O5ZS4IINH2HZTWHKCQTANCNFSM6AAAAAAR2RSCSU&data=05%7C01%7Ccew54%40universityofcambridgecloud.onmicrosoft.com%7Cac77ea2f5adc4b5dff9d08dac1ac3875%7C49a50445bdfa4b79ade3547b4f3986e9%7C1%7C0%7C638035242667013316%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=UNIlCE0ggx6VUQENT%2FQUHAM9eModWR4Pu3ooVr7cIKM%3D&reserved=0. You are receiving this because you are subscribed to this thread.Message ID: @.***>

zainomarali commented 1 year ago

Yes I can give you one example

a_vec = [4.567277e+26, 3.942818e+25] b_vec = [7.097575e+45, 2.174417e+44]

if you run coloc.bf_bf on these vectors you should reproduce the behavior I described. Unfortunately I cannot share very complex examples with many markers or share my code directly as everything is on a protected research network.

Thank you for your clarification on how to use coloc! I will ask our statistician for the other Bayes Factors (I was only provided BF for variants in credible sets)

chr1swallace commented 1 year ago

these values are huge. I think the issue is that the docs for coloc.bf_bf say these should be Bayes factors, but in fact they should be log​ Bayes factors. Your example works if I use log(a), log(b). I think the solution is that I update the docs - would you agree?


From: zainomarali @.> Sent: 09 November 2022 10:47 To: chr1swallace/coloc @.> Cc: Chris Wallace @.>; Comment @.> Subject: Re: [chr1swallace/coloc] res$summary shows PP.H2.abf=1 AND PP.H4.abf=1 (Issue #105)

Yes I can give you one example

a_vec = [4.567277e+26, 3.942818e+25] b_vec = [7.097575e+45, 2.174417e+44]

if you run coloc.bf_bf on these vectors you should reproduce the behavior I described. Unfortunately I cannot share very complex examples with many markers or share my code directly as everything is on a protected research network.

Thank you for your clarification on how to use coloc! I will ask our statistician for the other Bayes Factors (I was only provided BF for variants in credible sets)

— Reply to this email directly, view it on GitHubhttps://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fchr1swallace%2Fcoloc%2Fissues%2F105%23issuecomment-1308560694&data=05%7C01%7Ccew54%40universityofcambridgecloud.onmicrosoft.com%7C2264063e1399446cc0cd08dac23fd3f1%7C49a50445bdfa4b79ade3547b4f3986e9%7C1%7C0%7C638035876638467170%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=bMk5s7mpgA7Lesgh3wChRiQXtub4G%2FFpfYGZK6gxLS0%3D&reserved=0, or unsubscribehttps://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAAQWR2FD7LJAJKJZK77KCG3WHN6KJANCNFSM6AAAAAAR2RSCSU&data=05%7C01%7Ccew54%40universityofcambridgecloud.onmicrosoft.com%7C2264063e1399446cc0cd08dac23fd3f1%7C49a50445bdfa4b79ade3547b4f3986e9%7C1%7C0%7C638035876638467170%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=1%2BXBjSwTxxI1%2F7VmWDVeokdvhCeWF5OZZ5Ig0e9Lozg%3D&reserved=0. You are receiving this because you commented.Message ID: @.***>

zainomarali commented 1 year ago

Yes that sounds great! I will retry with log Bayes Factors and with the BFs for the entire region, thank you so much for your quick reply Chris!