chr1swallace / coloc

Repo for the R package coloc

Error when using coloc 5.2.0 version #96

Closed JFF1594032292 closed 1 year ago

JFF1594032292 commented 2 years ago

Hi, I have updated to coloc 5.2.0. However, there seem to be some problems: the runsusie() function fails on my data, which worked well with coloc 5.1.0.

    This is coloc version 5.2.0
    running max iterations: 100
    Error in init_finalize(s) : Input residual variance sigma2 must be a scalar
    In addition: Warning messages:
    1: In if (n <= 1) stop("n must be greater than 1") :
      the condition has length > 1 and only the first element will be used
    2: In if (n <= 1) stop("n must be greater than 1") :
      the condition has length > 1 and only the first element will be used

It seems to happen on almost all of my data that previously worked well. (I haven't tested them all.)

The examples at https://chr1swallace.github.io/coloc/articles/a06_SuSiE.html don't work either.

I wonder if there are some conflicts between the new version of susieR and coloc, or should I update my data structure? Thanks!

chr1swallace commented 2 years ago

Thank you for the report. It is strange, because the examples seem to work well for me. What version of susieR are you using?
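For anyone answering this question on their own machine, a minimal base-R sketch for reporting the installed versions from a fresh session. The helper name `report_versions` is made up for this example and is not part of coloc or susieR.

```r
# Hypothetical helper (not part of coloc): report the installed version of
# each package, or NA when the package is not installed. It uses
# system.file() so that no package is loaded as a side effect.
report_versions <- function(pkgs = c("coloc", "susieR")) {
  vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

print(report_versions())
print(R.version.string)
```

Pasting this output alongside the error message makes it easier to tell a version conflict from a data problem.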


JFF1594032292 commented 2 years ago

It seems R automatically updated susieR to 0.12.16 when I updated coloc, because I remember that susieR 0.11.8x was too old to install coloc 5.2.0. I just rolled back coloc to 5.1.0, and it works well with susieR 0.12.16.

chr1swallace commented 2 years ago

Actually, it doesn't work well: as the warning highlights, coloc 5.1.0 will not pass the sample size parameter n to susie, whilst 5.2.0 will.

I just tried a fresh install of susieR 0.12.16 from CRAN and coloc 5.2.0 from GitHub, and the examples run fine for me, so I am really at a loss to understand what is going wrong for you. Can I double check that the errors you see with coloc 5.2.0 occur in a fresh session, with no old code from an earlier coloc or susieR hanging around?



JFF1594032292 commented 2 years ago

Hi, I built a new environment with conda, and the example data works with coloc 5.2.0 and susieR 0.12.16. But it still doesn't work on my data: every dataset reports the same error.

At least some of these datasets worked fine under coloc 5.1.0 and also showed high PPH4 in the coloc.susie() step. I don't know why the same data fail with the new version; my data only lack "position", which is not required. I couldn't find any useful information by searching for this error message either.

And since 5.1.0 doesn't pass the sample size information, is the new version necessary for colocalization? Thanks!

chr1swallace commented 1 year ago

what is str(D)? Is D$N a scalar as it should be, or a vector? Is it expected that your LD matrix is not symmetric?
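The three questions above can be checked mechanically. This is a minimal base-R sketch on a made-up toy dataset; `check_inputs` is a hypothetical helper, not a coloc function, and it does not replace coloc's own `check_dataset()`.

```r
# Toy dataset laid out like a coloc dataset list; all values are made up.
snps <- paste0("s", 1:3)
LD <- matrix(c(1.0, 0.8, 0.2,
               0.8, 1.0, 0.3,
               0.2, 0.3, 1.0),
             nrow = 3, dimnames = list(snps, snps))
D <- list(beta    = c(0.10, 0.08, 0.01),
          varbeta = c(0.001, 0.001, 0.001),
          snp     = snps,
          N       = c(1000, 1000, 1000),  # the problem: one N per SNP
          type    = "quant",
          sdY     = 1,
          LD      = LD)

# Hypothetical helper (not part of coloc) answering the three questions.
check_inputs <- function(D) {
  c(N_is_scalar     = is.numeric(D$N) && length(D$N) == 1,
    LD_is_symmetric = is.matrix(D$LD) && isSymmetric(D$LD),
    LD_names_match  = identical(rownames(D$LD), D$snp) &&
                      identical(colnames(D$LD), D$snp))
}

str(D)
print(check_inputs(D))
```

In this toy case `N_is_scalar` comes back FALSE, which is the situation that can trigger "sigma2 must be a scalar" and the `if (n <= 1)` warning quoted earlier in the thread.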

JFF1594032292 commented 1 year ago

D$N was set as a vector in my data, and it worked when I changed it to a scalar. Thanks a lot!

Another interesting thing: I ran runsusie() on summary data with its original genotype LD matrix, and it worked well on all regions. However, it reported an error at many loci when I ran it with a same-cohort LD matrix (the best-matched genotype data I could find). It seems runsusie() is extremely sensitive to the consistency between the summary data and the LD matrix, which may prevent the analysis in >50% of regions (even ~100% in some datasets). For most public GWAS data (especially meta-analysis GWAS), we can only use other public datasets to obtain the LD matrix.

chr1swallace commented 1 year ago

Yes, the sensitivity to the LD matrix is a big issue. Although I am not an author of susie or any of its competitors, I suspect this sensitivity is widespread; susie is just more likely to fail explicitly rather than silently. See Benner et al. (PMID 28942963) for more detail.
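One rough way to spot a mismatch between summary statistics and a reference LD matrix before running fine-mapping is to compare z-scores of SNPs in very high LD. The sketch below is a simple heuristic written for this thread, not a method from coloc, susieR or the cited paper. The function name `ld_z_discrepancy`, the 0.95 threshold and the toy values are all assumptions.

```r
# Hypothetical heuristic: for SNP pairs with |r| >= r_min, z-scores should be
# close once the sign of r is accounted for. Large gaps hint that the LD
# reference does not match the summary statistics (different population,
# flipped alleles, and so on).
ld_z_discrepancy <- function(z, LD, r_min = 0.95) {
  idx <- which(abs(LD) >= r_min & upper.tri(LD), arr.ind = TRUE)
  i <- unname(idx[, "row"])
  j <- unname(idx[, "col"])
  data.frame(i = i, j = j, r = LD[idx],
             gap = abs(z[i] - sign(LD[idx]) * z[j]))
}

# Toy example (made-up values): SNP 3 is in high positive LD with SNPs 1
# and 2 but has a z-score of the opposite sign.
z  <- c(5.0, 4.8, -5.1, 0.3)
LD <- diag(4)
LD[1, 2] <- LD[2, 1] <- 0.98
LD[1, 3] <- LD[3, 1] <- 0.97
LD[2, 3] <- LD[3, 2] <- 0.96

res <- ld_z_discrepancy(z, LD)
print(res[order(-res$gap), ])
```

Pairs with a large `gap` are worth inspecting for allele alignment or for a poorly matched reference panel before concluding that the region cannot be analysed.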



JFF1594032292 commented 1 year ago

Thanks, it's very helpful to me!

mocksu commented 8 months ago

> what is str(D)? Is D$N a scalar as it should be, or a vector? Is it expected that your LD matrix is not symmetric?

I am working on the latest version of "coloc" (and thus susie I guess). The N varies from row to row for my data. What should I do with N?

Thanks!

chr1swallace commented 8 months ago

coloc software assumes all variants have the same sample coverage, and then compares Bayes factors across different variants. If there is substantially different sample coverage between two variants, then these Bayes factors are not comparable. Imagine two variants in complete LD, one typed in all samples and the other in half the samples. They are in complete LD, so should have equal Bayes factors, but in fact the variant with higher sample coverage is likely to have a larger Bayes factor.
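The two-variant thought experiment can be made concrete with a Wakefield-style approximate Bayes factor. This is an illustrative sketch, not coloc's internal code. The prior variance `W`, the effect size, the allele frequency and the variance approximation for a unit-variance quantitative trait are all assumptions chosen for the example.

```r
# Wakefield-style log approximate Bayes factor for a single variant.
# W is the prior variance of the effect size (the value is an assumption).
log_abf <- function(beta, varbeta, W = 0.15^2) {
  z2 <- beta^2 / varbeta
  r  <- W / (W + varbeta)
  0.5 * (log(1 - r) + r * z2)
}

beta   <- 0.1    # same estimated effect for both variants (made up)
maf    <- 0.3    # same allele frequency, as they are in complete LD
N_full <- 3000   # variant typed in all samples
N_half <- 1500   # variant typed in half the samples

# For a quantitative trait with unit variance, var(beta) is roughly
# 1 / (2 * N * maf * (1 - maf)), so halving N doubles the variance.
varbeta <- function(N) 1 / (2 * N * maf * (1 - maf))

labf_full <- log_abf(beta, varbeta(N_full))
labf_half <- log_abf(beta, varbeta(N_half))
print(c(full = labf_full, half = labf_half))
```

With identical effect estimates, the variant with full coverage gets the larger log Bayes factor purely because its standard error is smaller, which is the non-comparability described above.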

How variable is your N?


mocksu commented 8 months ago

> How variable is your N?

The sample size differs by a factor of a few (e.g. roughly 1,000 to 3,000). I decided to use the median sample size to get around it.
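A minimal sketch of that workaround in base R, with made-up per-SNP sample sizes. The low-coverage flag and its 0.8 cutoff are assumptions of this sketch, not coloc advice. The comparability caveat raised earlier in the thread still applies to variants whose coverage is far from the median.

```r
# Per-SNP sample sizes as they might appear in a summary-statistics file
# (made-up values spanning roughly 1,000 to 3,000).
N_per_snp <- c(1000, 1500, 2800, 3000, 2900, 2950)

# Collapse to the single scalar N that the dataset list expects.
N_scalar <- median(N_per_snp)

# Optional: flag variants whose coverage is well below the median so they
# can be inspected or dropped before the analysis.
low_coverage <- N_per_snp < 0.8 * N_scalar

D <- list(N = N_scalar)  # other dataset elements omitted in this sketch
print(D$N)
print(which(low_coverage))
```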