chr1swallace / coloc

Repo for the R package coloc
139 stars 44 forks source link

runsusie: taking too much iterations to reach convegence #90

Closed zillurbmb51 closed 1 year ago

zillurbmb51 commented 2 years ago

Hi, I am running runsusie for multiple loci: baso_susie=lapply(1:length(baso7), function(i) runsusie(baso7[[i]])) # Here baso7 is a list of lists and each list contains all the required variables for runsusie . It works fine in most cases but for some loci it is taking forever to reach convergence.

Is there any way that I could delete those loci from my datasets?

I do not know the differences between loci. Only thing I am sure that the locus which is taking too much time, has slightly higher number of SNPs. That particular locus has 2236 SNPs, but the same locus with 1947 SNPs in another trait reached convergence after few hundreds iterations.

I am lost and can not think what I need to do.

The objective of my analysis is to use runsusie for a list of loci in some trait then use coloc.susie for the same loci in pair of traits as described in here: https://chr1swallace.github.io/coloc/articles/a06_SuSiE.html

Now, if locus X reached convergence for trait1 but did not for trait2, what should I do? runsusie_taking_too_much_time

Attached is a screenshot of susie running.

chr1swallace commented 2 years ago

Sometimes susie can take a long time and not converge. This can happen when there are many true signals, when there are many snps or (most common in my experience) when there is a mismatch between the summary data and the LD matrix supplied - sometimes only a small mismatch. Myself, I do not run runsusie in an lapply because of this. When there is a failure to converge, I tend to fall back on using coloc under the single causal variant assumption.


From: zillurbmb51 @.> Sent: 02 June 2022 00:20 To: chr1swallace/coloc @.> Cc: Subscribed @.***> Subject: [chr1swallace/coloc] runsusie: taking too much iterations to reach convegence (Issue #90)

Hi, I am running runsusie for multiple loci: baso_susie=lapply(1:length(baso7), function(i) runsusie(baso7[[i]])) # Here baso7 is a list of lists and each list contains all the required variables for runsusie . It works fine in most cases but for some loci it is taking forever to reach convergence.

Is there any way that I could delete those loci from my datasets?

I do not know the differences between loci. Only thing I am sure that the locus which is taking too much time, has slightly higher number of SNPs. That particular locus has 2236 SNPs, but the same locus with 1947 SNPs in another trait reached convergence after few hundreds iterations.

I am lost and can not think what I need to do.

The objective of my analysis is to use runsusie for a list of loci in some trait then use coloc.susie for the same loci in pair of traits as described in here: https://chr1swallace.github.io/coloc/articles/a06_SuSiE.htmlhttps://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fchr1swallace.github.io%2Fcoloc%2Farticles%2Fa06_SuSiE.html&data=05%7C01%7Ccew54%40universityofcambridgecloud.onmicrosoft.com%7C68672d8476d4461ce3a408da44255924%7C49a50445bdfa4b79ade3547b4f3986e9%7C0%7C0%7C637897224436009392%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=v%2FLsCeDYJxlXkfPjkD%2FJ4Vr9rd9tV64gl8G4%2FIIum3w%3D&reserved=0

Now, if locus X reached convergence for trait1 but did not for trait2, what should I do? [runsusie_taking_too_much_time]https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fuser-images.githubusercontent.com%2F12833907%2F171517094-277b7497-8a55-4414-b2f6-70c409b15320.png&data=05%7C01%7Ccew54%40universityofcambridgecloud.onmicrosoft.com%7C68672d8476d4461ce3a408da44255924%7C49a50445bdfa4b79ade3547b4f3986e9%7C0%7C0%7C637897224436009392%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=pLJQhfIBjanZVc%2BHSkgXifIX7ufZREtBmc%2FLor%2Bq6Vw%3D&reserved=0

Attached is a screenshot of susie running.

— Reply to this email directly, view it on GitHubhttps://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fchr1swallace%2Fcoloc%2Fissues%2F90&data=05%7C01%7Ccew54%40universityofcambridgecloud.onmicrosoft.com%7C68672d8476d4461ce3a408da44255924%7C49a50445bdfa4b79ade3547b4f3986e9%7C0%7C0%7C637897224436009392%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=G6mDlY8hiEnSR6%2F57ea85tw4lH06sbnSpZB7%2FNc%2FlYc%3D&reserved=0, or unsubscribehttps://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAAQWR2FSVUXSPGSTST26RKDVM7V2VANCNFSM5XTCXHPA&data=05%7C01%7Ccew54%40universityofcambridgecloud.onmicrosoft.com%7C68672d8476d4461ce3a408da44255924%7C49a50445bdfa4b79ade3547b4f3986e9%7C0%7C0%7C637897224436009392%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=jJYdvBLY5gtXfKdo5QkTlQYkTEcIh9%2Bt5q%2BQuV7fHMs%3D&reserved=0. You are receiving this because you are subscribed to this thread.Message ID: @.***>

zillurbmb51 commented 2 years ago

@chr1swallace Thanks a lot.

Is there any way to skip a runsusie after a certain number of iteration and then start the next run?

For example, I have 100 loci, I want to run them in loop. If the 5th locus does not reach convergence in 10000 iterations, it would through a message and start 6th locus.

chr1swallace commented 2 years ago

you can control this through the two arguments: maxit

maximum number of iterations for the first run of susie_rss(). If susie_rss() does not report convergence, runs will be extended assuming repeat_until_convergence=TRUE. Most users will not need to change this default.

repeat_until_convergence

keep running until susie_rss() indicates convergence. Default TRUE. If FALSE, susie_rss() will run with maxit iterations, and if not converged, runsusie() will error. Most users will not need to change this default.

see https://chr1swallace.github.io/coloc/reference/runsusie.html for other arguments [https://chr1swallace.github.io/coloc/logo.png]https://chr1swallace.github.io/coloc/reference/runsusie.html Run susie on a single coloc-structured dataset — runsusiehttps://chr1swallace.github.io/coloc/reference/runsusie.html run susie_rss storing some additional information for coloc chr1swallace.github.io


From: zillurbmb51 @.> Sent: 06 June 2022 11:54 To: chr1swallace/coloc @.> Cc: Chris Wallace @.>; Mention @.> Subject: Re: [chr1swallace/coloc] runsusie: taking too much iterations to reach convegence (Issue #90)

@chr1swallacehttps://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fchr1swallace&data=05%7C01%7Ccew54%40universityofcambridgecloud.onmicrosoft.com%7C83fe00cc62374701032508da47aafc78%7C49a50445bdfa4b79ade3547b4f3986e9%7C0%7C0%7C637901096967104152%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=dA7UcRQlB%2FX7VOOm%2B2Ip4vzLEtXhsvlfwhLi8WopSpU%3D&reserved=0 Thanks a lot.

Is there any way to skip a runsusie after a certain number of iteration and then start the next run?

For example, I have 100 loci, I want to run them in loop. If the 5th locus does not reach convergence in 10000 iterations, it would through a message and start 6th locus.

— Reply to this email directly, view it on GitHubhttps://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fchr1swallace%2Fcoloc%2Fissues%2F90%23issuecomment-1147318963&data=05%7C01%7Ccew54%40universityofcambridgecloud.onmicrosoft.com%7C83fe00cc62374701032508da47aafc78%7C49a50445bdfa4b79ade3547b4f3986e9%7C0%7C0%7C637901096967104152%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=hlCr0q%2B6i3aNscE7vzIP95ld4RWP1PNHJmm0tgqAs88%3D&reserved=0, or unsubscribehttps://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAAQWR2ELSZVBB7FHY2PEI3TVNXKF3ANCNFSM5XTCXHPA&data=05%7C01%7Ccew54%40universityofcambridgecloud.onmicrosoft.com%7C83fe00cc62374701032508da47aafc78%7C49a50445bdfa4b79ade3547b4f3986e9%7C0%7C0%7C637901096967104152%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=LPotI8ZXiJfisQsvzyqV9xE%2B3FNRa1cs1ldn05d33HI%3D&reserved=0. You are receiving this because you were mentioned.Message ID: @.***>

zillurbmb51 commented 2 years ago

Thank you. I have tried these options maxit=10000, repeat_until_convergence=F . With these options, if a locus fails to reach convergence, runsusie does not start the next locus, it throw an error and exit. Below is the error log.

How could I use runsusie for multiple loci in a way that if convergence fails in one locus it gives the message and starts to run the next locus?

scz_baso_susie=lapply(1:length(scz_baso7), function(i) runsusie(scz_baso7[[i]],check_prior=F,maxit = 10000,repeat_until_convergence = F))
running max iterations: 10000
    converged: TRUE
running max iterations: 10000
    converged: TRUE
running max iterations: 10000
    converged: FALSE
Error in runsusie(scz_baso7[[i]], check_prior = F, maxit = 10000, repeat_until_convergence = F) : 
  susie_rss() did not converge in 10000 iterations. Try running with run_until_convergence=TRUE
In addition: Warning messages:
1: In check_dataset(d, suffix, req = c("beta", "varbeta", "LD", "snp")) :
  minimum p value is: 0.00019787
If this is what you expected, this is not a problem.
If this is not as small as you expected, please check the 02_data vignette.
2: In check_dataset(d, suffix, req = c("beta", "varbeta", "LD", "snp")) :
  minimum p value is: 7.6246e-05
If this is what you expected, this is not a problem.
If this is not as small as you expected, please check the 02_data vignette.
3: In susie_suff_stat(XtX = R, Xty = z, n = 2, yty = 1, scaled_prior_variance = prior_variance,  :

 Error in runsusie(scz_baso7[[i]], check_prior = F, maxit = 10000, repeat_until_convergence = F) : 
susie_rss() did not converge in 10000 iterations. Try running with run_until_convergence=TRUE
chr1swallace commented 2 years ago

you will need to use an R mechanism for catching errors, such as tryCatch(). I do believe that failure to converge needs to be an error, because otherwise people will base inference on the non-converged runs.


From: zillurbmb51 @.> Sent: 06 June 2022 14:09 To: chr1swallace/coloc @.> Cc: Chris Wallace @.>; Mention @.> Subject: Re: [chr1swallace/coloc] runsusie: taking too much iterations to reach convegence (Issue #90)

Thank you. I have tried these options maxit=10000, repeat_until_convergence=F . With these options, if a locus fails to reach convergence, runsusie does not start the next locus, it throw an error and exit. Below is the error log.

How could I use runsusie for multiple loci in a way that if convergence fails in one locus it gives the message and starts to run the next locus?

scz_baso_susie=lapply(1:length(scz_baso7), function(i) runsusie(scz_baso7[[i]],check_prior=F,maxit = 10000,repeat_until_convergence = F)) running max iterations: 10000 converged: TRUE running max iterations: 10000 converged: TRUE running max iterations: 10000 converged: FALSE Error in runsusie(scz_baso7[[i]], check_prior = F, maxit = 10000, repeat_until_convergence = F) : susie_rss() did not converge in 10000 iterations. Try running with run_until_convergence=TRUE In addition: Warning messages: 1: In check_dataset(d, suffix, req = c("beta", "varbeta", "LD", "snp")) : minimum p value is: 0.00019787 If this is what you expected, this is not a problem. If this is not as small as you expected, please check the 02_data vignette. 2: In check_dataset(d, suffix, req = c("beta", "varbeta", "LD", "snp")) : minimum p value is: 7.6246e-05 If this is what you expected, this is not a problem. If this is not as small as you expected, please check the 02_data vignette. 3: In susie_suff_stat(XtX = R, Xty = z, n = 2, yty = 1, scaled_prior_variance = prior_variance, :

Error in runsusie(scz_baso7[[i]], check_prior = F, maxit = 10000, repeat_until_convergence = F) : susie_rss() did not converge in 10000 iterations. Try running with run_until_convergence=TRUE

— Reply to this email directly, view it on GitHubhttps://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fchr1swallace%2Fcoloc%2Fissues%2F90%23issuecomment-1147429673&data=05%7C01%7Ccew54%40universityofcambridgecloud.onmicrosoft.com%7C570cbe1b9ad14db54ae408da47bdf071%7C49a50445bdfa4b79ade3547b4f3986e9%7C0%7C0%7C637901178345646369%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=YSCFoG6pDhqWNHztk9YNjj5pIVTMaCvmgIgvNUu4bGA%3D&reserved=0, or unsubscribehttps://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAAQWR2GS4UBVLKGMN47NMHDVNX2CPANCNFSM5XTCXHPA&data=05%7C01%7Ccew54%40universityofcambridgecloud.onmicrosoft.com%7C570cbe1b9ad14db54ae408da47bdf071%7C49a50445bdfa4b79ade3547b4f3986e9%7C0%7C0%7C637901178345646369%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=5vvzC%2B3hVqOoodHBiGuvh6CJRGxL%2BdeErDQtLeibU9g%3D&reserved=0. You are receiving this because you were mentioned.Message ID: @.***>

jdblischak commented 2 years ago

Another idea: you could try to use the diagnostic functions provided by susieR

For example, you could assess the mismatch between your summary statistics and LD matrix using estimate_s_rss. If you run it on loci you know converge and those you know don't converge, you could determine an empirical value of lambda to filter loci that are too mismatched prior to fine-mapping

cwnag-c commented 1 year ago

Another idea: you could try to use the diagnostic functions provided by susieR

For example, you could assess the mismatch between your summary statistics and LD matrix using estimate_s_rss. If you run it on loci you know converge and those you know don't converge, you could determine an empirical value of lambda to filter loci that are too mismatched prior to fine-mapping

Would it be okay if I determine the empirical value of lambda to be 0.05?

jdblischak commented 1 year ago

Would it be okay if I determine the empirical value of lambda to be 0.05?

@cwnag-c I think the susieR authors would be best able to answer that question: https://github.com/stephenslab/susieR/issues

cwnag-c commented 1 year ago

OK!Thank you!